Search Results for: parallel file systems

A Contrast of Paradigms – HPCC Systems & Hadoop

Flavio Villanustre writes about the differences between two powerful open source Big Data platforms, HPCC and Hadoop. Both are open source projects released under the Apache 2.0 license and are free to use; both leverage commodity hardware and local storage interconnected through IP networks, allowing for parallel data processing and/or querying […]

Video: Overview of OrangeFS, Open Source File System for Data-Intensive Workloads

In this video, Clemson’s Dr. Walt Ligon provides an overview of OrangeFS, an open source parallel file system tailor-made for Big Data. Developed from PVFS, OrangeFS scales well on large HPC systems and can also be used with emerging data-intensive workloads. Perhaps more importantly, OrangeFS is very […]

Best Practices – Big Data Acceleration

“This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached will be presented. Enhanced designs for these components to exploit in-memory technology and parallel file systems (such as Lustre) will be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project (http://hibd.cse.ohio-state.edu) will be shown.”

Efficiency: Big Data Meets HPC in Financial Services

Converging High Performance Computing (HPC) and Lustre* parallel file systems with Hadoop’s MapReduce for Big Data analytics can eliminate the need for a separate Hadoop storage infrastructure and speed up the entire analysis. Convergence is of particular interest to organizations that already have HPC in their infrastructure, such as the financial services industry and other industries adopting high performance data analytics.
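To make the convergence idea concrete, here is a minimal sketch, assuming Hadoop client libraries and a hypothetical Lustre mount at /mnt/lustre visible to every compute node: pointing Hadoop’s default file system at the POSIX mount lets MapReduce jobs read and write the parallel file system directly, with no HDFS layer in between. In practice this setting would normally live in core-site.xml; the programmatic form below is only for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LustreDefaultFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Assumption: /mnt/lustre is a Lustre mount shared by all nodes.
        // Using file:/// as the default file system routes MapReduce I/O
        // through the local POSIX interface onto the parallel file system,
        // so no HDFS NameNode/DataNodes need to be deployed.
        conf.set("fs.defaultFS", "file:///");

        FileSystem fs = FileSystem.get(conf);
        Path input = new Path("/mnt/lustre/analytics/input"); // hypothetical job input path

        System.out.println("Default file system: " + fs.getUri());
        System.out.println("Input directory present: " + fs.exists(input));
    }
}
```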

insideBIGDATA Latest News – 8/14/2020

In this regular column, we’ll bring you all the latest industry news centered around our main topics of focus: big data, data science, machine learning, AI, and deep learning. Our industry is constantly accelerating, with new products and services being announced every day. Fortunately, we’re in close touch with vendors from this vast ecosystem, so we’re in a unique position to inform you about all that’s new and exciting. Our massive industry database is growing all the time, so stay tuned for the latest news items describing technology that may make you and your organization more competitive.

Why You Need a Modern Infrastructure to Accelerate AI and ML Workloads

Recent years have seen a boom in the generation of data from a variety of sources: connected devices, IoT, analytics, healthcare, smartphones, and much more. The resulting data management problem is particularly acute for Artificial Intelligence (AI) and Machine Learning (ML) workloads. This guest article from WekaIO highlights why focusing on optimizing infrastructure can accelerate machine learning workloads and drive AI success.

insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning – Part 4

With AI and DL, storage is the cornerstone of handling the deluge of data constantly generated in today’s hyperconnected world. It is the vehicle that captures and shares data to create business value. In this technology guide, insideBIGDATA Guide to Data Platforms for Artificial Intelligence and Deep Learning, we’ll see how current implementations for AI and DL applications can be deployed using new storage architectures and protocols specifically designed to deliver data with high throughput, low latency, and maximum concurrency.

Big Data Meets HPC – Exploiting HPC Technologies for Accelerating Big Data Processing

DK Panda from Ohio State University gave this talk at the Stanford HPC Conference. “This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for Hadoop (HDFS, MapReduce, RPC and HBase), Spark, Memcached, Swift, and Kafka using native RDMA support for InfiniBand and RoCE will be presented.”
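As a rough illustration of running Big Data frameworks directly on HPC storage (not of the OSU HiBD RDMA packages themselves, which ship as drop-in libraries), the sketch below assumes a Spark installation whose executors all see a hypothetical parallel file system mount at /mnt/lustre, and reads input from it with file:// URIs instead of hdfs://, avoiding a separate HDFS ingest step.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParallelFsSparkRead {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("parallel-fs-read-sketch")
                .getOrCreate();

        // Assumption: the dataset has been staged on a shared Lustre/OrangeFS
        // mount; every executor reads it through the POSIX interface, so the
        // data never needs to be copied into HDFS before analysis.
        Dataset<Row> lines = spark.read().text("file:///mnt/lustre/datasets/logs/*.txt");

        System.out.println("Line count: " + lines.count());
        spark.stop();
    }
}
```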

DDN

For more than 15 years, DDN has designed, developed, deployed, and optimized systems, software, and solutions that enable enterprises, service providers, universities, and government agencies to generate more value and accelerate time to insight from their data and information, on premises and in the cloud. Why do data-intensive environments prefer DDN? DDN’s sustained vision and […]

Enabling Value for Converged Commercial HPC and Big Data Infrastructures through Lustre*

A number of industries rely on high-performance computing (HPC) clusters to process massive amounts of data. As these same organizations explore Big Data analytics based on Hadoop, they are realizing the benefits of converging Hadoop and HPC onto the same cluster rather than scaling out an entirely new Hadoop infrastructure.