“The Hadoop MapReduce framework grew out of an effort to make it easy to express and parallelize simple computations that were routinely performed at Google. It wasn’t long before libraries, like Apache Mahout, were developed to enable matrix factorization, clustering, regression, and other more complex analyses on Hadoop. Now, many of these libraries and their workloads are migrating to Apache Spark because it supports a wider class of applications than MapReduce and is more appropriate for iterative algorithms, interactive processing, and streaming applications.”
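To see why iterative workloads favor Spark, consider a simple gradient-descent loop. The sketch below (in Scala, Spark’s native language) caches a toy dataset in memory and reuses it on every pass; an equivalent MapReduce job would re-read its input from disk on each iteration. The dataset, learning rate, and iteration count here are illustrative choices, not anything prescribed by the libraries mentioned above.

```scala
import org.apache.spark.sql.SparkSession

object IterativeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("IterativeSketch").getOrCreate()
    val sc = spark.sparkContext

    // Toy (x, y) observations; a real job would read these from shared storage.
    val points = sc.parallelize(Seq((1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)))
      .cache() // kept in memory and reused every pass -- the key win over MapReduce

    var w = 0.0 // weight for the model y = w * x
    for (_ <- 1 to 50) {
      // One gradient-descent step per pass over the cached dataset.
      val gradient = points.map { case (x, y) => 2.0 * (w * x - y) * x }.mean()
      w -= 0.05 * gradient
    }
    println(s"fitted slope: $w") // converges near 2.0 for this toy data
    spark.stop()
  }
}
```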
“When organizations operate both Lustre and Apache Hadoop within a shared HPC infrastructure, there is a compelling use case for using Lustre as the file system for Hadoop analytics, as well as HPC storage. Intel Enterprise Edition for Lustre includes an Intel-developed adapter which allows users to run MapReduce applications directly on Lustre. This optimizes the performance of MapReduce operations while delivering faster, more scalable, and easier-to-manage storage.”
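As a rough illustration of what “running MapReduce directly on Lustre” means for a job’s configuration: because Lustre presents a POSIX file system that every compute node mounts at the same path, a Hadoop client can be pointed at it through the standard FileSystem configuration. The sketch below uses Hadoop’s stock local-file-system scheme and an assumed mount point of /mnt/lustre; Intel’s adapter supplies its own optimized FileSystem implementation, which is wired in through this same configuration mechanism (its exact class name and URI scheme are not shown here).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LustreFsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Route file I/O at the POSIX namespace rather than HDFS; every node
    // sees the same files because the Lustre mount is shared cluster-wide.
    conf.set("fs.defaultFS", "file:///")

    val fs = FileSystem.get(conf)
    // /mnt/lustre is an assumed mount point for the shared Lustre file system.
    val input = new Path("/mnt/lustre/jobs/wordcount/input")
    fs.mkdirs(input)
    println(s"default FS: ${fs.getUri}, input dir: $input")
  }
}
```

With the default file system set this way, existing MapReduce jobs can read and write Lustre paths without an HDFS copy-in/copy-out step, which is the workflow simplification the adapter is built around.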
For a long time, the industry’s biggest technical challenge was squeezing as many compute cycles as possible out of silicon chips, so that the really important, and often gigantic, problems in science and engineering could be solved faster than was ever thought possible. Now, by clustering computers to work together on a problem, scientists are free to take on even larger and more complex real-world computations, and to analyze far larger volumes of data.
As compute speed advanced toward its theoretical maximum, the HPC community quickly discovered that the speed of storage devices, and of the underlying Network File System (NFS) developed decades ago, had not kept pace. As CPUs got faster, storage became the main bottleneck in high data-volume environments.
Intel’s White Paper, “Architecting a High-Performance Storage System,” walks through the step-by-step design of a Lustre file system. It is available for download from the insideBIGDATA White Paper Library. “Although a good system is well-balanced, designing it is not straightforward. Fitting components together and making the adjustments needed for peak performance is challenging. The process begins with a requirements analysis, followed by a design structure (a common structure was selected for the paper) and component choices.”
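To give a flavor of that requirements-analysis step, the back-of-the-envelope sketch below sizes a system from two requirements, aggregate bandwidth and usable capacity. All of the figures are invented for illustration and do not come from the white paper.

```scala
object LustreSizingSketch {
  def main(args: Array[String]): Unit = {
    // All figures below are illustrative assumptions, not numbers from the paper.
    val requiredGBperSec = 30.0   // target aggregate bandwidth (assumed)
    val perOssGBperSec   = 3.0    // deliverable bandwidth per object storage server (assumed)
    val requiredTB       = 2000.0 // target usable capacity (assumed)
    val perOstTB         = 40.0   // usable capacity per object storage target (assumed)

    // Size each component class from the requirement that dominates it,
    // then check that the two counts balance against each other.
    val ossForBandwidth = math.ceil(requiredGBperSec / perOssGBperSec).toInt
    val ostsForCapacity = math.ceil(requiredTB / perOstTB).toInt

    println(s"OSS nodes needed for bandwidth: $ossForBandwidth")                     // 10
    println(s"OSTs needed for capacity:       $ostsForCapacity")                     // 50
    println(s"OSTs per OSS if balanced:       ${ostsForCapacity / ossForBandwidth}") // 5
  }
}
```

This is the sense in which a good system is “well-balanced”: if either count far outstrips the other, the design is paying for bandwidth it cannot fill or capacity it cannot serve, and the component choices should be revisited.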