Converging High Performance Computing (HPC) and Lustre* parallel file systems with Hadoop’s MapReduce for Big Data analytics can eliminate the need for Hadoop’s infrastructure and speeding up the entire analysis. Convergence is a solution of interest for companies with HPC already in their infrastructure, such as the financial services Industry and other industries adopting high performance data analytics.
Today SGI announced that global deployments of the SGI UV 300H single-node system provide in total over 200 Terabytes of in-memory computing capacity to organizations running the SAP HANA platform. Introduced just one year ago, more than 50 SGI UV 300H systems have been installed in organizations to run a variety of applications on SAP HANA, including the SAP ERP, SAP Supply Chain Management (SCM), SAP Bank Analyzer, and SAP Business Warehouse applications, as well as advanced analytics.
A number of industries rely on high-performance computing (HPC) clusters to process massive amounts of data. As these same organizations explore the value of Big Data analytics based on Hadoop, they are realizing the value of converging Hadoop and HPC onto the same cluster rather than scaling out an entirely new Hadoop infrastructure.
Jim McHugh from Cisco describes how the new Intel Xeon processor E7 v3 processor family will bring to Cisco UCS systems in the big data and analytics arena. He emphasizes how new insights driven by big-data can help businesses become intelligence-driven to create a perpetual and renewable competitive edge within their field.
“The Hadoop MapReduce framework grew out of an effort to make it easy to express and parallelize simple computations that were routinely performed at Google. It wasn’t long before libraries, like Apache Mahout, were developed to enable matrix factorization, clustering, regression, and other more complex analyses on Hadoop. Now, many of these libraries and their workloads are migrating to Apache Spark because it supports a wider class of applications than MapReduce and is more appropriate for iterative algorithms, interactive processing, and streaming applications.”
“When organizations operate both Lustre and Apache Hadoop within a shared HPC infrastructure, there is a compelling use case for using Lustre as the file system for Hadoop analytics, as well as HPC storage. Intel Enterprise Edition for Lustre includes an Intel-developed adapter which allows users to run MapReduce applications directly on Lustre. This optimizes the performance of MapReduce operations while delivering faster, more scalable, and easier to manage storage.”
For a long time, the industry’s biggest technical challenge was squeezing as many compute cycles as possible out of silicon chips so they could get on with solving the really important, and often gigantic problems in science and engineering faster than was ever thought possible. Now, by clustering computers to work together on problems, scientists are free to consider even larger and more complex real-world problems to compute, and data to analyze.
As compute speed advanced towards its theoretical maximum, the HPC community quickly discovered that the speed of storage devices and the underlying the Network File System (NFS) developed decades ago had not kept pace. As CPUs got faster, storage became the main bottleneck in high data-volume environments.