
Enabling Value for Converged Commercial HPC and Big Data Infrastructures through Lustre*

Sponsored Post

Parallel File Systems for Today’s Enterprise

A number of industries rely on high-performance computing (HPC) clusters to process massive amounts of data. As these same organizations explore the value of Big Data analytics based on Hadoop, they are realizing the value of converging Hadoop and HPC onto the same cluster rather than scaling out an entirely new Hadoop infrastructure.

Very few storage architectures can keep up with the I/O demands of these large clusters, whether for HPC or Hadoop jobs. Traditionally, Hadoop uses local storage, but for converged HPC and Big Data infrastructures, a parallel file system (PFS) is the only practical way to reach the aggregate throughput HPC requires today, which ranges from hundreds of gigabytes per second to several terabytes per second.
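To make the scale of those throughput figures concrete, here is a rough sizing sketch. It assumes that a parallel file system's aggregate bandwidth grows roughly linearly with the number of storage servers, which is an idealization; the per-server figure of 5 GB/s and the function name are hypothetical and chosen only for illustration, not taken from any Lustre deployment.

```python
import math


def servers_needed(target_gb_per_s: float, per_server_gb_per_s: float) -> int:
    """Smallest server count whose combined bandwidth meets the target.

    Assumes ideal linear scaling across storage servers (an idealization).
    """
    if target_gb_per_s <= 0 or per_server_gb_per_s <= 0:
        raise ValueError("bandwidths must be positive")
    return math.ceil(target_gb_per_s / per_server_gb_per_s)


if __name__ == "__main__":
    # Hypothetical per-server throughput, for illustration only.
    per_server = 5.0
    for target in (100, 500, 1000):
        count = servers_needed(target, per_server)
        print(f"{target} GB/s target -> {count} servers at {per_server} GB/s each")
```

Real deployments rarely scale perfectly, so a figure like this is best read as a lower bound on the hardware involved.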

Leading PFSs Today

The open source Lustre* software and IBM’s Spectrum Scale* (formerly called General Parallel File System*, or GPFS*) stand out as the two leading PFS solutions. However, they present completely different purchasing models. Because Lustre is community-developed and open source licensed, it can create significant value for high-performance data solutions.

Spectrum Scale (or GPFS) is a closed-source, proprietary file system developed within IBM and distributed under a fee-based license. Lustre is an open source solution distributed freely under the GNU General Public License (GPL). OpenSFS* manages Lustre releases, and Lustre has a large community of developers responding to user requests and contributing to the code. Intel Corporation submits the majority of code enhancements to Lustre, while also adding value to the open release with Intel-branded Lustre versions that include support and maintenance releases. Intel’s versions make Lustre more desirable for enterprise and cloud deployments. Two key enhancements from Intel, Hadoop Adaptor for Lustre and HPC Adaptor for MapReduce, allow Lustre to integrate easily into a converged HPC/Hadoop infrastructure.

Partnerships Deliver Success

High-performance storage solutions that are compatible with a converged infrastructure and capable of the performance demanded by HPC and Big Data analytics are large, complex systems. To meet their needs, organizations usually seek out a technical partner with expertise in the software, hardware, and overall requirements for scalability, optimization, configuration, networking, backup, and disaster recovery. One of the first critical decisions companies face in choosing a converged infrastructure data solution is whether to use an enterprise-driven, closed-source, fee-based offering or take advantage of the open source model.

While many organizations trust large enterprises and their proprietary software, in today’s marketplace budgets are tight and competition is fierce. As a result, smaller, competent technology integrators have emerged with expertise in Lustre and high-performance data solutions. These integrators also deliver a level of personal attention and responsiveness to customers that rivals that of their much larger competitors. Most of them are part of Intel Corporation’s Lustre reseller program, which numbers more than 170 companies worldwide.

Open Source Creates Value

Businesses large and small have been cutting costs by deploying open source-based solutions for decades. Linux is the most prominent example of the open source model. Red Hat is well known for building a business around it by providing enterprise-grade services along with its own enhanced offerings, while contributing to the development tree.

As an open source product, Lustre also eliminates vendor lock-in. Purchasers can select from a wide variety of manufacturers and technologies, giving them greater hardware choice and flexibility. If companies choose to use a technology partner for their high-performance data solution, they can issue a request for proposal (RFP) to many vendors to obtain the optimum hardware at the best value, and use Lustre as the software.

“Intel is the clear leader in the Lustre Open Source project,” comments Brent Gorda, General Manager of Intel’s High Performance Data Division (HPDD). “We are the main contributors to the Lustre code and have the largest pool of Lustre experts in the community. Building on the open source version, we offer enterprise-class and cloud versions of Lustre with Intel software enhancements and enterprise-grade services to customers. In this way we are adding value to open source Lustre and giving companies the ability to take advantage of the fastest parallel file system on the planet.”

Intel’s contributions are forward-looking when it comes to Big Data analytics. Intel integrated two adaptors into Lustre that simplify converging Hadoop onto HPC systems and increase performance. The Hadoop Adaptor for Lustre (HAL) connects Hadoop to Lustre instead of Hadoop’s own HDFS file system, and the HPC Adaptor for MapReduce (HAM) plugs MapReduce into an HPC scheduler, so that Big Data jobs can run easily on HPC clusters with Lustre.

“These additions are significant,” adds Gorda. “By plugging Hadoop into Lustre, we completely eliminate one of Hadoop’s costly data copying tasks, the Hadoop shuffle, shortening time to solution.”

Game Changing Offerings

As the core of today’s open source, high-performance, scalable data storage solutions, Lustre offerings (with Intel enhancements, Hadoop and HPC adaptors, and support) present a valuable alternative to proprietary software for converged HPC and Big Data clusters.

Intel’s Lustre offerings include Intel® Enterprise Edition for Lustre software, Intel® Cloud Edition for Lustre software, and Intel® Foundation Edition for Lustre software. Each is designed to meet specific needs in the marketplace. The Enterprise Edition makes Lustre easy to deploy and manage, significantly expands the number of metadata servers, increases reliability, includes hierarchical storage management, and allows Lustre to interface with Hadoop workloads. Support is available directly and through Intel resellers. The Cloud Edition, available through the Amazon Web Services Marketplace, allows customers to stand up a scalable parallel file system in minutes for any number of applications they want to run on Amazon’s Elastic Compute Cloud (EC2). This open source approach, from the Lustre community and from companies that are building operations around the software, adds considerable value for customers, making Lustre a game changer for high-performance storage solutions.

Ken Strandberg is a professional technology writer based in Portland, Oregon.

 
