Deploying a Big Data Solution Using IBM Spectrum Scale

Every day, the world creates 2.5 quintillion bytes of data. In fact, 90 percent of the data in the world today has been created in the last two years alone. While there is much talk about big data, it is not mere hype. Businesses are realizing tangible results from investments in big data analytics, and IBM’s big data platform is helping enterprises across all industries. IBM is unique in having developed an enterprise-class big data platform that allows you to address the full spectrum of big data business challenges.

IBM® Spectrum Scale™, formerly IBM General Parallel File System (IBM GPFS™), offers an enterprise-class alternative to Hadoop Distributed File System (HDFS) for building big data platforms. Part of the IBM Spectrum Storage™ family, Spectrum Scale is a POSIX-compliant, high-performing and proven technology that is found in thousands of mission-critical commercial installations worldwide. Spectrum Scale provides a range of enterprise-class data management features.

IBM Spectrum Scale can be deployed independently or with IBM’s big data platform, consisting of IBM BigInsights™ for Apache Hadoop and IBM Platform™ Symphony. This document describes best practices for deploying Spectrum Scale in such environments to help ensure optimal performance and reliability.

IBM Spectrum Scale is a proven, enterprise-class file system for your big data applications. Its advantages include important enterprise-class functionality: access control security; proven scalability and performance; built-in file system monitoring; pre-integrated backup and recovery support; pre-integrated information lifecycle management; file system quotas to restrict abuse; and immutability and AppendOnly features to protect the team from accidentally destroying critical data. Through these features, Spectrum Scale helps improve the time-to-value of the big data investment, allowing the enterprise to focus on resolving business problems. A new research paper, “Deploying a Big Data Solution Using IBM Spectrum Scale,” is now available that details the use of IBM Spectrum Scale.

Download this white paper today to learn how IBM Spectrum Scale can play an important role in your big data environment. The paper covers the following topics:

  • Introduction and Assumptions
  • Understanding big data environments
  • FPO key concepts and terminology
  • IBM Spectrum Scale enterprise functions overview
  • Understanding data flow and workload
  • Setting up an FPO-enabled IBM Spectrum Scale cluster
  • Preparing the IBM Spectrum Scale file system layout
  • Modifying the Hadoop configuration to use IBM Spectrum Scale
  • Ingesting data into IBM Spectrum Scale clusters
  • Exporting data out of IBM Spectrum Scale clusters
  • Monitoring and administering IBM Spectrum Scale clusters
  • FPO restrictions

The best practices presented in this paper show how to deploy a Spectrum Scale cluster using FPO as a file system platform for big data analytics. The paper covers a variety of Hadoop deployment architectures that work with Spectrum Scale, including IBM BigInsights, Platform Symphony, open source Hadoop used directly (also known as Do-It-Yourself), and Hadoop distributions from other vendors. Architecture-specific notes are included where applicable.

Download this white paper from the insideBIGDATA White Paper Library.

