Deploying a Big Data Solution Using IBM GPFS-FPO


The world of big data is expanding rapidly. Every day, the world creates 2.5 quintillion bytes of data. In fact, 90 percent of the data in the world today has been created in the last two years alone. While there is much talk about big data, it is not mere hype. Businesses are realizing tangible results from investments in big data analytics, and IBM’s big data platform is helping enterprises across all industries. IBM is unique in having developed an enterprise-class big data platform that allows you to address the full spectrum of big data business challenges.

IBM® General Parallel File System (GPFS™) offers an enterprise-class alternative to Hadoop Distributed File System (HDFS) for building big data platforms. GPFS is a POSIX-compliant, high-performing and proven technology that is found in thousands of mission-critical commercial installations worldwide. GPFS provides a range of enterprise-class data management features.

GPFS can be deployed independently or with IBM’s big data platform, consisting of IBM InfoSphere® BigInsights™ and IBM Platform™ Symphony. IBM has written a technical white paper for deploying GPFS in such environments to help ensure optimal performance and reliability – “Deploying a Big Data Solution Using IBM GPFS-FPO.”

Download this white paper today to learn best practices for deploying GPFS-FPO as a file system platform for big data analytics. The paper guides the administrator through the decision points that determine an optimal configuration for the Hadoop application components being deployed. It covers the following topics:

  • Understanding big data environments
  • GPFS-FPO key concepts and terminology
  • GPFS enterprise functions overview
  • Understanding data flow and workload
  • Setting up a GPFS-FPO cluster
  • Preparing the GPFS file system layout
  • Modifying the Hadoop configuration to use GPFS
  • Ingesting data into GPFS clusters
  • Exporting data out of GPFS clusters
  • Monitoring and administering GPFS clusters
  • GPFS-FPO restrictions

IBM GPFS is a proven, enterprise-class file system for your big data applications. Its advantages include enterprise-class functionality such as:

  • Access control security
  • Proven scalability and performance
  • Built-in file system monitoring
  • Pre-integrated backup and recovery support
  • Pre-integrated information lifecycle management
  • File system quotas to restrict abuse, as well as immutability and AppendOnly features to protect the team from accidentally destroying critical data

Through these features, GPFS helps improve the time-to-value of the big data investment, allowing the enterprise to focus on resolving business problems.

Download this white paper from the insideBIGDATA White Paper Library.

 
