NetApp Open Solution for Hadoop


The NetApp Open Solution for Hadoop is built on the E2660 storage array, which provides the shared-nothing storage required by HDFS. HDFS is the primary storage system used by Hadoop applications: it creates multiple replicas of each data block and distributes them across compute nodes throughout a cluster, both to enable efficient completion of MapReduce jobs and to preserve data when disk drives fail. This solution leverages the enterprise RAID features of NetApp E-Series external direct-attached storage (DAS) arrays to provide highly available, efficient storage for Hadoop DataNode data. Because the arrays themselves protect against drive failure, the need for extensive data block replication is reduced, bringing better storage efficiency and performance to the Hadoop system.
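Because RAID in the E-Series array protects DataNode data against drive failure, deployments of this kind typically run HDFS with a replication factor lower than the default of 3. The property below (`dfs.replication` in `hdfs-site.xml`) is the standard HDFS setting; the value 2 is illustrative only, not a stated recommendation of this solution:

```xml
<!-- hdfs-site.xml (fragment) -->
<property>
  <name>dfs.replication</name>
  <!-- HDFS default is 3. With RAID-protected DAS a lower factor is
       common; the value 2 here is an illustrative assumption, so
       follow the solution's own sizing guidance. -->
  <value>2</value>
</property>
```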
NetApp also leverages a FAS2240HA unified storage system to provide robust NameNode metadata protection and to support network booting of the servers in the Hadoop cluster, which eliminates the need for any internal disks in the JobTracker node and DataNodes.
The NetApp Open Solution for Hadoop provides storage efficiency, manageability, reliability, scalability, and a feature-rich set of tools to meet the demands of rapidly developing big data technology.

The business problem addressed by big analytics, big bandwidth, and big content technologies is the collection of large amounts of raw data: point-of-sale records, credit card transactions, log files, security data, and more. This big data is potentially useful, but it might be too large for processing by humans or by traditional relational database (RDB) tools. Hadoop and the associated MapReduce technologies turn raw data into valuable information, and NetApp® big analytics, big bandwidth, and big content platforms offer the right storage platforms and solutions to ingest the data, analyze it, and then manage valuable datasets.
The NetApp Open Solution for Hadoop based on E-Series storage delivers big analytics in a fiscally responsible way:

  • It delivers preengineered, compatible, and supported solutions based on high-quality storage platforms.
  • It avoids the cost, schedule, and risk of do-it-yourself systems integration, relieving the skills gap.
  • It avoids substantial ongoing operational costs.

The Apache Hadoop project open-source software (Hadoop) addresses the problems associated with big data in two ways.
1. It provides the highly scalable Hadoop Distributed File System (HDFS) for storing, managing, and securing very large datasets.
2. It provides the Hadoop MapReduce framework, a powerful programming model that harnesses the computing power of multiple commodity servers into a single high-performance compute cluster. With MapReduce, large datasets can be analyzed in a small fraction of the time that would otherwise be required using the more traditional relational database management system (RDBMS) method.
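To make the MapReduce programming model concrete, the sketch below shows the classic word-count pattern: a mapper that emits (key, 1) pairs and a reducer that sums counts per key. Under Hadoop this pair would run as streaming tasks with the framework performing the shuffle/sort; here that step is simulated locally so the example is self-contained. The function names (`map_words`, `reduce_counts`) are illustrative and not part of any Hadoop API.

```python
#!/usr/bin/env python3
"""Minimal word count in the MapReduce style (illustrative sketch)."""
from itertools import groupby
from operator import itemgetter


def map_words(lines):
    # Map phase: emit a (word, 1) pair for every word in the input split.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def reduce_counts(pairs):
    # Reduce phase: after the (simulated) shuffle/sort groups pairs by
    # key, sum the counts for each distinct word.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


if __name__ == "__main__":
    sample = ["big data big analytics", "data nodes store data"]
    print(dict(reduce_counts(map_words(sample))))
```

In a real cluster the same mapper and reducer logic would run in parallel on many DataNodes, with each task reading a local block of the input file; the local simulation only replaces the distribution, not the programming model.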

