BlueData Announces Bare-Metal Performance for Hadoop on Docker Containers

BlueData®, provider of the leading Big-Data-as-a-Service (BDaaS) software platform, announced breakthrough performance results. The results from a new Intel® benchmarking study show comparable performance for Hadoop when running in a bare-metal environment or in a containerized environment using the BlueData EPIC™ software platform. This study proves that it is possible to deliver the benefits of containerization for Big Data workloads without paying a penalty in performance. This ground-breaking benchmarking milestone was the result of ongoing collaboration between the Intel and BlueData software engineering teams.

Intel Xeon® architecture provides a high-performance, secure, and robust foundation for Big Data analytics. Leveraging the power of Docker containers, the BlueData EPIC software platform makes it easier, faster, and more cost-effective to deploy Big Data infrastructure and applications (including Hadoop, Spark, Kafka, Cassandra, and more), whether on-premises or in the public cloud.

Working closely with BlueData, Intel ran systematic performance comparisons using the TPC Express BigBench (TPCx-BB) benchmark for two identical on-premises test environments on Intel Xeon architecture, one on bare-metal and one using BlueData EPIC with Docker containers:

  • Apples-to-apples comparison: The Intel team evaluated and benchmarked identical configurations for a bare-metal environment versus a containerized environment using BlueData EPIC. Both test environments used the same hardware and were configured using the same Hadoop software – benchmarked at 10, 20, and 50 Hadoop compute nodes with 10 terabytes of data in HDFS. The Big Data workloads in both test environments were deployed on the Intel Xeon processor E5-2699 v3 product family, which helped reduce network latency, improve infrastructure security, and minimize power inefficiencies. Both test environments also used Intel Solid-State Drives to optimize the execution environment at the system level.
  • Ground-breaking performance results: The results from this in-depth benchmarking study show performance for containerized Hadoop on BlueData EPIC to be comparable to the performance for bare-metal Hadoop. In fact, in some cases, the containerized environment achieved superior performance to the bare-metal environment. For example, the BlueData EPIC platform demonstrated an average 2.33% performance gain versus bare-metal across three test runs for 50 Hadoop compute nodes and 10 terabytes of data in HDFS. This performance boost is due to BlueData EPIC’s proprietary IOBoost™ technology, which enhances input/output (I/O) performance using asynchronous storage I/O and data caching.
  • Industry-standard benchmark with real-world use cases: TPCx-BB is an industry-standard Express Benchmark to measure the performance of Big Data analytics frameworks in the Hadoop ecosystem, including MapReduce, Hive, and Spark MLlib. This benchmark provides a realistic measurement and comparison of performance by implementing 30 queries that simulate Big Data processing, analytics, and reporting in real-world use cases. The TPCx-BB data model includes structured data, semi-structured data, and unstructured data; it covers a range of essential functional and business aspects for Big Data use cases.
  • No modifications to the Hadoop software: BlueData EPIC allows enterprises to deploy Big Data frameworks and distributions unmodified, running in Docker containers. The use of Docker is completely transparent, but BlueData customers benefit from the agility, flexibility, and efficiency advantages of containers. For this benchmarking study, both the bare-metal and containerized test environments used Cloudera CDH as the Hadoop distribution. However, because BlueData runs Hadoop distributions and other Big Data frameworks completely unmodified, these performance results also apply to other Hadoop distributions — such as Hortonworks HDP and MapR CDP — as well as other Big Data frameworks such as Spark standalone.
  • Collaboration with Intel: In August 2015, Intel and BlueData embarked on a strategic technology and business collaboration agreement. One of the goals was to ensure optimized performance for BlueData EPIC running on Intel Xeon processor technology. The outstanding results for BlueData EPIC in this benchmarking study were due in part to the ongoing engineering collaboration between Intel and BlueData to investigate, benchmark, test, and continuously improve the software platform. Working together, BlueData and Intel have shown that there is no performance loss when running Hadoop in a containerized environment (using the BlueData EPIC software platform) as opposed to the identical set-up on bare-metal.
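
To make the "average gain across three test runs" figure concrete, here is a minimal sketch of how such a number could be computed. It assumes a higher-is-better benchmark score and that the average is the mean of the per-run percentage gains; the announcement does not state the exact method. The function names and the scores below are hypothetical placeholders, not values from the Intel study.

```python
from statistics import mean


def percent_gain(baseline: float, candidate: float) -> float:
    """Relative gain of candidate over baseline, in percent.

    Assumes a higher-is-better score (an assumption, not stated in the source).
    """
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (candidate - baseline) / baseline * 100.0


def average_gain(baseline_runs, candidate_runs) -> float:
    """Mean of the per-run percentage gains across paired test runs."""
    if len(baseline_runs) != len(candidate_runs):
        raise ValueError("both environments need the same number of runs")
    return mean(percent_gain(b, c) for b, c in zip(baseline_runs, candidate_runs))


if __name__ == "__main__":
    # Hypothetical placeholder scores for three runs; NOT data from the study.
    bare_metal_runs = [100.0, 100.0, 100.0]
    containerized_runs = [102.0, 103.0, 101.0]
    gain = average_gain(bare_metal_runs, containerized_runs)
    print(f"Average gain over bare-metal: {gain:.2f}%")
```

Averaging per-run gains is only one reasonable choice; comparing the mean scores of the two environments would give a slightly different number when the baseline varies between runs.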

“BlueData delivers greater simplicity, agility, and cost-efficiency for Big Data deployments,” said Michael Greene, vice president of the Software and Services Group and general manager of System Technologies and Optimization at Intel Corporation. “Now, working together, we’ve demonstrated that you can achieve these benefits while also ensuring performance that’s comparable to bare-metal Big Data implementations. It’s a game-changer.”

Intel’s collaboration, performance testing, and feedback helped BlueData make ongoing software enhancements to ensure high-performance Big Data deployments. The collaboration resulted in an unprecedented performance milestone for Big Data workloads running in Docker containers. With this breakthrough, BlueData and Intel can enable enterprises to take advantage of containerization to simplify and accelerate their on-premises Big Data implementations – while ensuring the best possible performance. And with BlueData, these customers can run Big Data analytics using the same Docker-based application images for both on-premises and public cloud deployments – leveraging the inherent infrastructure portability of containers.

“Intel has been great in helping us to optimize and enhance the BlueData EPIC software platform, putting it through its paces to get the best possible performance,” said Kumar Sreekanti, co-founder and CEO at BlueData. “Together, we’ve shown that you can achieve the same performance – or even better – for Big Data workloads running on our container-based platform. The results are a testament to the collaboration between our teams.”

 
