Google Cloud Dataflow Shows Competitive Advantage for Large-Scale Data Processing


Mammoth Data, a leader in Big Data consulting, today announced the findings of its comprehensive cloud solution benchmark study, which compares Google Cloud Dataflow and Apache Spark. The company, which specializes in Hadoop®, Apache Spark and other enterprise-ready architectural solutions for data-driven companies, saw a lack of understanding of current cloud technologies and found no available comparison of the performance and implementation characteristics of each offering in a common scenario. As a result, Mammoth Data worked with Google to compare Google Cloud Dataflow with well-known alternatives and provide easily digestible metrics.

Google Cloud Dataflow is a fully managed service for large-scale data processing, providing a unified model for batch and streaming analysis. Google Cloud Dataflow provides on-demand resource allocation, full life-cycle resource management and auto-scaling of resources.

“Google Cloud Platform data processing and analytics services are aimed at removing the implementation complexity and operational burden found in traditional big data technologies. Mammoth Data found that Cloud Dataflow outperformed Apache Spark, underscoring our commitment to balance performance, simplicity and scalability for our customers,” said Eric Schmidt, product manager for Google Cloud Dataflow.

In its benchmark, Mammoth Data identified five key advantages of using Google Cloud Dataflow:

  • Greater performance: Google Cloud Dataflow provides dynamic work rebalancing and intelligent auto-scaling, which enable increased performance with no added operational complexity.
  • Developer friendly: Google Cloud Dataflow features a developer-friendly API with a unified approach to batch and streaming analysis.
  • Operational simplicity: Google Cloud Dataflow holds distinct advantages with a job-centric and fully managed resource model.
  • Easy integration: Google Cloud Dataflow can easily be integrated with Google Cloud Platform and its different services.
  • Open-source: Google Cloud Dataflow’s API was recently promoted to an Apache Software Foundation incubation project called Apache Beam.

“When Google asked us to compare Dataflow to other Big Data offerings, we knew this would be an exciting project,” said Andrew C. Oliver, president and founder of Mammoth Data. “We were impressed by Dataflow’s performance, and think it is a great fit for large-scale ETL or data analysis workloads. With the Dataflow API now part of the Apache Software Foundation as Apache Beam, we expect the technology to become a key component of the Big Data ecosystem.”

