New Benchmark Sets Standards for Determining How Enterprises can Best Tackle BI Workloads

AtScale, the company providing business users with speed, security and simplicity for BI on Hadoop, released the results of a comprehensive Business Intelligence benchmark for SQL-on-Hadoop engines. The benchmark tested the industry’s top SQL-on-Hadoop engines over key Business Intelligence (BI) use case queries. It rates the strengths and weaknesses of each engine and shows which ones are best suited to various scenarios.

AtScale’s experience with large enterprise customers helped guide the framework and methodology used for the industry’s first comprehensive BI-on-Hadoop Benchmark. “We used real-world enterprise experience to produce a document that every technical evaluator can use as part of their evaluation process,” says Josh Klahr, VP of Product Management at AtScale.

Some surprising findings that surfaced include:

  • While Hive is generally a default for SQL on Hadoop, in ALL scenarios it does not provide the fastest performance on its own.
  • While Cloudera Impala is known as a strong player when it comes to SQL-on-Hadoop, the benchmark study found “winners” varied depending on the type of query, size of data and other factors. Each engine has its own “sweet spot” and the study reveals which engine is best for different scenarios.
  • The upgrades to Spark announced recently made a big difference in performance on smaller data sets. We were surprised to find significant performance improvements between Spark 1.5 and 1.6.

“This benchmark will provide a useful data point for those assessing business intelligence workloads on Hadoop,” said Tom Pringle, Head of Applications Research at Ovum. “We’ve seen an increase in adoption of Hadoop, and most often the focus has been on storage and scale-out capabilities of the new platform. As more organizations consider analytical workloads on Hadoop, it will be important that they assess the capabilities of SQL-on-Hadoop solutions.”

BI on Hadoop: a key workload

As indicated in the latest Hadoop Maturity Survey, Business Intelligence is now a top workload for Hadoop, ahead of Data Science and ETL.  The maturation of a number of technologies has enabled Business Intelligence to be deployed broadly, creating a unique opportunity for business users in the enterprise to finally be able to adopt Hadoop.

Until now, the industry has provided little guidance on the performance of Business Intelligence workloads on Hadoop.  This has left technology evaluators with a void in measuring each engine against their own needs and workloads.  The AtScale Benchmark Study is aimed at helping evaluators understand the differences across the leading SQL-on-Hadoop engines.

Key Findings:

  • Hadoop is prime for Business Intelligence (BI): All tested engines have passed our tests and are stable enough to support Business Intelligence workloads.
  • One engine does not fit all: Depending on their needs (for example, small vs. large data sets, or few vs. many concurrent users), enterprises will find that one engine does not accomplish everything. Each engine has its own ‘sweet spot’ and enterprises will find that a blended usage of all engines might fit their company’s goals best.
  • Small vs. Big Data: Engines like Spark SQL or Impala perform best on smaller data sets – i.e. tables with thousands or several million rows of data.
  • Few vs. Many Users: Impala showed the best concurrency test results, ahead of Hive and Spark SQL. Companies that anticipate connecting large numbers of business users to Hadoop should look into Impala.
  • Constant Innovation: Open source development, as seen in Spark SQL’s improvements, provides constant innovation. We expect the industry to continue innovating in this space: Cloudera, which has been working on Impala for the last 5 years, proposed to donate the project to the Apache Software Foundation this past November. There is no doubt more innovation will come out of this new development.

 
