
Real-Time Analytics from Your Data Lake: Teaching the Elephant to Dance

Sponsored Post

One of the biggest challenges with data lakes in general, and Hadoop in particular, is getting real-time analytics performance out of a technology that was designed to trade off performance for scalability. While technologies like Hive, Presto, Parquet, ORC and others have delivered improvements, none of them provides near real-time, sub-second performance at scale.

Technologies like Apache Druid are used today alongside Hadoop to deliver real-time queries on data from the data lake. Druid has also helped the companies that use it this way implement end-to-end real-time analytics using message buses like Kafka or Kinesis.

This whitepaper from Imply Data Inc. explains why delivering real-time analytics on a data lake is so hard, approaches companies have taken to accelerate their data lakes, and how they leveraged the same technology to create end-to-end real-time analytics architectures.

The 14-page whitepaper includes the following compelling topics:

  • Origins and Limitations of the Data Warehouse
  • Enter the Elephant – Hadoop and Big Data
  • Real-Time Analytics = Fast Ingestion + Fast Query
  • Hadoop, EDWs Are Not For Real-Time Analytics
  • How to Add Real-Time Analytics to Hadoop (2 use cases)
  • Keeping Historical and Real-Time Analytics in Sync
  • How Much Faster is Real-Time Analytics Software
  • How Companies Adopted Real-Time Analytics
  • The Elephant’s First Dance Steps

Apache Druid is an open source distributed data store. Druid’s core design combines ideas from data warehouses, time series databases, and search systems to create a unified system for real-time analytics for a broad range of use cases. Druid merges key characteristics of each of these three architectures into its ingestion, storage and querying layers.
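To make the combination of those three ideas more concrete, here is a minimal Python sketch. It is not Druid's implementation or API: the names `ToySegment`, `ToyStore`, and `sum_where` are hypothetical, invented for illustration only. It assumes rows are dictionaries with an integer `timestamp` in seconds and an identical set of keys, and it shows time-based partitioning (time series databases), column-by-column storage (data warehouses), and an inverted index on dimension values (search systems) working together in one query.

```python
from collections import defaultdict


class ToySegment:
    """One time chunk of rows (hypothetical, for illustration only)."""

    def __init__(self):
        self.columns = defaultdict(list)  # column name -> values (columnar layout)
        self.index = defaultdict(set)     # (dimension, value) -> row ids (inverted index)
        self.size = 0

    def add(self, row, dimensions):
        row_id = self.size
        for name, value in row.items():
            self.columns[name].append(value)
        for dim in dimensions:
            self.index[(dim, row[dim])].add(row_id)
        self.size += 1


class ToyStore:
    """Partitions rows into fixed-width time chunks, one ToySegment per chunk."""

    def __init__(self, dimensions, chunk_seconds=3600):
        self.dimensions = dimensions
        self.chunk_seconds = chunk_seconds
        self.segments = {}  # chunk start time -> ToySegment

    def ingest(self, row):
        chunk = row["timestamp"] // self.chunk_seconds * self.chunk_seconds
        self.segments.setdefault(chunk, ToySegment()).add(row, self.dimensions)

    def sum_where(self, metric, dim, value, start, end):
        """Sum `metric` over rows with dim == value and start <= timestamp < end."""
        total = 0
        for chunk, seg in self.segments.items():
            # Time partitioning: skip chunks that cannot overlap the interval.
            if chunk + self.chunk_seconds <= start or chunk >= end:
                continue
            timestamps = seg.columns["timestamp"]
            values = seg.columns[metric]  # only the needed columns are read
            # Inverted index: visit only the rows matching the filter.
            for row_id in seg.index.get((dim, value), ()):
                if start <= timestamps[row_id] < end:
                    total += values[row_id]
        return total


store = ToyStore(dimensions=["country"])
store.ingest({"timestamp": 10, "country": "US", "clicks": 3})
store.ingest({"timestamp": 20, "country": "DE", "clicks": 5})
store.ingest({"timestamp": 4000, "country": "US", "clicks": 7})
print(store.sum_where("clicks", "country", "US", 0, 3600))
print(store.sum_where("clicks", "country", "US", 0, 7200))
```

In this toy version, a filtered sum touches only the time chunks that overlap the query interval, only the columns it needs, and only the rows the index points to. A production system adds much more (compression, distribution, streaming ingestion), none of which is modeled here.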

Download the new whitepaper, courtesy of Imply Data, Inc., to learn more about Apache Druid, the open source distributed data store, and how it can solve many of your critical real-time analytics performance needs.
