Databricks Launches Delta To Combine the Best of Data Lakes, Data Warehouses and Streaming Systems

Databricks, provider of the leading Unified Analytics Platform and founded by the team who created Apache Spark™, announced Databricks Delta, the first unified data management system that provides the scale and cost-efficiency of a data lake, the query performance of a data warehouse, and the low latency of a streaming ingest system. Databricks Delta, a key component of the Databricks Unified Analytics Platform that runs in the cloud, eliminates the architectural complexity and operational overhead of maintaining three disparate systems: data lakes, data warehouses and streaming systems. With Delta, enterprise organizations no longer need complex, brittle ETL processes that run across a variety of systems and create high latency just to get data into a rapidly queryable form.

“Many enterprise organizations are struggling with the limitations of data lakes and data warehouses as well as the complexity of managing both and moving data between them,” said Ali Ghodsi, Cofounder and CEO at Databricks. “Delta combines the transactional, fast querying of data warehouses with the scale of data lakes and low-latency streaming systems. Because Delta is a unified data management system that also handles both low-latency streaming data and batch processes, it allows organizations to dramatically simplify their data architectures.”

Databricks Delta delivers the following capabilities to simplify enterprise data management:

  • Manage Continuously Changing Data Reliably: Industry’s first unified data management system simplifies pipelines by allowing Delta tables to be used as a data source and sink. Delta tables provide transactional guarantees for multiple concurrent writers – batch and streaming jobs. Delta natively supports the real-time needs of the business by enabling a streaming data warehouse to return the most recent, consistent view of the writes. Upserts in Delta provide a clean way to change data after it has been written, instead of running the entire job again.
  • Perform Fast Queries Without Manual Tuning: Delta automates performance management, removing the need for tedious performance tuning approaches. Self-optimizing data layout ensures that data queried together is stored together. Delta automates compaction of small files for efficient reads. Intelligent data skipping and indexing lead to massive speedups by not reading unneeded data. Automated caching makes subsequent reads an order of magnitude faster.
  • Provide Cost Efficiency and Scale of Data Lakes: Delta stores all its data in Amazon S3 for cost-efficiency and massive scale. The data in Delta is stored in a non-proprietary and open file format to ensure data portability and prevent vendor lock-in.
  • Integrate with Unified Analytics Platform: Databricks Delta data can be accessed from any Spark application running on the Databricks platform through the standard Spark APIs. Delta also integrates into the Databricks Enterprise Security model, including cell-level access control, auditing, and HIPAA-compliant processing. Data is stored inside each customer’s own cloud storage account for maximum control.
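
To make the idea of data skipping mentioned above more concrete, here is a minimal Python sketch of the general technique: keep minimum and maximum values per file, and skip any file whose range cannot contain matching rows. The announcement does not describe how Delta implements this, so the names below (`DataFile`, `write_file`, `scan_with_skipping`) and the single integer column are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataFile:
    """Hypothetical data file holding one integer column plus min/max stats."""
    name: str
    min_value: int
    max_value: int
    rows: List[int]


def write_file(name: str, rows: List[int]) -> DataFile:
    """Record min/max statistics once, at write time (illustrative assumption)."""
    return DataFile(name, min(rows), max(rows), rows)


def scan_with_skipping(files: List[DataFile], lo: int, hi: int) -> Tuple[List[int], List[str]]:
    """Return values v with lo <= v <= hi, plus the names of files actually read."""
    matches: List[int] = []
    files_read: List[str] = []
    for f in files:
        # Skip the file when its [min, max] range cannot overlap [lo, hi].
        if f.max_value < lo or f.min_value > hi:
            continue
        files_read.append(f.name)
        matches.extend(v for v in f.rows if lo <= v <= hi)
    return matches, files_read


files = [
    write_file("part-0", [1, 5, 9]),
    write_file("part-1", [10, 14, 19]),
    write_file("part-2", [20, 25, 29]),
]
matches, files_read = scan_with_skipping(files, 12, 21)
print(matches)
print(files_read)
```

In this toy example, a query for values between 12 and 21 never opens `part-0`, because its recorded maximum is below the query range. The benefit grows when related values are stored together, which is the stated goal of the self-optimizing data layout.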

 
