Splice Machine’s New OLAP Engine Adds Columnar Storage and In-Memory Caching to its Hybrid Relational Data Platform

Splice Machine, provider of the open-source SQL RDBMS powered by Apache Hadoop® and Apache Spark™, announced the release of version 2.5 of its industry-leading data platform for intelligent applications. The new version strengthens its ability to concurrently run enterprise-scale transactional and analytical workloads, frequently referred to as HTAP (Hybrid Transactional and Analytical Processing).

“The new capabilities further emphasize the benefits of Splice Machine’s hybrid architecture,” said Monte Zweben, co-founder and CEO of Splice Machine. “For modern applications that need to combine fast data ingestion, web-scale transactional and analytical workloads, and continuous machine learning, one storage model does not fit all. The Splice Machine SQL RDBMS tightly integrates multiple compute engines, with in-memory and persistent storage in both row-based and columnar formats. The cost-based optimizer uses new advanced statistics to find the optimal execution strategy across all these resources for OLTP and OLAP workloads.”

With Splice Machine’s hybrid architecture, companies can:

  • Simplify Operational Complexity – Users can avoid managing separate systems, tuning them individually for performance, and writing low-level code and batch programs to keep them in sync.
  • Eliminate Need for Special Coding Skills – Developers can use a single industry-standard SQL and JDBC/ODBC interface to work with the system.
  • Power Concurrent Applications – The ACID transaction implementation is designed for both analytical and operational workloads, so it supports high concurrency even with thousands of users or devices updating the system at the same time. Its multiversion concurrency control (MVCC), using snapshot isolation, can handle fine-grained updates without locking reads.
  • Support Machine Learning – Modern applications adapt over time by continuously transforming operational data into aggregated features that train statistical machine learning models and deploy those models in real-time decision systems. Splice Machine enables the feature engineering, model selection, and deployment process to take place on one platform without significant data movement.
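The snapshot-isolation behavior mentioned in the list above can be illustrated with a toy in-memory store. This is only a minimal sketch of the general MVCC idea, not Splice Machine's implementation: the class names (`MVCCStore`, `Transaction`, `WriteConflict`) and the first-committer-wins conflict rule are assumptions chosen for illustration.

```python
from itertools import count


class WriteConflict(Exception):
    """Raised when a key was committed by someone else after our snapshot."""


class MVCCStore:
    """Hypothetical multi-version key-value store with snapshot reads."""

    def __init__(self):
        self._versions = {}      # key -> list of (commit_ts, value), ascending
        self._clock = count(1)   # source of commit timestamps
        self._last_commit = 0    # timestamp of the newest committed write

    def begin(self):
        # A transaction sees only versions committed at or before its snapshot.
        return Transaction(self, self._last_commit)

    def read_at(self, key, snapshot_ts):
        # Walk versions newest-first and return the first one visible.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit(self, txn):
        # First-committer-wins: abort if a written key has a newer version.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.snapshot_ts:
                raise WriteConflict(key)
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit = commit_ts


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}         # buffered writes, invisible until commit

    def read(self, key):
        # Reads never take a lock: own writes first, then the snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.snapshot_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.commit(self)


store = MVCCStore()
setup = store.begin()
setup.write("stock", 10)
setup.commit()

reader = store.begin()           # snapshot taken before the update below
writer = store.begin()
writer.write("stock", 9)
writer.commit()                  # does not block or disturb the reader

old_view = reader.read("stock")          # still the snapshot value
new_view = store.begin().read("stock")   # a later transaction sees the update
```

The reader keeps a consistent view for its whole lifetime while the writer proceeds, which is the property that lets long analytical queries and short transactional updates share one system.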

Version 2.5 of Splice Machine introduces important new capabilities:

Columnar External Tables enable hybrid columnar and row-based querying. Columnar external tables can be created in Apache Parquet, Apache ORC, or text formats. Columnar storage speeds up large table scans, large joins, aggregations, and groupings, while the native row-based storage is used for write-optimized ingestion, single-record lookups and updates, and short scans.
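The trade-off between the two layouts can be sketched with plain Python data structures. This is an illustration of the general idea only; it does not reflect the Parquet or ORC file formats or Splice Machine's storage engine, and the table, column names, and helper functions are hypothetical.

```python
# Row layout: each record is stored together, which suits single-record access.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "ann", "amount": 7.5},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def total_amount(columns):
    """Aggregate by reading only the 'amount' column, not whole records."""
    return sum(columns["amount"])


def lookup_order(index, order_id):
    """Point lookup: one probe returns the complete record."""
    return index[order_id]


columns = to_columns(rows)
row_index = {record["order_id"]: record for record in rows}

grand_total = total_amount(columns)   # touches one column of three
order_two = lookup_order(row_index, 2)  # touches one row of three
```

An aggregation over the columnar form reads a single contiguous list, while the lookup on the row form returns every field of one record in a single step; a hybrid system chooses between the two per query.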

In-Memory Caching via Pinning moves tables and columnar data files into memory for fast data access. It avoids repeated table scans of, and writes to, high-latency file systems such as Amazon S3. Data can therefore be kept on very inexpensive storage and still be served from memory at high speed when an application requires it.

Statistics via Sketching helps solve the age-old problem that cost-based optimizers are only as good as their statistics, yet most statistics are poor because computing them is expensive. Splice Machine uses the sketching library created by Yahoo! to provide very fast approximate analysis of big data statistics with bounded errors. With sketches and histograms, the Splice Machine cost-based optimizer can choose indexes, join orders, and join algorithms much more accurately.
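To give a feel for how a sketch trades a small, bounded error for speed and memory, here is a toy K-minimum-values (KMV) estimator for the number of distinct values in a column. It is one classic sketching technique written from scratch for illustration; it is not the API of the Yahoo! library, and the class name `KMVSketch` and the parameter `k` are assumptions.

```python
import hashlib
import heapq


class KMVSketch:
    """Approximate distinct count that keeps only the k smallest hash values."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # negated hashes, so the largest kept hash is on top
        self._kept = set()     # hashes currently held, to ignore duplicates

    @staticmethod
    def _hash(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash(item)
        if h in self._kept:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._kept.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._kept.discard(evicted)
            self._kept.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))   # fewer than k distinct values: exact
        kth_smallest = -self._heap[0]
        # If n distinct hashes are spread uniformly over [0, 1), the k-th
        # smallest sits near k / n, which gives the estimate (k - 1) / value.
        return (self.k - 1) / kth_smallest


sketch = KMVSketch(k=1024)
for i in range(40000):
    sketch.add(i % 20000)      # 20,000 distinct values, each seen twice

approx_distinct = sketch.estimate()
```

The sketch holds at most `k` numbers regardless of table size, and its relative error shrinks roughly as one over the square root of `k`, which is the kind of bounded error an optimizer can work with when estimating cardinalities.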

Cost-Optimized Storage for AWS users. Data can be stored locally in ephemeral storage or on EBS, S3, or EFS. Depending on the workload and the longevity of the data, different data can be placed in different storage systems with different price/performance characteristics.

 
