MemSQL Enhances Real-Time Data Pipelines for Spark and Python in New Release


MemSQL, a leader in real-time databases for transactions and analytics, today announced significant advances for creating real-time data pipelines for Apache Spark, as well as support for the Python language and Non-Uniform Memory Access (NUMA) architectures in the latest version of MemSQL Ops. MemSQL can now run Spark SQL queries inside the MemSQL database, provide in-browser Python programming, and automatically optimize NUMA deployments. These features drive rapid results and faster analytics for data scientists.

“The newest release of MemSQL Ops reinforces our commitment to the Spark community to deliver even faster access to real-time data and analytics,” said Eric Frenkiel, co-founder and CEO, MemSQL. “Our mission is to deliver technology that integrates advances across the open source ecosystem and that appeals to the programming community at large.”

As a transient processing framework, Spark is well suited for data analysis and model development, but it is not purpose-built for high-performance SQL. To that end, MemSQL now allows Spark SQL queries to run inside the MemSQL database, which can improve performance by up to 50x on many workloads. By combining MemSQL with Spark, data scientists can tap a permanent, transactional datastore to feed the latest business data into their models for real-time analytics.


Moreover, the combination of Spark and MemSQL further unifies in-memory processing with in-memory storage for lightning-fast results. Users have access to a familiar SQL interface, which provides the performance and persistence to run real-time data pipelines successfully. Spark's data transformation capabilities can be used more fully when paired with a distributed, in-memory store like MemSQL than with a traditional disk-based store like HDFS.

The latest release of MemSQL Ops also features in-browser Python programming, which opens up Python’s vast library of analysis packages such as NumPy, SciPy and pandas to users running MemSQL. These libraries, as well as the prototyping speed of Python, have made Python incredibly popular among data scientists, application developers and database administrators alike.

For users running MemSQL in a NUMA environment, MemSQL Ops now offers point-and-click installation. MemSQL Ops can intelligently map MemSQL instances to CPUs that share local memory. The increased efficiency on large server deployments can accelerate queries by up to 40%. From ultra-fast query execution to efficient storage of business data, MemSQL enables users to operate with maximum efficiency in fast-paced production environments.
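
To illustrate the general idea of mapping instances to CPUs that share local memory, here is a minimal Python sketch. It is not MemSQL Ops code and does not reflect how MemSQL Ops makes its decisions: the function name `map_instances_to_numa_nodes`, the instance names, and the two-node topology are all hypothetical, and the round-robin placement is simply one plausible strategy chosen for illustration.

```python
from typing import Dict, List


def map_instances_to_numa_nodes(
    topology: Dict[int, List[int]], instance_names: List[str]
) -> Dict[str, List[int]]:
    """Assign each instance the CPU list of a single NUMA node, round-robin.

    `topology` maps a NUMA node id to the CPUs local to that node.
    This is an illustrative placement policy, not MemSQL's algorithm.
    """
    if not topology:
        raise ValueError("topology must contain at least one NUMA node")
    nodes = sorted(topology)
    return {
        name: list(topology[nodes[i % len(nodes)]])
        for i, name in enumerate(instance_names)
    }


# Hypothetical two-socket server: node 0 owns CPUs 0-3, node 1 owns CPUs 4-7.
topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
placement = map_instances_to_numa_nodes(topology, ["leaf-1", "leaf-2"])
for name, cpus in placement.items():
    print(name, "->", cpus)
```

Keeping each instance on the CPUs of one node means its memory accesses stay local to that node, which is the kind of efficiency gain the release describes for large servers.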

