An Exciting Year for Spark

Apache Spark has had an amazing year, and the people behind the open source large-scale data processing engine have pulled together some data to show just how fast it has grown in the last 12 months. Databricks, the company that spun out of UC Berkeley's AMPLab after creating Spark, produced the infographic below, which highlights some of that growth as Spark catches fire across the industry.

For those new to Spark, it is an open-source cluster computing framework for data analytics. Spark fits into the Hadoop open-source ecosystem and builds on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and it promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster’s memory and query it repeatedly, making it well suited to machine learning algorithms, which typically make many passes over the same data.
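To make the "load once, query repeatedly" idea concrete, here is a minimal single-machine sketch in plain Python. It is not Spark code and does not use any Spark API: the `CachedDataset` class, the `demo` function, and the sample log lines are all hypothetical names and data invented for illustration. In Spark itself, the rough equivalent is marking a dataset for caching (for example with `cache()`) so that later queries read it from cluster memory instead of going back to storage.

```python
import os
import tempfile


class CachedDataset:
    """Toy stand-in for a dataset held in memory (hypothetical, not a Spark class)."""

    def __init__(self, path):
        self.path = path
        self._rows = None      # filled in on first access
        self.disk_reads = 0    # counts how often the file is actually read

    def rows(self):
        # Read from disk only the first time; later calls reuse the in-memory copy.
        if self._rows is None:
            with open(self.path) as f:
                self._rows = [line.rstrip("\n") for line in f]
            self.disk_reads += 1
        return self._rows

    def count_matching(self, predicate):
        # Each query is a full pass over the data, served from memory.
        return sum(1 for row in self.rows() if predicate(row))


def demo():
    # Write a small sample file so the sketch is self-contained (made-up data).
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as f:
        f.write("INFO start\nERROR disk full\nWARN slow\nERROR timeout\n")
        path = f.name
    try:
        data = CachedDataset(path)
        errors = data.count_matching(lambda r: r.startswith("ERROR"))
        warnings = data.count_matching(lambda r: r.startswith("WARN"))
        return errors, warnings, data.disk_reads
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Two queries run against the data, but the file is read from disk only once. An iterative machine learning algorithm follows the same pattern with many more passes, which is where keeping the data in memory pays off.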
