Databricks Becomes the First Vendor to Provide Support for Apache® Spark™ 2.0 on Its Just-in-Time Data Platform

Databricks, the company founded by the team that created Apache® Spark™, today announced that Apache Spark 2.0 is generally available on its just-in-time data platform, making it the first vendor to offer Apache Spark 2.0 support. With major contributions from Databricks and the Spark community, this is the first major release of open source Spark since Spark 1.6 in 2015. Databricks customers can now immediately benefit from Spark 2.0’s three core attributes: easier, faster, and smarter.

“Since the release of Spark 1.0, we’ve spent countless hours listening to members of the Spark community and Databricks users to learn from a mix of praise and complaints. Spark 2.0 builds on what the community has learned, doubling down on what users love and improving on what users lament,” said Databricks’ Chief Architect and Cofounder, Reynold Xin.

Among other major improvements as outlined in the Databricks blog post, the most notable features of Apache Spark 2.0 are:

  • Speed: Running some Spark operators 5 to 10 times faster than in Spark 1.6, thanks to Tungsten’s Phase 2 whole-stage code generation and Catalyst’s code optimization;
  • Simplicity: Unifying developer APIs across Spark’s libraries such as DataFrames and Datasets;
  • Structured Streaming: Laying the foundation for continuous applications by providing high-level declarative streaming APIs based on DataFrames and Datasets, built atop the Spark SQL engine and working on real-time data;
  • Machine Learning Model Persistence: Saving and loading pipelines and models across all programming languages supported by Spark;
  • DataFrame-based Machine Learning APIs: Emerging as the primary MLlib package with its “pipeline” APIs and focusing future development on the DataFrame-based API;
  • Standard SQL Support: Expanding Spark’s SQL capabilities with SQL:2003 features, introducing a new ANSI SQL parser, and supporting scalar and predicate subqueries.

“One of the things that’s really exciting for me as a developer of Apache Spark is seeing how quickly users start to use new features and APIs we introduce, and in turn, offer almost instantaneous feedback, so that we can continue to improve them,” said Matei Zaharia, CTO and co-founder of Databricks and creator of Apache Spark.

For Databricks users, creating new clusters on Apache Spark 2.0 is as simple as selecting the release from the menu, all in a few clicks. Spark 2.0 is highly compatible with Spark 1.6, so migrating code should require minimal effort.

By making Spark 2.0 instantly accessible within a fully managed data platform, Databricks gives its users a full suite of tools to harness the advancements of the open source 2.0 release and ensure end-to-end security, giving data scientists and data engineers the easiest way to analyze data, perform advanced analytics, and deploy Spark applications.

“Spark is becoming a staple for enterprise big data strategies with its speed and simplicity. Deploying the Apache Spark 2.0 release through Databricks’ platform enables businesses to translate Spark’s innovations into a competitive edge faster, while getting support from the people who are core to the Apache project,” said Tony Baer, Principal Analyst at Ovum.

 
