Databricks Announces General Availability of Its Cloud Platform

SPARK SUMMIT 2015 NEWS

Databricks, the company behind Apache Spark, today announced the general availability of its cloud-hosted data platform (formerly known as Databricks Cloud). The Databricks platform makes it easy to turn data into value, from ingest to production, without the hassle of managing complex infrastructure, systems and tools.

Today’s data scientists, data engineers and developers need to cobble together various complex infrastructure, tools and systems to meet their day-to-day data needs, which severely inhibits their ability to generate business value quickly. By combining the power of Spark with a zero-management hosted platform, Databricks removes this critical bottleneck, enabling these data professionals to focus on finding answers from their data instantly and to build value-creating data products.

“We’re beyond thrilled to bring Databricks to the masses. Apache Spark has come a long way since its inception at UC Berkeley and we’re enthused to make it available to thousands of organizations. We’ve been working closely with our customers to help them get the most out of their deployments and are eager to bring Spark’s power, ease of use, speed and flexibility to organizations that have big plans for their equally big data,” said Ion Stoica, CEO, Databricks.

Databricks Features

Following the general availability announcement, Databricks will unveil at Spark Summit in San Francisco an exciting range of new features planned for the second half of this year, including:

  • R-language notebooks: Analyze large-scale data sets using R in the Databricks environment.
  • Access control and private notebooks: Manage permissions to view and execute code at an individual level.
  • Version control: Track changes to source code in the Databricks platform.
  • Spark Streaming support: Enable fault-tolerant real-time processing.

Databricks is available as a hosted platform on Amazon Web Services with a monthly subscription.

“We use Databricks to speed up prototyping our machine learning pipeline development. We’re looking at hard problems like progressively improving matches between voter records and other representations of people through machine learning, and before Databricks, it took roughly three times as long,” said Andy Barkett, CEO, GetExp. “We appreciate that we can quickly pull data from a variety of sources, including relational databases, flat files, and JSON stores.”

Apache Spark Momentum and 1.4 Release

The general availability of Spark 1.4 was also announced last week. Spark 1.4 is the largest Spark release to date, with more than 220 contributors and 1,200 commits. It introduces a new R language API (SparkR) and adds new features in Spark’s core engine and all standard libraries. Other notable additions in this release include:

  • Expansion of Spark’s DataFrame APIs: window functions, statistical and mathematical functions, and support for missing data.
  • Machine learning pipelines API graduates from alpha, adding feature parity in Python and a stable API for developers.
  • New UI visualizations for debugging and monitoring programs: interactive event timeline for jobs, DAG visualization, and visual monitoring for Spark Streaming.

“Powering great dining experiences requires us to analyze many different kinds of data across many machines. Databricks enables us to rapidly iterate from ideas to data-driven insights and new features. In particular, we have leveraged Databricks with Spark’s MLlib to build out machine learning models that provide personalized restaurant recommendations and help diners discover the perfect restaurant for their occasion. Spark’s combination of speed, flexibility, and access to machine learning out of the box enables us to innovate faster,” said Jeremy Schiff, Senior Data Science Manager, OpenTable.

Spark is the most active open source project in the big data ecosystem, with over 500 contributors, and Databricks remains committed to Apache Spark and continues to lead the community of contributors. Spark has been adopted by a number of platform vendors, including all of the major Hadoop distributors. As both the creators of Spark and the company leading its evolution to be enterprise-ready, Databricks has contributed over 75 percent of the code added to Spark the end of 2014 alone. To learn more about Spark 1.4, read the Databricks blog post here: http://databricks.com/blog/2015/06/11/announcing-apache-spark-1-4.html

 
