
Databricks Simplifies and Scales Deep Learning with New Apache Spark Library

Databricks, the company founded by the creators of the popular Apache Spark project, announced Deep Learning Pipelines, a new library to integrate and scale out deep learning in Apache Spark.

Until now, deep learning has been out of reach for many teams because it depends on separate, low-level frameworks that require specialized skills. These frameworks also do not scale well because they run on only a single node. Databricks is releasing Deep Learning Pipelines, an open source package that adds high-level, easy-to-use deep learning APIs for technologies such as TensorFlow to Apache Spark, making it possible for enterprises to scale deep learning across multiple nodes.

“This is a huge step in furthering Databricks’ mission to democratize artificial intelligence and data science,” said Matei Zaharia, cofounder and chief technologist at Databricks. “This work has the potential to accomplish for deep learning what Spark did for big data, which is to make it approachable to a much broader audience, from data scientists to business analysts.”

The new Deep Learning Pipelines package provides users with the ability to:

  • Easily call deep learning libraries within existing Spark ML workflows, making them immediately available to Spark developers without requiring them to learn a separate tool;
  • Seamlessly perform transfer learning of deep learning models via Spark MLlib Pipelines, combining the power of deep learning with Spark’s data processing and ML capabilities;
  • Leverage Spark’s distributed computation engine with the integration of TensorFlow™ and Keras to quickly train and productionize high quality models at scale;
  • Empower organizations to more broadly leverage AI through mechanisms that turn deep learning models into SQL functions for business and data analysts;
  • Work more easily with complex data such as images through a set of Spark-native utilities.
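
To make the "models as SQL functions" idea in the list above concrete, here is a minimal sketch of the general pattern. It is an analogy only: it uses Python's built-in sqlite3 module rather than Spark, and it does not show the Deep Learning Pipelines API. The `predict` function, its `WEIGHT` and `BIAS` constants, and the `readings` table are all hypothetical stand-ins for a trained model and real data.

```python
import math
import sqlite3

# Hypothetical stand-in for a trained model: a fixed logistic scorer.
WEIGHT = 0.8
BIAS = -2.0


def predict(x):
    """Return a score in (0, 1) for a single numeric feature."""
    return 1.0 / (1.0 + math.exp(-(WEIGHT * x + BIAS)))


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL queries can call it by name.
conn.create_function("predict", 1, predict)

# Hypothetical data table that an analyst might query.
conn.execute("CREATE TABLE readings (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 0.0), (2, 2.5), (3, 5.0)],
)

# The analyst writes plain SQL; the model runs behind the function call.
rows = conn.execute(
    "SELECT id, predict(value) FROM readings ORDER BY id"
).fetchall()

for row_id, score in rows:
    print(row_id, round(score, 3))
```

Once a model is registered this way, anyone who can write a SELECT statement can score data without touching the model code, which is the kind of access the release describes for business and data analysts.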

Deep Learning Pipelines for Apache Spark democratizes access to artificial intelligence in the enterprise by eliminating the barriers to deep learning and processing complex data at scale.

 

Sign up for the free insideBIGDATA newsletter.
