Top 5 Mistakes When Writing Spark Applications

In the presentation below from Spark Summit 2016, Mark Grover goes over the top 5 things that he’s seen in the field that prevent people from getting the most out of their Spark clusters. When some of these issues are addressed, it is not uncommon to see the same job running 10x or 100x faster on the same clusters with the same data, just with a different approach.

The Data Scientist’s Guide to Apache Spark

Looking to dive deeper into the more cutting-edge machine learning use cases in Apache Spark? To successfully use Spark’s advanced analytics capabilities, including large-scale machine learning and graph analysis, check out The Data Scientist’s Guide to Apache Spark, from our friends over at Databricks.

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

In the talk below, Michael Armbrust gives an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs are bringing the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark.

Apache Spark MLlib 2.0 Preview: Data Science and Production

From the recent Spark Summit 2016 in San Francisco, the video presentation below by Joseph K. Bradley of Databricks focuses on “Apache Spark MLlib 2.0 Preview: Data Science and Production.”

Large-Scale Deep Learning with TensorFlow

We bring you the keynote presentation below from the recent Spark Summit 2016 held in San Francisco on June 6-8. Speaker Jeff Dean joined Google in 1999 and is currently a Google Senior Fellow.

Spark MLlib: Making Practical Machine Learning Easy and Scalable

In this talk, Xiangrui Meng of Databricks shares his experience in developing MLlib. The talk covers both the higher-level APIs (ML pipelines) that make MLlib easy to use and the lower-level optimizations that make MLlib scale to massive data sets.

Advanced Apache Spark

Big data is going Spark crazy! Here’s a whopping 6-hour, intensive, fast-paced, and vendor-agnostic look at Spark Core presented by Sameer Farooqui, a client services engineer at Databricks.

Apache Spark is the Smartphone of Big Data

In this special guest feature, Denny Lee of Databricks talks about the versatility of Spark – essentially comparing it to the Swiss Army knife on your camping trip called Big Data/Analytics.

Introduction to Spark Webinar

This Introduction to Spark webinar will feature Daniel Gutierrez, Managing Editor of insideBIGDATA.

In the past year, the Apache Spark distributed computing architecture has continued its upward trajectory amongst the big data players. Its growth has been fueled by several innovative differentiators for big data applications, such as MapReduce 2.0 (or YARN), provisions for analytic workflows, and efficient use of memory. Databricks’ recent 2015 Spark industry survey reports that Spark adoption is outpacing Hadoop because of its accelerated access to big data. In support of this new computing architecture.

Spark 101: MapR Free On-Demand Training Now Includes Apache Spark

MapR Technologies, Inc., provider of a leading distribution for Apache™ Hadoop® that integrates web-scale enterprise storage and real-time database capabilities, announced the availability of the first free Apache Spark course as part of a new series in its Hadoop On-Demand Training program.