Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

As Apache Spark becomes more widely adopted, the focus has been on creating higher-level APIs that provide increased opportunities for automatic optimization. In the talk below, Michael Armbrust, gives an overview of some of the exciting new API’s available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs are bringing the power of Catalyst, Spark SQL’s query optimizer, to all users of Spark. He focuses on specific examples of how developers can build their analyses more quickly and efficiently simply by providing Spark with more information about what they are trying to accomplish.

Michael Armbrust is the lead developer of the Spark SQL project at Databricks. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage and query optimization.

For our reader’s convenience, please find the slides for Michael’s presentation HERE.

Sign up for the free insideBIGDATA newsletter.

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

Sponsored Guest Articles

Optimizing Performance and Cost Savings for Elastic on Pure Storage

White Papers

From complexity to clarity: Harnessing the power of AI/ML and risk-informed strategies to streamline clinical data management

Speak Your Mind Cancel reply

Featured RSS Feed

More News from insideHPC

Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming

Sponsored Guest Articles

Optimizing Performance and Cost Savings for Elastic on Pure Storage

White Papers

From complexity to clarity: Harnessing the power of AI/ML and risk-informed strategies to streamline clinical data management

Join Us On Social Media

Speak Your Mind Cancel reply

Related Posts

Featured RSS Feed

More News from insideHPC