Data Science 101: Mining Big Data with Apache Spark

Mining Big Data can be an incredibly frustrating experience due to its inherent complexity and a lack of tools. Reynold Xin and Aaron Davidson are Committers and PMC Members for Apache Spark and use the framework to mine big data at Databricks. In this presentation and interactive demo, you’ll learn about data mining workflows, the architecture and benefits of Spark, as well as practical use cases for the framework.

Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast, both fast to run and fast to write. Programs written in Spark can run up to 100X faster than their MapReduce equivalents, while being 10X shorter and easier to understand. Spark also provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project has one of the most active open source communities in Big Data: 190+ developers from 50+ organizations have contributed code to it.

This talk was given at the SF Data Mining Meetup group in San Francisco. The main speaker is Reynold Xin, a committer on Apache Spark and a co-founder of Databricks. Prior to Databricks, he was pursuing a PhD in the UC Berkeley AMPLab.
