Spark is a fast and general engine for large-scale data processing. The first widely adopted technology designed specifically for large-scale parallel computation was Hadoop, with HDFS and MapReduce. Now, with many ongoing initiatives to make Spark the next MapReduce, there is a lot going on for this project, which sprang from humble origins at the AMPLab at UC Berkeley. With the founding of Databricks to support Spark, this open source, top-level Apache project has stepped into the limelight to garner its share of fans and critics.
The panel discussion video below comes from the Los Angeles Spark Users Group, organized by Subash DSouza. The talk fosters a lively discussion of Spark’s initial goals, where it came from, and what the future holds for it. Many leading Big Data vendors are responding to Spark’s rise by building its capabilities into their architectures. The panel brings together the top Hadoop distribution vendors – Cloudera, MapR, and Pivotal – who share their vision, strategy, and capabilities around Apache Spark. The presentation is a rare opportunity to see these leading vendors on one panel, hear from their experts, and get their insight into best practices, real-life use cases, and solutions around Spark implementation.
As moderator of the panel, I was impressed with the quality and insight of the panelists’ responses to the questions. Because Spark is a moving target, it is important for early adopters of this technology to hear first-hand what the primary vendors think. Panels such as this provide a valuable platform for this level of analysis. It was a very productive evening!
Sandy Ryza, Data Scientist, Cloudera – Sandy is a data scientist at Cloudera. He recently led Cloudera’s Spark development and still contributes actively to the project. Prior to Spark, he worked on MapReduce and YARN, and he is a member of the Hadoop Project Management Committee.
Sungwook Yoon, Data Scientist, MapR Technologies – Sungwook is a Data Scientist at MapR. Sungwook’s data experience includes malware detection algorithms for packet stream analysis, mobile network signaling analysis, social network analysis, job title analysis, and call center data analysis.
Gautam Muralidhar, Sr. Data Scientist, Pivotal – Gautam currently works as a Sr. Data Scientist at Pivotal, where he helps customers derive actionable insights from big data by solving machine learning challenges using state-of-the-art analytics infrastructure and tools from Pivotal’s stack. Gautam holds a Ph.D. from The University of Texas at Austin, and his dissertation work spanned the areas of computer vision, machine learning, and medical imaging.
Panel Moderator: Daniel Gutierrez, Managing Editor – insideBIGDATA.