Apache Spark took the data science world by storm in 2014 as a technology foundation for big data applications. In the talk below from the Bay Area Spark User Meetup, Patrick Wendell from Databricks speaks about new developments in Spark and identifies areas of focus in the coming year.
“NVIDIA will present an update on accelerated computing, in particular, the latest developments in the platform. They will touch upon NVLink, OpenPOWER, ARM64, and new software updates and also cover the broad-sweeping impact that a new field of machine learning, called Deep Learning, is having on applications and domains.”
The next installment of insideBIGDATA’s Data Science 101 series comes from our friends over at LinkedIn.
More than 95 percent of companies have a formal data and analytics strategy in place, with many favoring product development, IT and marketing over corporate real estate, until now. A new, independent study conducted by Forrester Consulting, commissioned by JLL, says that 75 percent of firms see corporate real estate information as a core part of a wider corporate data and analytics strategy.
The random forests machine learning algorithm is a popular ensemble method used by many data scientists to achieve good predictive performance in classification problems. Fully understanding the nuances of this statistical learning technique is paramount to getting the most out of the algorithm – unfortunately, this means math. The presentation below is from machine learning course CPSC 540 at The University of British Columbia,