Altiscale Announces Apache Spark on the Altiscale Data Cloud

Altiscale, Inc., a leading provider of Hadoop-as-a-Service, today announced that Apache Spark is now available on the Altiscale Data Cloud. Altiscale customers can now leverage Apache Spark on Apache Hadoop in order to achieve their critical analytical and business objectives.

Data Science 101: What’s Coming for Spark in 2015

Apache Spark took the data science world by storm in 2014 as a technology foundation for big data applications. In the talk below from the Bay Area Spark User Meetup, Patrick Wendell from Databricks speaks about new developments in Spark and identifies areas of focus in the coming year.

GPU Accelerated Platforms for Deep Learning

“NVIDIA will present an update on accelerated computing, in particular, the latest developments in the platform. They will touch upon NVLink, OpenPOWER, ARM64, and new software updates and also cover the broad-sweeping impact that a new field of machine learning, called Deep Learning, is having on applications and domains.”

Data Science 101: Machine Learning – The Basics

The next installment of insideBIGDATA’s Data Science 101 series comes from our friends over at LinkedIn.

Data Science 101: Random Forests

The random forests machine learning algorithm is a popular ensemble method used by many data scientists to achieve good predictive performance in the classification regime. Fully understanding the nuances of this statistical learning technique is paramount to getting the most out of this algorithm – unfortunately, this means math. The presentation below is from the machine learning course CPSC 540 at the University of British Columbia.
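To make the ensemble idea concrete, here is a minimal from-scratch sketch in Python. It is not taken from the CPSC 540 material, and the function names (`fit_forest`, `predict_forest`, `build_tree`) are hypothetical. Each tree is grown on a bootstrap sample of the rows, each split considers only a random subset of the features (the square root of the feature count, a common default for classification), splits are chosen by Gini impurity, and the forest predicts by majority vote. It is meant to show the mechanics, not to replace a tuned library implementation.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, max_depth, n_features, rng, depth=0):
    """Grow a classification tree; each split looks at a random feature subset."""
    if depth >= max_depth or len(set(y)) == 1:
        return ("leaf", majority(y))
    best, best_score = None, gini(y)
    for f in rng.sample(range(len(X[0])), n_features):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            # Size-weighted Gini impurity of the two child nodes.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if score < best_score:
                best, best_score = (f, t, left, right), score
    if best is None:  # no sampled feature reduces impurity: stop here
        return ("leaf", majority(y))
    f, t, left, right = best
    left_X, left_y = [X[i] for i in left], [y[i] for i in left]
    right_X, right_y = [X[i] for i in right], [y[i] for i in right]
    return ("node", f, t,
            build_tree(left_X, left_y, max_depth, n_features, rng, depth + 1),
            build_tree(right_X, right_y, max_depth, n_features, rng, depth + 1))


def predict_tree(node, x):
    while node[0] == "node":
        _, f, t, left, right = node
        node = left if x[f] <= t else right
    return node[1]


def fit_forest(X, y, n_trees=25, max_depth=5, n_features=None, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    if n_features is None:
        n_features = max(1, int(math.sqrt(d)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# Toy data: feature 0 determines the class, feature 1 is noise.
X = [[v, v % 3] for v in range(20)]
y = [0 if v < 10 else 1 for v in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [18, 0]))
```

Because each tree sees different rows and different candidate features, the trees make partly independent errors, and averaging their votes reduces variance compared with a single deep tree.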

Dr. Max Kuhn Interviewed at useR! Conference

In the presentation below, data scientist, author (“Applied Predictive Modeling” with Kjell Johnson) and R caret package developer Max Kuhn sits down for an in-depth interview with Eduardo Arino de la Rubia sponsored by our friends over at DataScience.LA. They discuss the art and science of predictive modeling in the real world, the multifaceted and […]

Data Science 101: Using Statistics to Predict AB Testing

The talk below presents simple methods that can accurately predict future performance from AB test results, and that allow you to determine the smallest acceptable sample size. Using four years of AB testing data, you’ll see how these methods really work.
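The talk's own prediction methods are not reproduced here. As a reference point for the sample-size question, the sketch below implements the standard textbook normal-approximation formula for a two-sided test comparing two conversion rates with equal-sized arms. The function name `min_sample_size_per_arm` and the defaults (α = 0.05, power = 0.80) are illustrative assumptions, not values from the talk.

```python
import math
from statistics import NormalDist


def min_sample_size_per_arm(p_control, p_variant, alpha=0.05, power=0.80):
    """Visitors needed in EACH arm to detect a change from p_control to
    p_variant with a two-sided two-proportion z-test (normal approximation)."""
    if p_control == p_variant:
        raise ValueError("the two rates must differ")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for a two-sided test
    z_beta = z.inv_cdf(power)             # quantile that yields the desired power
    p_bar = (p_control + p_variant) / 2   # pooled rate under H0 (equal arm sizes)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_control - p_variant) ** 2)


# Example: 10% baseline conversion, hoping to detect a lift to 12%.
n = min_sample_size_per_arm(0.10, 0.12)
print(f"{n} visitors per arm")
```

The required sample grows with the inverse square of the effect size, so halving the lift you want to detect roughly quadruples the traffic needed.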

Data Science 101: Lessons Learned from Kaggle Competitions

In the video presentation below, “Machine learning best practices we’ve learned from hundreds of competitions,” Ben Hamner, Chief Scientist at Kaggle, discusses some very intriguing insights into how to find success in data science projects.

Confessions of a Recovering Data Broker

Do data brokers act to serve man? Decide for yourself. The full title of the talk below is “Confessions of a Recovering Data Broker: Responsible Innovation in the Age of Big Data, Big Brother, and the Coming Skynet Terminators.” The presenter is Jim Adler, VP of Products, Metanautix.

The Rise of Data Science in the Age of Big Data Analytics

Data Science is the key to unlocking insight from Big Data: by combining computer science skills with statistical analysis and a deep understanding of the data and the problem, we can not only make better predictions but also fill in gaps in our knowledge and even find answers to questions we hadn’t thought of yet.