Data Science 101: Machine Learning – The Basics

The next installment of insideBIGDATA’s Data Science 101 series comes from our friends over at LinkedIn.

Data Science 101: Random Forests

The random forests machine learning algorithm is a popular ensemble method used by many data scientists to achieve good predictive performance in the classification regime. Fully understanding the nuances of this statistical learning technique is paramount to getting the most out of this algorithm – unfortunately, this means math. The presentation below is from the machine learning course CPSC 540 at the University of British Columbia.

Data Science 101: Using Statistics to Predict AB Testing

The talk below presents simple methods that can accurately predict future performance from AB test results, and that allow you to determine the smallest acceptable sample size. Using four years of AB testing data, you’ll see how these methods really work.

Data Science 101: Lessons Learned from Kaggle Competitions

In the video presentation below, “Machine learning best practices we’ve learned from hundreds of competitions,” Ben Hamner, Chief Scientist at Kaggle, discusses some very intriguing insights into how to find success in data science projects.

Data Science 101: Expressing Yourself in R

Brought to you by our friends over at the Stanford Center for Professional Development is this compelling data science education resource: “Expressing yourself in R” – by Hadley Wickham, Rice University.

Data Science 101: Cassandra Tutorial for Beginners

Provided by our friends over at Edureka, Module 1 of their Apache Cassandra course below discusses the fundamental concepts of using a highly scalable, column-oriented database to implement appropriate use cases.

Data Science 101: Support Vector Machines

The Support Vector Machine (SVM) is an important and widely used machine learning algorithm. To fully understand SVMs, you need a fundamental understanding of how the statistical learning method functions. Here is a useful lecture on SVMs from MIT OpenCourseWare.

Deep Learning, Self-Taught Learning and Unsupervised Feature Learning

The video presentation below is a highly compelling talk by Stanford University professor and Coursera co-founder, Dr. Andrew Ng. Andrew addresses a graduate summer school audience at UCLA’s IPAM (Institute for Pure & Applied Mathematics) on the topic – Deep Learning, Feature Learning.

Data Science 101: Mining Big Data with Apache Spark

Mining Big Data can be an incredibly frustrating experience due to its inherent complexity and a lack of tools.

Data Science 101: Data Agnosticism – Feature Engineering Without Domain Expertise

From the SciPy2013 conference, here is a compelling talk “Data Agnosticism: Feature Engineering Without Domain Expertise” by Nicholas Kridler of Accretive Health in Chicago.