The random forests machine learning algorithm is a popular ensemble method that many data scientists use to achieve good predictive performance in classification problems. Fully understanding the nuances of this statistical learning technique is paramount to getting the most out of it – unfortunately, this means math. The presentation below is from the machine learning course CPSC 540 at the University of British Columbia […]
In the presentation below, data scientist, author (“Applied Predictive Modeling” with Kjell Johnson) and R caret package developer Max Kuhn sits down for an in-depth interview with Eduardo Arino de la Rubia sponsored by our friends over at DataScience.LA. They discuss the art and science of predictive modeling in the real world, the multifaceted and […]
The Los Angeles data science Meetup scene is booming in large part due to the efforts of a local data scientist, Szilard Pafka. In the interview below, Szilard discusses his background in the field, the genesis of his many Meetup groups, the LA tech industry, and his plans to make his Meetups even more successful.
Data science is the key to unlocking insight from Big Data: by combining computer science skills with statistical analysis and a deep understanding of the data and the problem, we can not only make better predictions but also fill in gaps in our knowledge and find answers to questions we hadn’t even thought of yet.
In the thought-provoking video below, Professor Yann LeCun, Director of AI Research at Facebook, sat down for a fireside chat at December 2014’s edition of Data Driven NYC to discuss deep learning and the future of artificial intelligence.
Welcome back to our series of articles sponsored by Intel – “Ask a Data Scientist.” Once a week you’ll see reader-submitted questions of varying levels of technical detail answered by a practicing data scientist – sometimes by me and other times by an Intel data scientist. This week’s question is from a reader who asks for an explanation of data leakage.