The next installment of insideBIGDATA’s Data Science 101 series comes from our friends over at LinkedIn.
The random forests machine learning algorithm is a popular ensemble method used by many data scientists to achieve good predictive performance in the classification regime. Fully understanding the nuances of this statistical learning technique is paramount to getting the most out of it. Unfortunately, this means math. The presentation below is from the machine learning course CPSC 540 at the University of British Columbia,
Brought to you by our friends over at the Stanford Center for Professional Development is this compelling data science education resource: “Expressing yourself in R” – by Hadley Wickham, Rice University.
Provided by our friends over at Edureka, Module 1 of their Apache Cassandra course below discusses the fundamental concepts of using a highly-scalable, column-oriented database to implement appropriate use cases.
The Support Vector Machine (SVM) is an important and widely used machine learning algorithm. To fully understand SVMs, you need a fundamental understanding of how this statistical learning method functions. Here is a useful lecture on SVMs from MIT OpenCourseWare.
The video presentation below is a highly compelling talk by Stanford University professor and Coursera co-founder, Dr. Andrew Ng. Andrew addresses a graduate summer school audience at UCLA’s IPAM (Institute for Pure & Applied Mathematics) on the topic – Deep Learning, Feature Learning.
Mining Big Data can be an incredibly frustrating experience due to its inherent complexity and a lack of tools.
From the SciPy2013 conference, here is a compelling talk, “Data Agnosticism: Feature Engineering Without Domain Expertise,” by Nicholas Kridler of Accretive Health in Chicago.