Statistics for Hackers


Statistics is a core ingredient of machine learning, but the field has a reputation for being difficult to crack: it revolves around a seemingly endless jargon of distributions, test statistics, confidence intervals, p-values, and more, with each concept subject to its own subtle assumptions. But it doesn’t have to be this way!

Today we have access to computers that Neyman and Pearson could only dream of, and many of the conceptual challenges in the field can be overcome through judicious use of these CPU cycles. In the slide deck below, Jake VanderPlas discusses how you can use your coding skills to “hack statistics”: to replace some of the theory and jargon with intuitive computational approaches such as sampling, shuffling, cross-validation, and Bayesian methods. He shows that with a grasp of just a few fundamental concepts, if you can write a for-loop, you can do statistical analysis.
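To give a flavor of the "shuffling" idea, here is a minimal sketch of a permutation test written as a plain for-loop. It is not taken from the slides: the function name `permutation_test`, its parameters, and the two small score lists are all made up for illustration. The idea is that if the group labels did not matter, shuffling them should produce a difference in means at least as large as the observed one fairly often; the fraction of shuffles where that happens is an estimate of a one-sided p-value.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) - mean(group_b).

    Hypothetical helper for illustration only. Returns the observed
    difference and the fraction of label shuffles whose difference is
    at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels are arbitrary
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            count += 1

    return observed, count / n_shuffles


if __name__ == "__main__":
    # Made-up scores for two groups; not real data.
    scores_a = [84, 72, 57, 46, 63, 76, 99, 91]
    scores_b = [81, 69, 74, 61, 56, 87, 69, 65, 66, 44, 62, 69]

    observed, p_value = permutation_test(scores_a, scores_b)
    print(f"observed difference in means: {observed:.2f}")
    print(f"estimated one-sided p-value:  {p_value:.3f}")
```

No lookup tables or distributional assumptions are needed here, only a loop and a random number generator. The trade-off is that the result is an estimate whose precision depends on `n_shuffles`.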


