The Simply Statistics Blog hosted a web conference today, Oct. 30 – the inaugural Future of Statistics Unconference. The blog’s founders, Dr. Roger Peng and Dr. Jeff Leek (both Coursera instructors), brought together a dynamic panel of experts to discuss how the data revolution requires an equal revolution in statistical methods, software, education, and collaborations with natural sciences, social sciences, and industry. Some of the brightest minds in the data sciences participated: Hilary Mason, Hadley Wickham, Joe Blitzstein, Sinan Aral, Daniela Witten, and Hongkai Ji.
I attended the conference and was delighted with all the fresh ideas these experts brought forth. Here is a synopsis of the highlights. Hadley Wickham started things off with a preview of what he’s been working on lately, namely the dplyr and ggvis R packages (he showed a very cool slider inside RStudio for adjusting a plot). He also discussed two forms of data science: Cognitive (where you think about a given problem) and Computational (where you calculate the results). He closed by reminding us of this intriguing quote:
The future is already here, it is just not evenly distributed — William Gibson
Next up was Daniela Witten (one of the authors of my favorite book, An Introduction to Statistical Learning with Applications in R), who talked about Prediction vs. Inference. She noted that we now do prediction very well with Amazon-like recommender systems, but we need to see more inference in the context of machine learning to determine, via confidence intervals and p-values, how certain we are about a prediction. She ended with an observation on the differences between data scientists and statisticians: the latter have more rigorous training in inference.
Joe Blitzstein from Harvard followed, mentioning his very popular Stat110 Probability series of 34 lectures, available for free on YouTube. He talked about how undergrad probability and inference courses tend to be just about calculus, whereas they should focus on the stories data tell; for example, each distribution has a story behind it. You can also check out his new Data Science course, CS109. He finished with some comments about whether statistics grad students should be required to take Measure Theory, and about the need to reconcile the often heated differences between Bayesians and Frequentists.
Hongkai Ji from Johns Hopkins was next with an informational discussion of his work with statistical methods in biology.
Next was Sinan Aral from MIT, who discussed the importance of causal inference and how big data has increased in granularity.
The final speaker was renowned data scientist and blogger Hilary Mason, who admitted that she is not a statistician but rather a computer scientist. Her topic for the day was the intersection between business and statistics. She outlined the role of the typical data scientist – using mathematics, coding in a modeling language like R, in some cases building production systems as a software developer, and most importantly telling compelling stories in a comprehensible way to convey the results of data science projects. She stressed that there is a distinct difference between the code a statistician writes in an academic environment and the code required in a business environment – different languages, team effort, version control, scalability, code reviews, etc.
The Future of Statistics Unconference was, in my opinion, a huge success. I hope Simply Statistics will create a new tradition by offering future events of this kind. Please enjoy the conference at your own convenience: