
Book Review: The Art of Statistics – How to Learn from Data by David Spiegelhalter

This recent title, “The Art of Statistics – How to Learn from Data,” by University of Cambridge statistician David Spiegelhalter, is an important book on a number of fronts. First, it’s an excellent introduction to the subject for any lay person wanting to better understand how to interpret statistical results. It’s also a good substitute for the standard introductory statistics course, which is often too focused on memorizing equations for statistical tests to be applied to data that has already been extracted. Instead, this book emphasizes the importance of clarifying questions, assumptions, and expectations at the outset, identifying the data that might help, and then knowing how to responsibly interpret the results. And lastly, the book is a helpful adjunct for anyone working to transition to the field of data science. In fact, I’ve added the book to my “should read” bibliography for the Introduction to Data Science class I teach at UCLA.

We see statistics everywhere we look. In the current age of “big data,” large and complex data sets are used for everything from traffic monitoring to online advertising to leading-edge academic research fields like environmental sciences and genomics. Meanwhile, the mainstream media bombards us with mounds of statistics, often misconstrued or taken out of context.

“David Spiegelhalter’s The Art of Statistics shines a light on how we can use the ever-growing deluge of data to improve our understanding of the world,” wrote Nature.

Author Spiegelhalter argues that the rise of data science means that a grasp of statistical literacy is more important now than ever. The book is an accessible introduction to statistical reasoning that starts with real-world problems such as wanting to know how many trees are on the planet, or determining the benefit of taking statin drugs.
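The tree question illustrates the kind of reasoning the book teaches: estimate an unknowable total by surveying a random sample and scaling up. Here is a minimal Python sketch of that idea — every number (plot counts, forest area) is invented purely for illustration and is not from the book:

```python
import random
import statistics

random.seed(42)

# Hypothetical scenario: we can only survey 50 randomly chosen one-hectare
# plots out of 1,000,000 hectares of forest (all figures invented).
TOTAL_HECTARES = 1_000_000

# Simulated tree counts per surveyed plot (a stand-in for real field data).
sample_counts = [random.gauss(400, 80) for _ in range(50)]

# Average density in the sample, then scale up to the whole population.
mean_density = statistics.mean(sample_counts)    # trees per hectare
estimated_total = mean_density * TOTAL_HECTARES  # extrapolated total

print(f"Estimated total trees: {estimated_total:,.0f}")
```

The point is not the arithmetic but the framing: the estimate is only as good as the sampling design, which is exactly the kind of caveat Spiegelhalter stresses.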

The following chapters are included:

  1. Getting Things in Proportion: Categorical Data and Percentages
  2. Summarizing and Communicating Numbers. Lots of Numbers
  3. Why Are We Looking at Data Anyway? Population and Measurement
  4. What Causes What?
  5. Modeling Relationships Using Regression
  6. Algorithms, Analytics and Prediction
  7. How Sure Can We Be About What Is Going On? Estimates and Intervals
  8. Probability – the Language of Uncertainty and Variability
  9. Putting Probability and Statistics Together
  10. Answering Questions and Claiming Discoveries
  11. Learning from Experience the Bayesian Way
  12. How Things Go Wrong
  13. How We Can Do Statistics Better
  14. Conclusion

“So-called ‘big data’ has almost all the problems of small data, and more,” said Spiegelhalter. “It is big, of course, and often comprises all the data available rather than just a sample, and it also tends to be messier as it is generally ‘found’ rather than collected for a specific purpose. Statistical science has had to adapt to deal with massive volumes of data, but also to deal with the systematic biases that are often present.”

By applying statistical insight to everything from drug trials to the Titanic disaster to the crime sprees of serial killers – all without using any mathematics – readers learn how statistical thinking works. As Spiegelhalter walks you through a range of compelling and practical problems, he ultimately prepares you to make more informed decisions that can shape your – and our – future. The strategies in the book have implications for people working in business, medicine, and journalism, but they will also be illuminating to anyone who wants to make better sense of their finances and healthcare – or just see through misleading statistics they come across on social media. Armed with a clear understanding of statistical concepts, we can better question the numbers we encounter in our daily lives.

I particularly appreciated the topics covered in the book that touch on important parts of the Data Science Process: data visualization, linear regression, logarithmic scales, the Pearson correlation coefficient, data distributions, logistic regression, ROC curves, classification trees, over-fitting, the bootstrap, probability theory, probability distributions, Bayes’ theorem, and much more. I think new data scientists should get a gentle introduction to these topics before diving into mathematical theory and code.

I greatly enjoyed reading this book, and used a number of Spiegelhalter’s concepts in my data science classes in order to clarify important points that only a world-renowned statistician can convey. Highly recommended for all newbie data scientists!

Contributed by Daniel D. Gutierrez, Managing Editor and Resident Data Scientist for insideBIGDATA. In addition to being a tech journalist, Daniel is also a consultant in data science, author, and educator, and sits on a number of advisory boards for various start-up companies.

