A fantastic new book just landed on my desk, “The Book of R: A First Course in Programming and Statistics” by Tilman M. Davies from No Starch Press. I’ve been looking for some time for a book like this to use with the introductory data science and machine learning course I teach. It fills some holes in my course content that my own book doesn’t address.
I needed a solid book to recommend to my students for building a good foundation in the R language, along with probability and statistics. In the past, my “go to” book was another No Starch Press title, “The Art of R Programming” by Norman Matloff (a great book written by a computer science professor), but the Davies text meets my needs more completely. Specifically, I like Part III – Statistics and Probability, and Part IV – Statistical Testing and Modeling. The chapters in these two parts serve the needs of all data scientists during the early part of the data science process, when you’re getting comfortable with and gaining familiarity with the data set. Chapter 16 does a good job of describing probability distributions. Chapters 17-19 are superb for Sampling Distributions and Confidence, Hypothesis Testing, and Analysis of Variance. The great thing about this book is that it doesn’t shy away from mathematics. It’s always been my opinion that in order to excel in data science and machine learning, you must be conversant in the mathematical foundations. The Davies book is a great way to start along this path.
Here is a list of “Parts” contained in the book. Each part has 4-8 chapters:
- Part I: The Language
- Part II: Programming
- Part III: Statistics and Probability
- Part IV: Statistical Testing and Modeling
- Part V: Advanced Graphics
Tilman M. Davies is a lecturer at the University of Otago in New Zealand, where he teaches statistics and R at all university levels. He has been programming in R for 10 years and uses it in all of his courses.
The book starts with the basics, such as how to handle data sets and write simple R programs, before moving on to more advanced topics, like producing statistical summaries of your data and performing statistical tests and modeling. You’ll also learn how to create impressive data visualizations with R’s base graphics tools as well as through more advanced packages like ggplot2 and ggvis.
Davies includes hundreds of examples and hands-on exercises that take you from theory to practice. You can go to the book’s website to download all R code examples and solutions to the exercises. Here is a summary of what you’ll learn:
- The fundamentals of programming in R, including how to create data frames, write functions, and use variables, statements, and loops
- Statistical concepts like exploratory data analysis, probabilities, hypothesis tests, and regression modeling, and how to run them in R
- How to access R’s thousands of functions, packages, and data sets
- How to gain insights from your data
- How to create publication-quality data visualizations of your results
My only complaint about this book is its size! At 832 pages, this tome will take up valuable real estate on any data scientist’s desk (although it is available in Kindle electronic format). It’s also going to weigh down my instructor’s book bag when I’m on-site teaching my data science and machine learning classes. But I think these are small prices to pay for a quality text for learning the R environment.
Contributed by: Daniel D. Gutierrez, Managing Editor of insideBIGDATA. He is a practicing data scientist through his consultancy AMULET Analytics, as well as an educator and author whose latest title is “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Contact him at: daniel@insideBIGDATA.com