Book Review: Python Data Science Handbook

I recently needed a Python resource to supplement a series of deep learning courses I was evaluating, which depended on this widely used language. As a long-time data science practitioner, my language of choice has been R, so I relished the opportunity to dig into Python and see firsthand how the other side of the data science world does machine learning. I quickly evaluated a few texts that might serve as a helpful resource. The book I settled on was “Python Data Science Handbook: Essential Tools for Working with Data” by Jake VanderPlas. This O’Reilly book from November 2016 did not disappoint.

I found the book's approach to be exactly what I was looking for – an introduction to the Python language, along with how to do machine learning with Python-based tools. The book has only five chapters, but at 529 pages, each chapter is rather deep. Here is a list of chapters:

Chapter 1 – IPython: Beyond Normal Python

Chapter 2 – Introduction to NumPy

Chapter 3 – Data Manipulation with Pandas

Chapter 4 – Visualization with Matplotlib

Chapter 5 – Machine Learning

Nice! Short and sweet. I enjoyed how all of the chapters quickly get the reader up to speed. I liked the fact that the book doesn’t get to scikit-learn until the last chapter. I think most readers who need to apply Python data techniques at work will find that the topics covered in the early chapters are essential, and that most wouldn’t be nearly as productive if they jumped straight to the content on scikit-learn. The author does an excellent job of covering broad terrain with enough detail that you can apply it to your own problems. You will find yourself going back to this book as a reference, as I have. This is a no-nonsense book that goes deep into material that is relevant and important for doing data science in Python. Every page is rich in information, provides practical use-case examples and optimization tricks, and adds new dimensions to your understanding of the topic.

The book is ideally suited to those who already know the basics of the Python language, or who already know how to program in another language like R or Julia and want to learn how to use Python for data science. Even if you already know Python and use it for simple data analysis tasks, you could still find some useful gems in this book in the form of very clear examples of supervised and unsupervised machine learning.

I think this book is well suited to address the needs of the entire data science process, from getting the data, exploring the data, and modeling the data to communicating and visualizing the results. I’ve already updated the slides in my “data science” presentation that I use for conferences to include this book as a good learning resource.

The full text of the book is available online, as are Jupyter notebooks for the Python Data Science Handbook.

Contributed by Daniel D. Gutierrez, Managing Editor of insideBIGDATA. In addition to being a tech journalist, Daniel is also a practicing data scientist, author, and educator, and sits on a number of advisory boards for various start-up companies.

Comments

  1. Chuck Emary says:

    Hi,
    Any idea how well Python scales from a performance perspective? It would seem that one would want to choose an interpreted language that had extensive VM optimizations to speed things up, especially for data manipulation and number crunching.

    Cheers…
