Video Highlights: Make Better Decisions with Data — with Dr. Allen Downey

In this video presentation, our good friend Jon Krohn, Co-Founder and Chief Data Scientist at the machine learning company Nebula, is joined by Dr. Allen Downey, renowned author and professor, who shares insights from his upcoming book ‘Probably Overthinking It.’ He breaks down underused techniques like survival analysis, explains common paradoxes, discusses the dynamic Overton window, and shows how to prepare for Black Swan events. Strap in for a data-driven journey that will help you learn how to make better decisions with data!

Power to the Data Report Podcast: The Math Behind the Models

Hello, and welcome to the “Power-to-the-Data Report” podcast where we cover timely topics of the day from throughout the Big Data ecosystem. I am your host Daniel Gutierrez from insideBIGDATA where I serve as Editor-in-Chief & Resident Data Scientist. Today’s topic is “The Math Behind the Models,” one of my favorite topics when I’m teaching my Introduction to Data Science class at UCLA. In the podcast, I’ll discuss how in the age of data-driven decision-making and artificial intelligence, the role of data scientists has become increasingly vital. However, to truly excel in this field, data scientists must possess a strong foundation in mathematics and statistics.

Top Data Science Ph.D. Dissertations (2019-2020)

The American Mathematical Society (AMS) recently published in its Notices monthly journal a long list of all the doctoral degrees conferred from July 1, 2019 to June 30, 2020 for mathematics and statistics. The degrees come from 242 departments in 186 universities in the U.S. I enjoy keeping a pulse on the research realm for my field, so I went through the entire published list and picked out 48 dissertations that have high relevance to data science, machine learning, AI and deep learning. The list below is organized alphabetically by state.

How to Use the Mann-Kendall Test to Assess Cloud Costs

In this contributed article, Vadim Solovey, Chief Technology Officer at DoiT International, indicates that while there are many commercial, off-the-shelf solutions to track cloud costs, a lot of organizations find it helpful to craft their own customized approach. One of the most effective tools to accomplish this is the Mann-Kendall test, which has proven highly effective at separating the signal from the noise in environments generating multiple time series data feeds.
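For readers who have not met the test before, here is a minimal sketch of the textbook Mann-Kendall trend test in plain Python. It is not taken from the contributed article: the function name `mann_kendall` and the sample cost figures are hypothetical, and the variance formula assumes the series has no tied values.

```python
import math


def mann_kendall(values):
    """Textbook Mann-Kendall trend test (illustrative; assumes no ties).

    Returns (s, z, p): the S statistic, the normal-approximation
    z-score, and the two-sided p-value.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts later-minus-earlier pairs that rise, minus those that fall.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the no-trend hypothesis, without tie correction.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected z-score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical daily cost figures with a steady upward drift.
daily_costs = [100, 104, 109, 111, 118, 121, 127, 130, 138, 141]
s, z, p = mann_kendall(daily_costs)
print(f"S={s}, z={z:.2f}, p={p:.5f}")
```

A small p-value suggests a monotonic trend rather than noise. Real cost data often contains repeated values, in which case a tie-corrected variance would be the safer choice.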

Circular Statistics in Python: An Intuitive Intro

In this contributed article, Amit Babayoff, a data scientist at Deeyook, introduces circular statistics by looking at some of its basic principles and tools and explaining why conventional linear methods don’t work well on circular data. She also explores how a simple filter for handling noise can be constructed from these basic tools.
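To see why linear methods struggle with circular data, consider averaging two compass headings on either side of north. The sketch below is not the article's code; the function name `circular_mean_deg` is hypothetical. It uses the standard approach of averaging unit vectors rather than raw angles.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of angles in degrees, returned in [0, 360)."""
    # Treat each angle as a unit vector and sum the components.
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # The direction of the summed vector is the circular mean.
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


headings = [350.0, 10.0]
linear_mean = sum(headings) / len(headings)  # 180.0, pointing due south
circ_mean = circular_mean_deg(headings)      # close to north (0 or 360 degrees)
print(linear_mean, circ_mean)
```

The arithmetic mean lands on the opposite side of the circle, while the vector-based mean stays near north. Because of floating-point rounding, the circular result may print as a value just below 360 rather than exactly 0.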

Book Review: Bayesian Statistics the Fun Way by Will Kurt

“Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, Lego, and Rubber Ducks,” by Will Kurt (No Starch Press, 2019) is an excellent introduction to subjects critical to all data scientists. Will Kurt, in fact, is a data scientist! I always advise my data science classes at UCLA to engage with these important subjects in order to obtain a well-rounded exposure to the disciplines upon which data science is based. I’ve already added this title to my official bibliography of learning resources given to my students.

Book Review: The Art of Statistics – How to Learn from Data by David Spiegelhalter

This recent title, “The Art of Statistics – How to Learn from Data,” by University of Cambridge statistician David Spiegelhalter, is an important book on a number of fronts. I particularly appreciated the topics covered in the book that touch on important parts of the Data Science Process: data visualization, linear regression, logarithmic scales, Pearson correlation coefficient, data distributions, logistic regression, ROC curves, classification trees, over-fitting, bootstrap, probability theory, probability distributions, Bayes’ theorem, and much more. I think new data scientists should engage with a gentle introduction to these topics before diving into mathematical theory and code.

Introduction to Statistical Analysis and Outlier Detection Methods

Our friends over at Noah Data have written a research-style paper, “Introduction to Statistical Analysis and Outlier Detection Methods,” that discusses how statistical data can generally be classified by number of variables as univariate, bivariate or multivariate. Univariate data has only one variable, bivariate data has two variables and multivariate data has more than two variables.

Statistics and Machine Learning at Scale: New Technologies Apply Machine Learning to Big Data

In a new white paper based on presentations given over the last few years, Wayne Thompson, Manager of Data Science Technologies at SAS, introduces key machine learning concepts, explains the correlation between statistics and machine learning, and describes SAS solutions that enable machine learning at scale. Download the report to learn more about how new machine learning technologies are being applied to big data.