## How to Use the Mann-Kendall Test to Assess Cloud Costs

In this contributed article, Vadim Solovey, Chief Technology Officer at DoiT International, indicates that while there are many commercial, off-the-shelf solutions to track cloud costs, many organizations find it helpful to craft their own customized approach. One useful tool for this is the Mann-Kendall test, which has proven highly effective at separating the signal from the noise in environments generating multiple time series data feeds.
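For readers who want a feel for the test itself, here is a minimal sketch of a two-sided Mann-Kendall trend test in plain Python. It is not taken from the article: the function name `mann_kendall`, the `alpha` parameter, and the `daily_costs` figures are hypothetical, and the p-value relies on the normal approximation, which is usually considered reasonable only for roughly ten or more observations.

```python
import math
from collections import Counter
from itertools import combinations


def mann_kendall(values, alpha=0.05):
    """Two-sided Mann-Kendall trend test (normal approximation).

    Hypothetical helper for illustration; not the article's implementation.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S = (number of increases) - (number of decreases) over all ordered pairs.
    s = 0
    for earlier, later in combinations(values, 2):
        s += (later > earlier) - (later < earlier)

    # Variance of S under the no-trend hypothesis, corrected for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected z-score; zero when S is zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    if p_value < alpha:
        trend = "increasing" if s > 0 else "decreasing"
    else:
        trend = "no trend"
    return {"s": s, "z": z, "p_value": p_value, "trend": trend}


if __name__ == "__main__":
    # Made-up daily cost figures, for illustration only.
    daily_costs = [102.0, 99.5, 104.1, 107.3, 105.8, 110.2,
                   112.9, 111.4, 115.0, 118.6, 117.2, 121.5]
    print(mann_kendall(daily_costs))
```

Because the test only looks at the sign of each pairwise difference, a single cost spike moves the statistic very little, which is one reason it suits noisy billing data.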

## Circular Statistics in Python: An Intuitive Intro

In this contributed article, Amit Babayoff, a data scientist at Deeyook, discusses circular statistics by looking at some of its basic principles and tools and at why conventional linear methods don’t work well on circular data. She also explores how a simple filter for handling noise can be constructed from these basic tools.
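A quick way to see why linear methods struggle with circular data is to average two compass headings on either side of north. The sketch below is an illustration under assumptions, not the article's code: `circular_mean_deg` is a hypothetical helper that converts each angle to a unit vector, sums the vectors, and reads the direction of the result.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction of angles given in degrees (hypothetical helper).

    Each angle becomes a unit vector; the mean direction is the angle of
    the vector sum, mapped into the range [0, 360).
    """
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0


if __name__ == "__main__":
    headings = [350.0, 10.0]  # two headings just either side of north

    # The arithmetic mean points due south, the opposite of both headings.
    print("arithmetic mean:", sum(headings) / len(headings))

    # The circular mean lands at north (a value at or next to the 0/360 wrap).
    print("circular mean:  ", circular_mean_deg(headings))
```

The same vector-sum idea can be applied over a sliding window to smooth a noisy series of angles, which is the kind of simple filter the article alludes to.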

## Book Review: Bayesian Statistics the Fun Way by Will Kurt

“Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, Lego, and Rubber Ducks,” by Will Kurt (No Starch Press, 2019) is an excellent introduction to subjects critical to all data scientists. Will Kurt, in fact, is a data scientist! I always advise my data science classes at UCLA to engage with these important subjects in order to obtain a well-rounded exposure to the disciplines upon which data science is based. I’ve already added this title to my official bibliography of learning resources given to my students.

## Book Review: The Art of Statistics – How to Learn from Data by David Spiegelhalter

This recent title, “The Art of Statistics – How to Learn from Data,” by University of Cambridge statistician David Spiegelhalter, is an important book on a number of fronts. I particularly appreciated the topics covered in the book that touch on important parts of the Data Science Process: data visualization, linear regression, logarithmic scales, Pearson correlation coefficient, data distributions, logistic regression, ROC curves, classification trees, over-fitting, bootstrap, probability theory, probability distributions, Bayes’ theorem, and much more. I think new data scientists should engage with a gentle introduction to these topics before diving into mathematical theory and code.

## Introduction to Statistical Analysis and Outlier Detection Methods

Our friends over at Noah Data have written a research-style paper, “Introduction to Statistical Analysis and Outlier Detection Methods,” that discusses how statistical data can generally be classified by number of variables as univariate, bivariate, or multivariate. Univariate data has only one variable, bivariate data has two variables, and multivariate data has more than two variables.
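The classification described above can be sketched in a few lines of Python. This is only an illustration: `classify_by_variable_count` is a hypothetical helper, and it assumes each observation is a tuple with one entry per variable.

```python
def classify_by_variable_count(observations):
    """Label a dataset by how many variables each observation records.

    Hypothetical helper: `observations` is a non-empty list of
    equal-length tuples, one tuple per observation.
    """
    if not observations:
        raise ValueError("need at least one observation")
    n_vars = len(observations[0])
    if any(len(row) != n_vars for row in observations):
        raise ValueError("all observations must have the same length")

    if n_vars == 1:
        return "univariate"
    if n_vars == 2:
        return "bivariate"
    return "multivariate"


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    heights = [(170.2,), (165.0,), (181.4,)]
    height_weight = [(170.2, 68.0), (165.0, 59.5), (181.4, 80.1)]
    height_weight_age = [(170.2, 68.0, 34), (165.0, 59.5, 29)]

    for data in (heights, height_weight, height_weight_age):
        print(classify_by_variable_count(data))
```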

## Statistics and Machine Learning at Scale: New Technologies Apply Machine Learning to Big Data

In a new white paper based on presentations given over the last few years, Wayne Thompson, Manager of Data Science Technologies at SAS, introduces key machine learning concepts, explains the correlation between statistics and machine learning, and describes SAS solutions that enable machine learning at scale. Download the report to learn more about how new machine learning technologies are being applied to big data.

## Book Review: Why – A Guide to Finding and Using Causes

A new book, “Why: A Guide to Finding and Using Causes,” by Stevens Institute of Technology assistant professor of computer science Samantha Kleinberg, is a necessary addition to any data scientist’s bookshelf as it helps bring focus to the dreaded “correlation does not imply causation” conundrum that affects our understanding of data-centric problems.