Becoming a Data Scientist

Here is a compelling interview with data scientist Will Kurt, courtesy of the Becoming a Data Scientist Podcast series. Kurt talks about his path from English & Literature and Library & Information Science degrees to becoming the Lead Data Scientist at KISSmetrics.

Pedro Domingos: “The Master Algorithm” | Talks at Google

In the video presentation below from the Talks at Google series, Pedro Domingos discusses his perspectives on the state of machine learning. Domingos is a Professor of Computer Science and Engineering at the University of Washington.

Fraud Detection with Deep Learning at PayPal

There’s a lot of buzz surrounding machine learning, and Deep Learning in particular. In this video presentation, Venkatesh Ramanathan talks about PayPal fraud detection with H2O Deep Learning. He takes you behind the mystery of machine learning and introduces the fast experimental process a data scientist uses in building a Deep Learning model.

Best Practices – Big Data Acceleration

“This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern HPC clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark, and Memcached will be presented. Enhanced designs for these components to exploit in-memory technology and parallel file systems (such as Lustre) will be presented. Benefits of these designs on various cluster configurations using the publicly available RDMA-enabled packages from the OSU HiBD project will be shown.”

Scaling Deep Learning

Deep Learning is a relatively new area of Machine Learning research which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. The video presentation below is from the 2016 Stanford HPC Conference, where Brian Catanzaro from Baidu presents: “Scaling Deep Learning.”

Introduction to GPU Computing

As the use of GPUs continues to rise in fields like deep learning, we thought it would be useful to readers not yet familiar with this technology to offer the “Introduction to GPU Computing” presentation below.

What has Kaggle Learned from 2 Million Machine Learning Models?

I was excited to attend a very compelling Meetup featuring Anthony Goldbloom, co-founder and CEO of Kaggle, who talked about the genesis of his company and what they’ve learned along the way: “What has Kaggle Learned from 2 Million Machine Learning Models?” It was fascinating!

Talend Releases Free, Easy-to-Use Desktop App for Quickly Preparing Data for Analysis

Talend, a global leader in big data integration software, today introduced Talend Data Preparation, a self-service application that enables business users to simplify and expedite the often laborious and time-consuming process of data wrangling, or the data manipulation and analysis tasks that are often performed using spreadsheets.

Analytics Plus the Internet of Things Add up to Grid Reliability

SAS and OSIsoft are showcasing how predictive analytics from SAS and infrastructure-management software from OSIsoft can transform asset data from IoT-connected devices into an optimized grid with the Salt River Project powered by SAS.

Architecting Predictive Algorithms for Machine Learning

In the presentation below, Seth Juarez of DevExpress discusses architecting predictive algorithms for machine learning.