
Data Science 101: Hadoop, Pig and Twitter

Here is a well-crafted SlideShare presentation, “Hadoop, Pig and Twitter,” by Kevin Weil, Analytics Lead at Twitter.

Data Science 101: Introduction to MapReduce


For newbie data scientists and enterprise decision makers who need a quick way to get up to speed with MapReduce, the technology underlying Hadoop, here is a slide presentation, “Introduction to MapReduce: an Abstraction for Large-Scale Computation,” by Ilan Horn of Google.
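For readers who want a feel for the idea before opening the slides, here is a minimal single-machine sketch of the MapReduce pattern applied to word counting. It is not taken from the presentation, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical; a real Hadoop job would distribute these steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big ideas"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls could in principle run in parallel, which is the property the abstraction relies on.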

Data Science 101: Metaprogramming Python for Big Data


The video presentation below comes from our friends at the San Francisco Python Meetup group. The talk discusses how AdRoll uses Python to squeeze every last bit of performance out of a single high-end server for the purpose of interactive analysis of terabyte-scale data sets.

Quantum Machine Learning


Ever wonder what will happen when exabyte data stores are the norm, and even the parallelism of Hadoop can no longer provide the necessary processing power to address the data deluge? Quantum computing may hold the answer.

Data Science 101: Data Agnosticism

Bits are bits. Whether you are searching for whales in audio clips or trying to predict hospitalization rates based on insurance claims, the process is the same: clean the data, generate features, build a model, and iterate.

Data Science 101: k-means Clustering

In this edition of insideBIGDATA’s Data Science 101 series, I’m going to offer up a short instructional video describing the use of the popular unsupervised learning algorithm, k-means clustering.
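As a companion to the video, here is a minimal pure-Python sketch of the standard k-means loop (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat until the assignments stop changing. It is not taken from the video; the function name `kmeans` and its parameters are hypothetical, and in practice a library implementation would usually be a better choice.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid closest to points[i].
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = None

    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

The result depends on the starting centroids, so different seeds can give different cluster numbering and, on less clearly separated data, different clusters.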

Data Science 101: The Data Analytics Handbook

“Data Analytics Handbook” is a new resource meant to inform young professionals about the field of data science. It was written by a group of students at UC Berkeley: Brian Liou, Tristan Tao, and Elizabeth Lin. Edition One of the book includes in-depth interviews with data scientists and data analysts.

Extending the R Language to the Enterprise


Earlier this week, I attended a very informative event sponsored by the LA RUG (Los Angeles R User Group meetup) that featured the topic “Extending the R language to the enterprise with TERR & Spotfire.”

Data Science 101: Hadoop in the Cloud


Amazon Elastic MapReduce (Amazon EMR) makes it easy to provision and manage Hadoop in the AWS Cloud. Hadoop is available in multiple distributions and Amazon EMR gives you the option of using the Amazon Distribution or the MapR Distribution for Hadoop.

Data Science 101: Forecasting Time Series Using R

Time series forecasting is an integral tool in data science. Here is a useful instructional video on the subject from one of the authors of “Forecasting: Principles and Practice,” a free eBook available on OTexts. The presentation, “Forecasting Time Series Using R,” is given by Professor of Statistics Rob J Hyndman.