
Video: MapReduce for the Masses using Common Crawl

In this video, Steve Salevan from the Common Crawl Foundation demonstrates how to go from no prior experience with large-scale data analysis to working with 40TB of web crawl data in just five minutes.

Common Crawl aims to change the big data game with our repository of over 40 terabytes of high-quality web crawl information in the Amazon cloud, a total of 5 billion crawled pages. In this blog post, we’ll show you how you can harness the power of MapReduce data analysis against the Common Crawl dataset with nothing more than five minutes of your time, a bit of local configuration, and 25 cents.
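
For readers new to the idea, here is a rough, in-memory sketch of the MapReduce pattern itself, using a word count as the example. It is not the code from the video or the Common Crawl post, and it does not touch the real dataset: the function names and the `sample_pages` list are hypothetical stand-ins, and a real job would run on a cluster framework such as Hadoop rather than in a single Python process.

```python
from collections import defaultdict


def map_phase(page_text):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in page_text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(pages):
    """Run map, shuffle, and reduce over an iterable of page texts."""
    pairs = (pair for page in pages for pair in map_phase(page))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical stand-in for crawled page text (assumption, not real data).
sample_pages = ["big data news", "big crawl data"]
counts = word_count(sample_pages)
print(counts)
```

The map step looks at one page at a time and the reduce step looks at one key at a time, which is what lets a framework spread the same logic across many machines and billions of pages.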

Read the Full Story.
