Video: MapReduce for the Masses using Common Crawl

In this video, Steve Salevan from the Common Crawl Foundation demonstrates how to go from having no prior experience with large-scale data analysis to being able to play with 40TB of web crawl information in just five minutes.

Common Crawl aims to change the big data game with our repository of over 40 terabytes of high-quality web crawl information hosted in the Amazon cloud, a total of 5 billion crawled pages. In this blog post, we’ll show you how you can harness the power of MapReduce data analysis against the Common Crawl dataset with nothing more than five minutes of your time, a bit of local configuration, and 25 cents.

Read the Full Story.
