How to Use Hadoop as a Piece of the Big Data Puzzle


What Hadoop Can Do for Big Data

Organizations are embracing Hadoop for several notable merits:

• It's distributed. Bringing a high-tech twist to the adage “Many hands make light work,” Hadoop stores data on the local disks of a distributed cluster of servers and processes it in parallel on those same servers.
• Hadoop runs on commodity hardware. Measured by the average cost per terabyte of compute capacity of a prepackaged system, Hadoop is easily 10 times cheaper than higher-cost specialized hardware of comparable computing capacity.
• It is fault-tolerant. Hardware failure is expected and is mitigated by data replication and speculative execution. If spare capacity is available, Hadoop can launch duplicate copies of a slow-running task and accept the result from whichever copy finishes first.
• It does not require a predefined data schema. A key benefit is the ability to upload unstructured files as they are, without having to “schematize” them first. You can dump any type of data into the system and let the consuming programs determine and apply structure when necessary.
• It scales to handle big data. Hadoop clusters can scale to between 6,000 and 10,000 nodes and handle more than 100,000 concurrent tasks and 10,000 concurrent jobs.
• It is fast. In a performance test, a 1,400-node cluster sorted a terabyte of data in 62 seconds; a 3,400-node cluster sorted 100 terabytes in 173 minutes.
To put it in context, one terabyte holds roughly 2,000 hours of CD-quality music, and 10 terabytes could store the entire print collection of the US Library of Congress. You get the idea. Hadoop handles big data. It does it fast. It redefines the possible when it comes to analyzing large volumes of data, particularly semi-structured and unstructured data such as text.
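The “many hands” idea in the first bullet is usually expressed through the MapReduce model: each server processes the data blocks on its own disks, and the partial results are combined afterward. The following is a minimal single-process sketch of that flow, assuming plain-text input cut into small hypothetical blocks; it imitates the map, shuffle, and reduce phases in plain Python and does not use Hadoop's actual (Java) API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_blocks(lines: List[str], block_size: int) -> List[List[str]]:
    """Cut the input into fixed-size blocks, standing in for data blocks
    stored on the local disks of different servers (hypothetical sizing)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block: Iterable[str]) -> List[Tuple[str, int]]:
    """Map phase: each 'server' emits (word, 1) pairs for its own block only."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value by key across all blocks."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: combine the partial results for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: List[str], block_size: int = 2) -> Dict[str, int]:
    blocks = split_into_blocks(lines, block_size)
    # On a real cluster, each map_block call would run on a different server.
    mapped = [map_block(block) for block in blocks]
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    docs = ["many hands make light work", "many nodes", "light work"]
    print(word_count(docs))
```

Because each map call only touches its own block, adding servers adds map capacity; only the shuffle step has to move data between them.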
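The speculative execution mentioned under fault tolerance rests on one property: every copy of a task computes the same answer, so it is safe to keep whichever finishes first. The following is a minimal sketch of that “first copy wins” rule using Python's `concurrent.futures` (Python 3.9 or later for `cancel_futures`). It is a simplification, not Hadoop's scheduler: Hadoop launches a backup copy only when a task is running noticeably slower than its peers, whereas this sketch starts all copies up front, and the per-node delays are hypothetical stand-ins for fast and slow servers.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def make_task(data: List[int]) -> Callable[[float], Tuple[float, int]]:
    """Build a deterministic task: every copy returns the same sum,
    so accepting whichever copy finishes first is safe."""
    def task(node_delay: float) -> Tuple[float, int]:
        time.sleep(node_delay)        # simulate a fast or slow (hypothetical) node
        return node_delay, sum(data)  # report which copy ran, plus the result
    return task


def run_speculatively(task: Callable[[float], T], node_delays: List[float]) -> T:
    """Launch one copy of `task` per simulated node and return the result
    of the first copy to finish; the other copies are abandoned."""
    pool = ThreadPoolExecutor(max_workers=len(node_delays))
    try:
        futures = [pool.submit(task, delay) for delay in node_delays]
        done, _pending = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
    finally:
        # Don't block on stragglers: copies already running finish in the
        # background and their results are simply ignored.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    winner_delay, total = run_speculatively(make_task([1, 2, 3, 4]), [0.5, 0.01])
    print(f"fastest copy (delay {winner_delay}s) returned {total}")
```

The cost of this approach is wasted work on the losing copies, which is why it only pays off when spare capacity is available.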
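Not requiring a predefined schema is often called “schema on read”: data is stored exactly as it arrives, and each consuming program applies the structure it needs when it reads. The following is a minimal Python sketch of that idea; the sample records, the `Download` type, and the `read_downloads` reader are hypothetical illustrations, not part of any Hadoop API.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Ingest step: records are stored exactly as they arrive, with no upfront schema.
RAW_RECORDS: List[str] = [
    '{"user": "alice", "bytes": 512}',
    '{"user": "bob", "bytes": 2048, "region": "eu"}',
    "free-form text that another program might analyze",
    '{"user": "carol"}',
]


@dataclass
class Download:
    """The structure one hypothetical consumer wants to see."""
    user: str
    num_bytes: int


def read_downloads(raw: Iterable[str]) -> Iterator[Download]:
    """Apply structure at read time: keep only records that parse as JSON
    objects and carry the fields this consumer needs; skip everything else."""
    for line in raw:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON: irrelevant to this consumer, but still stored
        if isinstance(record, dict) and "user" in record and "bytes" in record:
            yield Download(user=str(record["user"]), num_bytes=int(record["bytes"]))


if __name__ == "__main__":
    downloads = list(read_downloads(RAW_RECORDS))
    print([d.user for d in downloads], sum(d.num_bytes for d in downloads))
```

A second program could read the same `RAW_RECORDS` with a completely different structure, for example treating every line as free text, without any change to how the data was stored.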
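The sort benchmark figures quoted above can be turned into rough throughput numbers. The calculation below assumes decimal units (1 TB = $10^{12}$ bytes), uses the node counts exactly as quoted, and treats each run as a single end-to-end average; it ignores replication and the fact that a sort both reads and writes the full dataset.

```latex
\begin{aligned}
\text{1 TB run:}\quad & \frac{10^{12}\ \text{B}}{62\ \text{s}} \approx 16.1\ \text{GB/s},
  & \frac{16.1\ \text{GB/s}}{1{,}400\ \text{nodes}} &\approx 11.5\ \text{MB/s per node} \\
\text{100 TB run:}\quad & \frac{10^{14}\ \text{B}}{173 \times 60\ \text{s}} = \frac{10^{14}\ \text{B}}{10{,}380\ \text{s}} \approx 9.63\ \text{GB/s},
  & \frac{9.63\ \text{GB/s}}{3{,}400\ \text{nodes}} &\approx 2.83\ \text{MB/s per node}
\end{aligned}
```

These coarse averages say nothing about where the time went, but they give a sense of scale: sorting 100 terabytes meant sustaining close to 10 GB/s across the cluster for nearly three hours.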
