Cask Hydrator – Self-Service Data Lakes on Hadoop

Print Friendly, PDF & Email

In this special technology white paper, Cask Hydrator – Self-Service Data Lakes on Hadoop, you’ll learn about an exciting new tool for data science projects – Cask Hydrator, a self-service, reconfigurable, extendable open source framework to visually develop, run, automate, and operate data pipelines. With enterprises seeing an increasing need to ingest high volumes of data from a wide variety of both structured and unstructured sources in high velocity environments, a big challenge is getting the data from the source into an application. For many big data deployments, enterprises must engage costly Hadoop engineers to build data pipelines that have a high degree of complexity.



An ideal solution to address these data pipeline needs would sit on top of Hadoop and would allow users to quickly drag-and-drop the appropriate components to build and test data pipelines. Such a solution would translate complex, loosely connected technologies, programming and scripts into easy to build and maintain data pipelines on Hadoop. The Cask Hydrator solution is an extension to the open source Cask Data Application Platform
(CDAP). The solution simplifies the process of developing, running, automating, and operating data pipelines on Hadoop. As a result, Cask Hydrator allows users to rapidly build and run streamlined data refineries to support many use-cases.

The white paper includes the following high level topics:

  • Introduction
  • Self-Service
  • Comprehensive
  • Enterprise Ready
  • Conclusion – Data Pipelines in Hadoop
  • Use Cases
  • Information Security Analytics and Reporting
  • In-Flight Brand Sentiment Analysis of the Full Twitter Firehose
  • Encrypting and Data Masking
  • Data Cleansing and Validating 3 Billion Records

The Cask Hydrator – Self-Service Data Lakes on Hadoop white paper is available for download in PDF from the insideBIGDATA White Paper Library, courtesy of Cask Data, Inc.


Speak Your Mind