DataTorrent Simplifies Data Ingestion and Extraction for Hadoop with DataTorrent dtIngest


DataTorrent, a leader in real-time big data analytics and creator of DataTorrent RTS, an enterprise-grade unified platform for both stream and batch processing on Hadoop, announced the availability of an enterprise-grade ingestion application for Hadoop, DataTorrent dtIngest. DataTorrent dtIngest simplifies the collection, aggregation and movement of large amounts of data to and from Hadoop for a more efficient data processing pipeline, and is available to organizations for unlimited use at no cost.

Getting data in and out of Hadoop is a challenge for most enterprises, yet it remains largely neglected by current solutions. No existing tool handles all the requirements of Hadoop ingestion.

“Without proper ingestion and data management, Hadoop data analysis becomes much more troublesome,” said Jason Stamper, analyst, 451 Research. “DataTorrent dtIngest delivers an enterprise-grade user experience and performance.”

DataTorrent also announced that the GitHub repository for Project Apex is now available. Project Apex is the Apache 2.0 open source unified batch and stream processing engine that forms the core foundation of DataTorrent RTS 3. DataTorrent RTS 3 Community Edition is the DataTorrent-certified version of Project Apex. DataTorrent RTS 3 Enterprise Edition adds capabilities for lights-out operations and easy development and visualization on top of the Community Edition. Both editions are now generally available for download.

“Hadoop ingestion is difficult and often prevents enterprises from gaining value from Hadoop, creating inefficiencies in the analysis process and stalling data initiatives altogether,” said Phu Hoang, CEO and co-founder, DataTorrent. “With the release of DataTorrent dtIngest, we now provide a free application to overcome this challenge. DataTorrent dtIngest, built on the enterprise-grade Project Apex, delivers secure, high performance and fault tolerant data ingestion for any Hadoop-based project.”

DataTorrent dtIngest makes configuring and running Hadoop data ingestion and data extraction a point-and-click process and includes enterprise-grade features not available in the market today:

  • Based on Apache 2.0 open-source Project Apex – Built on Project Apex, dtIngest is a native YARN application. It is completely fault tolerant; unlike other tools such as DistCp, dtIngest can ‘resume’ file ingestion after a failure. It is horizontally scalable and supports extremely high throughput, low latency data ingest.
  • Simple to use and manage – A point-and-click application user interface makes it easy to configure, save and launch multiple data ingestion and distribution pipelines. Centralized management provides visibility, monitoring and summary logs.
  • Batch as well as stream data – dtIngest supports moving data between NFS, (S)FTP, HDFS, AWS S3n, Kafka and JMS, so you can use one platform to exchange data across multiple endpoints.
  • HDFS small file ingest using ‘compaction’ – Configurable automatic compaction of small files into large files during ingest into HDFS helps prevent exhausting the HDFS NameNode namespace.
  • Secure and efficient data movement – dtIngest supports compression and encryption during ingestion, and is certified with Kerberos-enabled secure Hadoop clusters.
  • Runs in any Hadoop 2.0 cluster – Certified to run across all major Hadoop distributions in physical, virtual or cloud deployments.
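
The small-file compaction idea above can be sketched in a few lines: many small input files are concatenated into a handful of large, bounded-size output files, so the HDFS NameNode tracks far fewer entries. This is a minimal, hypothetical illustration of the concept using local files and an illustrative size threshold; it is not dtIngest's actual implementation or API.

```python
import os

# Illustrative target size per compacted output file (HDFS blocks are
# commonly 128 MB; this number is an assumption, not a dtIngest setting).
BLOCK_SIZE = 128 * 1024 * 1024

def compact(small_files, out_dir, block_size=BLOCK_SIZE):
    """Concatenate many small files into sequentially numbered large files,
    starting a new output file whenever the current one would exceed
    block_size. Returns the list of compacted file paths."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = []
    out, written, part = None, 0, 0
    for path in small_files:
        with open(path, "rb") as f:
            data = f.read()
        # Roll over to a new output file if this chunk would overflow it.
        if out is None or written + len(data) > block_size:
            if out is not None:
                out.close()
            out_path = os.path.join(out_dir, f"compacted-{part}.dat")
            out = open(out_path, "wb")
            outputs.append(out_path)
            part += 1
            written = 0
        out.write(data)
        written += len(data)
    if out is not None:
        out.close()
    return outputs
```

A real ingest pipeline would additionally record which byte ranges of each compacted file came from which source file, so individual records remain addressable after compaction.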


