StreamSets Delivers Ultralight Open Source Ingestion for Edge Devices


StreamSets Inc., provider of the enterprise data operations platform, debuted StreamSets Data Collector Edge (SDC Edge), enabling the industry’s first end-to-end data ingestion solution for resource- and connectivity-constrained systems such as Internet of Things (IoT) devices and the endpoint systems and network infrastructure used in cybersecurity applications.

Available as open source Apache-licensed software, SDC Edge packs the core functionality of the widely adopted StreamSets Data Collector into a footprint of less than 5MB, an order of magnitude smaller than alternatives. This makes it ideal for IoT use cases, where today ingestion logic is often hand-coded and tightly coupled to the specific device. As a result, dataflows are difficult to maintain as devices are upgraded, they are poorly instrumented for operational data-flow management and they require a gateway that adds cost and complexity. The benefits of a small footprint also apply to cybersecurity initiatives, where the low CPU consumption and limited attack surface allow deployment of SDC Edge across large populations of mobile endpoints and networking systems.

Key characteristics of SDC Edge include:

  • Ultralight — Requires less than 5MB and does not need additional software (e.g., Java) to operate.
  • Platform-independent — Based on Go, SDC Edge runs on a broad range of operating systems, including Linux, OS X, Windows and Android.
  • Drag-and-drop data-flow design — As in StreamSets Data Collector, pipelines are built from origin, destination and transformation objects, with the option to plug in scripts and trigger custom code execution.
  • Edge analytics — SDC Edge performs computations such as data normalization, redaction and aggregation, and is architected to support full-featured edge analytics, including machine and deep-learning models, in the future.
  • Multiple bidirectional pipelines — SDC Edge can run multiple pipelines on the same edge device, and pipelines can both send and receive data.
  • No IoT gateway cost — Data can now be ingested directly to storage/compute systems without the added cost, complexity and latency of a separate IoT gateway system.
  • Performance management — Using StreamSets Dataflow Performance Manager, SDC Edge can be deployed at scale, and metadata drives Live Data Map visualization and enforcement of source-to-consumption data SLAs.

IoT and cybersecurity are both red-hot spaces for big data innovation. Applying machine learning and other analytic techniques to data aggregated from IoT sensors and devices can help in areas as diverse as factory equipment, construction, oil and gas, and medical devices. Cybersecurity applications benefit from applying advanced analytics to the vast quantities of data collected across a corporate network in order to detect imminent threats or attacks in progress.

“The massive volume of data created by the explosion of digital devices presents an invaluable opportunity for analytics and insight, yet harnessing it has been a challenge, as IoT and cybersecurity efforts have suffered from the lack of end-to-end data ingestion frameworks,” said Arvind Prabhakar, co-founder and CTO, StreamSets. “We built SDC Edge to bring disciplined, well-managed data movement to huge populations of IoT sensors and personal devices so that the promised benefits of these critical initiatives are realized.”

