
AKUDA Labs Announces Availability of Bananas and Demonstrates Improvements in Data Stream Processing

AKUDA Labs announced the commercial availability of Bananas™, its real-time data stream processing system, and the results of a suite of benchmark tests in which Bananas dramatically outperformed Spark Streaming, a widely deployed open-source system. Bananas provides high-performance, real-time pattern matching and event detection within streaming structured and unstructured text and image data, such as those found in spam, log-file, and social-media trend detection systems. In the tests, Bananas exhibited latencies that were effectively unchanged as throughput increased, whereas Spark Streaming was unable to maintain steady-state performance with bounded processing latencies at even a fraction of the throughput that Bananas handles with sub-millisecond latencies.

These groundbreaking capabilities from AKUDA Labs, also known for its real-time streaming classification engine, Pulsar™, will enable critical big data systems to meet sub-second latency requirements even at extremely high throughput. Such performance is relevant to industries that depend on real-time solutions, including online marketing, the Internet of Things (IoT), and spam and fraud detection. The implication of this massive performance gap over Spark is that AKUDA Labs can deliver the following:

  • 24,000 times lower latency
  • 100 times less hardware cost
  • 10 to 400 times more throughput at fractional latencies
  • 100 times less energy consumption and rack space
  • As much as a 1,000 times reduction in total cost of ownership (TCO)
  • 100 times less network bandwidth consumption

AKUDA Labs designed the benchmark tests to emulate the detection of specific patterns within streaming unstructured text data. It chose to test the Apache Software Foundation-based Spark Streaming because its large user base makes it a common point of reference. The primary task for the systems was to search for specific patterns within a large text repository, in this case a cleaned data set of the complete works of Shakespeare. Using four 16-core virtual machines with 16GB of RAM and 10Gbps connections, the test measured latency over comparable setups of the two distinctly different systems.
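The benchmark harness itself is not public, but the workload it emulates can be sketched in a few lines: scan each line of a text stream for a fixed pattern on arrival and time every match attempt. All names and the pattern below are illustrative, not taken from the actual test suite.

```python
import re
import time

# Illustrative sketch of the benchmark workload: per-event pattern
# matching over streaming text, timing each match attempt individually.
PATTERN = re.compile(r"to be, or not to be", re.IGNORECASE)

def process_stream(lines):
    """Match each incoming line on arrival and record per-event latency."""
    latencies, hits = [], 0
    for line in lines:
        start = time.perf_counter()
        if PATTERN.search(line):
            hits += 1
        latencies.append(time.perf_counter() - start)
    return hits, latencies

sample = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]
hits, latencies = process_stream(sample)
print(hits)  # 1
```

A production system would of course pull lines from a network stream rather than a list, but the measurement principle is the same: latency is attributed to each event, not to a batch.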

“The commercial availability of Bananas comes at a time when the need for extremely high-performing, real-time stream processing systems is becoming urgent,” said Vince Schiavone, AKUDA Labs Co-Founder and CEO. “Systems that process data with variable latencies, or have increasing latency as more information needs to be processed, are simply not appropriate in an increasing number of scenarios across multiple industries.”


The widely divergent results from these two systems stem from fundamentally different architectures. Bananas processes each packet upon arrival, whereas Spark Streaming collects data received over a specified time window into micro-batches and processes each batch as a whole. Designed specifically to use networks of shared-memory multiprocessors and take advantage of the internal multicore infrastructure, Bananas is implemented through a lockless shared-memory queue management protocol, which enables massively parallel processing pipelines. For Spark Streaming, the division of incoming data into time-windowed micro-batches of resilient distributed datasets (RDDs) leads to parallel processing of each partition, a setup requiring state management, extensive synchronization, and data distribution and aggregation processes that add latency.
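The latency consequence of the two designs can be shown with a toy model (the window and processing-cost figures below are assumed for illustration, not taken from the benchmark): under micro-batching, an event arriving partway through a batch window must wait for the window to close before processing even begins, while a per-event system starts immediately.

```python
# Toy comparison of per-event vs. micro-batch latency (illustrative only).
WINDOW = 0.5   # assumed micro-batch window, in seconds
PROC = 0.001   # assumed per-event processing cost, in seconds

def per_event_latency(arrival):
    """Per-packet system: processing starts on arrival."""
    return PROC

def micro_batch_latency(arrival):
    """Micro-batch system: the event waits until its window closes."""
    window_close = ((arrival // WINDOW) + 1) * WINDOW
    return (window_close - arrival) + PROC

for t in [0.1, 0.3, 0.6]:
    print(f"t={t}: per-event={per_event_latency(t):.3f}s, "
          f"micro-batch={micro_batch_latency(t):.3f}s")
```

In this toy model an event arriving at t=0.1s sees roughly 0.401s of latency under micro-batching versus 0.001s per-event, and the queueing delay grows with the window size regardless of how fast the batch itself is processed.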

“These test results underscore the superiority of distributed system infrastructures that target shared-memory multiprocessors and exploit all their capabilities, as we’ve done with Bananas,” said Luis Stevens, AKUDA Labs Co-Founder and CTO. “Spark Streaming is essentially an abstraction over the Spark batch processing system and is unsuitable for practical streaming systems that require high throughput while performing computationally intensive tasks at sub-second latencies.”





