Numenta Anomaly Benchmark Evaluates Anomaly Detection Techniques for Real-time, Streaming Data


Numenta, Inc., a leader in machine intelligence, launched the Numenta Anomaly Benchmark (NAB), an open-source benchmark and tool designed to help data researchers evaluate the effectiveness of algorithms for anomaly detection in streaming, real-time applications.

Anomalies in streaming data are patterns that do not conform to past patterns of behavior for a given data stream. Until now, no benchmark has existed to evaluate anomaly detection in real-time streaming data.

NAB will be publicly unveiled on November 13 during MLconf in San Francisco in a presentation by Numenta Research VP Subutai Ahmad, “Real-time Anomaly Detection for Real-time Data Needs.” A peer-reviewed paper on NAB also was accepted by the IEEE Conference on Machine Learning and Applications and will be presented during the conference on December 9-11 in Miami.

The Need for Anomaly Detection in Time-Series Data

Explosive growth in streaming data is happening across industries, largely driven by the rise of the Internet of Things (IoT) and the proliferation of connected real-time data sources and applications with sensors producing waves of data. Voluminous amounts of this data are being stored for later analysis, though it often isn’t necessary or practical to capture and store all the information. Instead, data analysts need a way to analyze time-series data in real time, identify when something is different and act upon that insight.

Different approaches to this problem are being pursued in the form of anomaly detection algorithms. But until now, there has been no standard way to measure how effective real-time anomaly detection algorithms are. Numenta created NAB to fill that gap.

“There is an explosion in real-time streaming data sources. Data owners want to be able to model this data and figure out if anything has changed,” commented Numenta CEO Donna Dubinsky. “We created this open benchmark as a tool to help data scientists evaluate the effectiveness of different algorithms in finding anomalous behavior in these data streams. Having a standard benchmark could spur innovation in real-time anomaly detection algorithms. Our hope is the open source community will add new data sets, propose different scoring mechanisms, and test and compare other algorithms with our HTM algorithms.”

Early anomaly detection in streaming data has practical and significant applications across many industries – from monitoring critical IT infrastructure to detecting potential fraudulent financial transactions, from understanding energy consumption to geo-tracking of vehicles in logistics networks.

The Numenta Anomaly Benchmark

NAB is an open source framework that was created to help data professionals test, score and evaluate anomaly detection algorithms on time-series data and to compare their internal anomaly detection techniques to published algorithms.

NAB also allows people to test their algorithms against Numenta’s HTM detector, which is based on Numenta’s Hierarchical Temporal Memory technology. The detector uses a biologically inspired memory-prediction algorithm to model real-time data streams, and it learns continuously.

The major components of the NAB framework include:

  • Real-world data. NAB includes 58 labeled streaming data files, a combination of real-world and simulated data sets. All anomalies are marked.
  • Anomaly windows. These are defined ranges of data points that surround a known anomaly label. NAB uses these windows to decide whether, and how early, an algorithm detected each anomaly.
  • A scoring mechanism. Scoring is specifically designed for streaming data and rewards early detection.
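
To make the interplay between anomaly windows and early-detection scoring concrete, here is a minimal Python sketch. It is not NAB's actual scoring function: the linear reward, the penalty values, and the names score_detections, fp_penalty and fn_penalty are all assumptions chosen for illustration. It shows only the general idea described above, in which the first detection inside a window earns more credit the earlier it arrives, a window with no detection counts as a miss, and a detection outside every window counts as a false positive.

```python
from typing import List, Tuple

# Hypothetical representation: an anomaly window is an inclusive
# (start, end) pair of data point indices around a labeled anomaly.
Window = Tuple[int, int]


def score_detections(
    detections: List[int],
    windows: List[Window],
    fp_penalty: float = 0.5,  # assumed cost of each false positive
    fn_penalty: float = 1.0,  # assumed cost of each missed window
) -> float:
    """Toy early-detection score; an illustration, not NAB's formula."""
    score = 0.0
    for start, end in windows:
        hits = [d for d in detections if start <= d <= end]
        if not hits:
            # No detection inside the window: the anomaly was missed.
            score -= fn_penalty
            continue
        # Only the earliest detection in a window earns credit.
        first = min(hits)
        width = max(end - start, 1)
        # Reward is 1.0 at the window start and falls linearly to 0.5
        # at the window end, so earlier detection scores higher.
        score += 1.0 - 0.5 * (first - start) / width

    # Detections that fall outside every window are false positives.
    false_positives = [
        d for d in detections
        if not any(start <= d <= end for start, end in windows)
    ]
    score -= fp_penalty * len(false_positives)
    return score


if __name__ == "__main__":
    windows = [(10, 20), (50, 60)]
    print(score_detections([10, 50], windows))      # detects both early
    print(score_detections([20, 60], windows))      # detects both late
    print(score_detections([10, 30, 50], windows))  # one false positive
```

In this toy setup, a detector that fires at the start of every window outscores one that fires at the end of every window, and both outscore a detector that misses a window altogether. The real benchmark defines its own scoring function and weights; see the paper linked below for details.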

“NAB’s emphasis on anomaly windows and early detection is pioneering. In addition, the research community stands to benefit greatly from an open dataset containing real world data, and an open source tool for measuring the effectiveness of real-time anomaly detection algorithms,” said Varun Chandola, Assistant Professor in Computer Science and Engineering, SUNY Buffalo.

NAB Peer-Reviewed Research Paper 

http://arxiv.org/abs/1510.03336

 
