Hot and Cold Data from the Internet of Things

In this special guest feature, Dan Graham of Teradata explores why IoT data, despite having useful applications, loses value faster than transaction data. Because it lacks monetary data fields, most sensor data will become low value in hours, days, or weeks as it is replaced by fresh collections of the same sensor data. Architectures and systems will be forced to compensate for this rapid decline to cope with retention and processing costs. With over 30 years in IT, Dan Graham has been a DBA, Strategy Director in IBM’s Global BI Solutions division, and General Manager of Teradata’s high-end servers. He is currently Technical Marketing Director for the Internet of Things at Teradata.

Twenty terrorbytes a day.  Ten gag-a-bytes per minute.  Run.  Hide.  We can’t afford this!  It’s too complicated!  That’s the fear many Internet of Things (IoT) application designers struggle with today. We hear this story frequently, but should we listen?

Signal to Noise Ratio

A tremendous amount of sensor data is useless to start with. The sensor says it’s 78° beep. It’s 78° again beep. The sensor says that every second for three days and then, and then – wait for it – it’s 78.1° beep. Most sensor data is repetitive and useless. What we typically want to find are the anomalies, the outliers, the unexpected stuff. Repetitive sensor readings are mostly noise. That noise becomes terabytes of storage capacity that worry the Chief Financial Officer.

The signal-to-noise ratio of sensor data is often worse than 1-to-100, meaning at most one useful measurement in 100 readings. Compression algorithms help immensely. Some organizations use time series (temporal) compressors to squeeze out repetitive data. In other cases, programmers build custom compression algorithms.
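To make the idea of squeezing out repetitive readings concrete, here is a minimal sketch in Python. It is not any vendor's compressor. The function name `compress_repeats`, the `(timestamp, value)` tuple layout, and the `tolerance` parameter are assumptions made for illustration. It keeps a reading only when the value changes from the last one kept.

```python
from typing import Iterable, List, Tuple

# Assumed layout for illustration: (timestamp in seconds, measured value).
Reading = Tuple[int, float]


def compress_repeats(readings: Iterable[Reading], tolerance: float = 0.0) -> List[Reading]:
    """Keep a reading only if it differs from the last kept value by more than `tolerance`."""
    kept: List[Reading] = []
    for ts, value in readings:
        if not kept or abs(value - kept[-1][1]) > tolerance:
            kept.append((ts, value))
    return kept


# One hundred identical readings, one per second, followed by a single change.
readings = [(t, 78.0) for t in range(100)] + [(100, 78.1)]
compressed = compress_repeats(readings)
print(f"{len(readings)} readings reduced to {len(compressed)}")
```

Note that this simple approach drops the timestamps of the repeated readings, so it only suits cases where "nothing changed until the next kept reading" is an acceptable reconstruction.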

Serving up Hot and Cold Data

All data has a half-life. Initially the data is constantly in use. Then, days or weeks later, it’s not so interesting. And finally it decays to zero value. Finding the inflection points of IoT data requires data temperature tracking. Data temperature is defined by popularity – if people use specific data, it warms up. If no one uses it, data cools off. When data freezes solid, it’s time to delete it. Unfortunately, date-time stamps don’t help, but tools are coming to market that let administrators determine which data is popular and which isn’t. Without these tools, we are forced to keep all data forever (or until the budget people scream). Given that sensor data is often 10 to 100 times bigger than Big Data, the screaming started weeks ago.

But IoT data is different from transaction data in that it tracks the behavior of physical things. There is no money field in the sensor data, so its value is lower to begin with and it gets cold faster, often in hours or days. And within hours, a new data set arrives from the same sensor. It’s still the 78° beep.
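Data temperature tracking as described above can be sketched roughly as follows. This is an illustrative toy, not how any commercial tool works. The class `TemperatureTracker`, the half-life decay rule, and the `hot` and `frozen` thresholds are all assumptions chosen for the example. Each access warms a dataset by one point, and the score halves every `half_life_s` seconds without use.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TemperatureTracker:
    """Toy popularity tracker: accesses add heat, idle time halves it per half-life."""

    half_life_s: float = 86_400.0  # assumed half-life of one day
    _scores: Dict[str, float] = field(default_factory=dict)
    _last_seen: Dict[str, float] = field(default_factory=dict)

    def temperature(self, key: str, now: float) -> float:
        """Return the decayed score for `key` at time `now` (seconds)."""
        score = self._scores.get(key, 0.0)
        elapsed = now - self._last_seen.get(key, now)
        return score * 0.5 ** (elapsed / self.half_life_s)

    def record_access(self, key: str, now: float) -> None:
        """Warm the dataset by one point on every read."""
        self._scores[key] = self.temperature(key, now) + 1.0
        self._last_seen[key] = now

    def classify(self, key: str, now: float, hot: float = 1.0, frozen: float = 0.01) -> str:
        """Label data as hot, cold, or frozen; the thresholds are arbitrary assumptions."""
        temp = self.temperature(key, now)
        if temp >= hot:
            return "hot"
        if temp < frozen:
            return "frozen"
        return "cold"


tracker = TemperatureTracker(half_life_s=3600.0)  # one-hour half-life for the demo
tracker.record_access("sensor-42/batch-001", now=0.0)
tracker.record_access("sensor-42/batch-001", now=0.0)
for hours in (1, 3, 10):
    print(hours, tracker.classify("sensor-42/batch-001", now=hours * 3600.0))
```

In this sketch, data labelled "frozen" would be the candidate for deletion, and a short half-life models sensor data that cools in hours rather than weeks.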

The Analytics of Things

We can think of sensor data like raw iron ore: a scoop of dirt must be smelted down to a few nuggets of pure iron. Tons of dirt (noisy, repetitive measurement values) yield a single iron ingot. This is the job most companies give to the digital data lake. Sensors send tons of data to the lake where it gets refined, and once it has been smelted into aggregates and anomaly events, the dirt can be discarded. The ingots go to the data R&D department (data scientists) or to data product manufacturing (a data warehouse).
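As a rough illustration of "smelting" raw readings into aggregates and anomaly events, consider the sketch below. The function name `smelt`, the `(timestamp, value)` layout, and the z-score threshold of 3 are assumptions for the example. A real data lake pipeline would likely use more robust statistics and run in a distributed engine.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Assumed layout for illustration: (timestamp in seconds, measured value).
Reading = Tuple[int, float]


def smelt(readings: List[Reading], z_threshold: float = 3.0) -> Tuple[Dict[str, float], List[Reading]]:
    """Reduce a non-empty batch of readings to one aggregate row plus its outliers."""
    values = [v for _, v in readings]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the batch
    aggregate = {"count": len(values), "min": min(values), "max": max(values), "mean": mu}
    # A reading is an anomaly if it lies more than z_threshold deviations from the mean.
    # If every value is identical (sigma == 0), nothing is flagged.
    anomalies = [(ts, v) for ts, v in readings if sigma > 0 and abs(v - mu) / sigma > z_threshold]
    return aggregate, anomalies


# Ninety-nine repetitive readings and one spike.
batch = [(t, 78.0) for t in range(99)] + [(99, 85.0)]
aggregate, anomalies = smelt(batch)
print(aggregate)
print(anomalies)
```

Once the aggregate row and the anomaly events are stored, the original batch is the "dirt" that can be discarded.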

Analyzing IoT data in isolation has good value – after all, it’s hot data. Analyzing the same data in the context of the corporation has enormous value. Aggregated sensor data can be coupled with supplier data, sales orders, yield scores, and maintenance schedules. Failure predictions and risk scores can be generated. Combining sensor data with other datasets is where wondrous use cases emerge, like lowering warranty reserves, repairing engines before they fail, predicting train derailments weeks in advance, or finding stolen smart meters with correlation patterns. This is the Analytics of Things. According to IDC, “IoT with analytics results in virtuous circle of data generation and value creation.” [1] Gartner says that “analyzing IoT data… is arguably the main point of IoT.” [2] I like to say it’s where the big return on investment pops out.
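A very small sketch of coupling aggregated sensor data with another corporate dataset might look like this. Everything here is hypothetical: the function `assets_needing_attention`, the asset identifiers, and the thresholds are invented for illustration, and a real risk score would come from a trained model rather than two fixed cutoffs.

```python
from typing import Dict, List


def assets_needing_attention(
    anomaly_counts: Dict[str, int],
    days_since_service: Dict[str, int],
    max_anomalies: int = 3,
    max_days: int = 90,
) -> List[str]:
    """Flag assets with many sensor anomalies AND a long gap since maintenance."""
    flagged = []
    for asset_id, count in anomaly_counts.items():
        days = days_since_service.get(asset_id)
        if days is None:
            continue  # no maintenance record to combine with
        if count > max_anomalies and days > max_days:
            flagged.append(asset_id)
    return sorted(flagged)


# Hypothetical inputs: anomaly events per engine and the maintenance schedule.
anomaly_counts = {"engine-1": 5, "engine-2": 5, "engine-3": 0}
days_since_service = {"engine-1": 120, "engine-2": 10, "engine-3": 200}
print(assets_needing_attention(anomaly_counts, days_since_service))
```

Neither dataset flags the engine on its own. It is the combination of sensor anomalies and the maintenance record that singles it out.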

Okay, so not every sensor dataset will be so easily tamed. Some use cases and some sensors will only yield their secrets with brute force and massive storage capacity. Yet this should be the exception in our plans, a last resort after every other avenue of escape has been tried. Think big. Then think smart.

1 IDC, Nascent IoT Market Shakes Up Vendor Strategies, Jan 25, 2014

2 Gartner, Three Best Practices for Internet of Things Analytics, October 23, 2015

 
