In this special guest feature, Marcus Phipps, Director of Data Center Solutions Marketing at Cisco, discusses perishable data, which can most easily be described as data with a shelf life: unless it is acted upon in the moment, it serves little purpose sitting in your data center. Marcus leads the Data Center solutions marketing team, focusing on converged infrastructure, business applications and big data solutions. A 20-year veteran of Cisco, Marcus has worked with some of Cisco’s largest and most strategic product lines, including the Catalyst, Nexus and Unified Computing portfolio. He holds a bachelor of science in engineering from Cal Poly, San Luis Obispo and resides in Campbell, CA.
We’re used to checking the expiration dates on cartons of milk and bottles of aspirin. But with the impending arrival of torrents of data from devices on the Internet of Things (IoT) demanding immediate attention, organizations will also need to start checking the perishability of their data.
Perishable data is information that must be acted on quickly, sometimes within milliseconds, or its business value will be reduced or even lost. Examples include:
- Video surveillance cameras monitoring high-value products on a retailer’s shelf to prevent theft.
- Data on wind direction and speeds to properly align other wind turbines to maximize energy production.
- Monitoring of traffic and parking spaces to maximize parking revenue and reduce wait times and pollution by helping drivers quickly find a parking space.
- Data on pressures, fluid flow and other performance data from oil wells to prevent costly failures and/or environmental damage.
Knowing which data is perishable, and how to handle it, is essential to reacting quickly enough to changing business needs while avoiding excessive data storage and transmission costs. Here’s a brief introduction to perishable data and some tips on how best to handle it.
Perishable Data: Why Now
Many organizations already store data in “tiers” based on its importance and, more specifically, how often it is accessed. Financial data from the first fiscal quarter, for example, might be accessed very often in that quarter, somewhat less often the next quarter, but very seldom (if ever) after that. Over time, a tiering strategy would move that data from expensive, fast storage (such as solid state drives) to less expensive, but slower commodity disk drives, finally archiving it on still slower but even less expensive tape.
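As a rough illustration of such a tiering policy, the sketch below picks a storage tier from how long a record has gone without being accessed. The tier names, the 90-day and 365-day thresholds and the `choose_tier` function are all assumptions made for this example, not part of any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical idle-time thresholds; a real policy would be tuned per workload.
SSD_MAX_IDLE = timedelta(days=90)    # roughly one fiscal quarter
DISK_MAX_IDLE = timedelta(days=365)  # roughly one year


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the storage tier for data last accessed at `last_accessed`."""
    idle = now - last_accessed
    if idle <= SSD_MAX_IDLE:
        return "ssd"    # fast, expensive storage for frequently used data
    if idle <= DISK_MAX_IDLE:
        return "disk"   # slower, cheaper commodity disk
    return "tape"       # slowest, cheapest archive


now = datetime(2024, 12, 31)
print(choose_tier(datetime(2024, 12, 1), now))
print(choose_tier(datetime(2024, 3, 1), now))
print(choose_tier(datetime(2022, 3, 1), now))
```

A policy like this looks backward at access history, which suits data whose value fades over months rather than moments.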
Perishable data takes this ranking process to another level by identifying data that may lose its value almost immediately after its collection, and whose real-time analysis and use is urgent. This urgency stems from the rise of the IoT, with 50 billion devices ranging from parking meters to thermostats to cardiac monitors (among many others) expected to join the Internet by 2020. Since many of these devices are reporting on fast-changing conditions (such as the temperature of a refrigerated shipping container or a customer’s location in a store), that data must be analyzed and acted on very quickly to deliver the greatest business value.
Unlike the backward-looking analysis often performed on, for example, sales or payment records, data from the IoT will often be used for forward-looking analytics, such as predicting device failures or generating a “next best offer” for a customer before they leave a store or a Web site. This makes the latency involved in transmitting that data for analysis unacceptable. Finally, in some cases, moving that data to the cloud or a central data center for analysis may either take too long or be prohibitively expensive.
To understand how edge analytics, meaning analysis performed at or near the device that generates the data, reduces response time as well as data storage and network costs, consider a security camera monitoring a shelf of high-value items. An intelligent camera or router at the edge can filter out unneeded data (signals showing the inventory is still on the shelf) and transmit only an “error” alert that requires action when the item is no longer there.
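The shelf-camera filter described above might look something like the following sketch. The reading format (a dict with `item_id`, `present` and `timestamp` keys) and the `shelf_alerts` function are hypothetical, chosen only to show the idea of dropping "still there" signals at the edge and forwarding a single alert.

```python
from typing import Dict, Iterable, Iterator


def shelf_alerts(readings: Iterable[dict]) -> Iterator[dict]:
    """Yield one alert each time an item changes from present to missing.

    Readings showing the item still on the shelf are discarded locally,
    so only the alerts would need to cross the network.
    """
    last_present: Dict[str, bool] = {}
    for reading in readings:
        item = reading["item_id"]
        present = reading["present"]
        # Assumption: an item seen for the first time is treated as having
        # been on the shelf before, so an initial "missing" reading alerts.
        was_present = last_present.get(item, True)
        if was_present and not present:
            yield {
                "item_id": item,
                "timestamp": reading["timestamp"],
                "alert": "item_missing",
            }
        last_present[item] = present


readings = [
    {"item_id": "watch-1", "present": True, "timestamp": 0},
    {"item_id": "watch-1", "present": True, "timestamp": 1},
    {"item_id": "watch-1", "present": False, "timestamp": 2},
    {"item_id": "watch-1", "present": False, "timestamp": 3},
]
alerts = list(shelf_alerts(readings))
print(len(readings), "readings in,", len(alerts), "alert out")
```

Tracking the last known state per item keeps the device from repeating the same alert on every frame after the item disappears.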
This same approach can be applied to everything from data for home health care monitoring to continual, real-time analysis of production equipment. General Electric estimates that if such analysis could increase system efficiency by one percent, over 15 years it would save the airline industry $30 billion in jet fuel, the global health care system $63 billion through improved treatment, patient flows, and equipment use, and gas-fired power plants $66 billion in fuel.
As more organizations seek such efficiencies, studies predict the majority of data will be processed at the edge within the next three to five years.
Handling Perishable Data
Performing such edge analytics requires organizations to rethink their IT infrastructures. Today, the most common analytic model sends massive data sets to a headquarters data center to be sorted and analyzed. This can result in lots of data, but not necessarily the data the enterprise needs, where it needs it.
Supplementing today’s large central data centers with smaller, more local hubs at the edge can reduce network costs and speed insights by storing and analyzing data closer to where it was generated. Smart devices at the edge identify data that should be analyzed locally rather than at a central data center and automatically perform the real-time (or near real-time) analysis of that data.
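One simple way to picture that local-versus-central decision is to compare how long a reading stays useful with how long a round trip to the central data center takes. The sketch below does exactly that; the `Reading` class, its `shelf_life_ms` field and the 250-millisecond round-trip figure are illustrative assumptions rather than measured values.

```python
from dataclasses import dataclass

# Assumed round-trip time to the central data center, in milliseconds.
CENTRAL_ROUND_TRIP_MS = 250


@dataclass
class Reading:
    source: str
    value: float
    shelf_life_ms: int  # hypothetical: how long the reading remains useful


def route(reading: Reading, round_trip_ms: int = CENTRAL_ROUND_TRIP_MS) -> str:
    """Return "edge" if the reading would go stale before a central
    analysis could respond, otherwise "central"."""
    if reading.shelf_life_ms < round_trip_ms:
        return "edge"
    return "central"


print(route(Reading("wind-turbine-7", 12.5, shelf_life_ms=50)))
print(route(Reading("quarterly-sales", 1_000_000.0, shelf_life_ms=86_400_000)))
```

In practice the shelf life of a reading would come from the business rules for each data source, and bandwidth cost could be weighed alongside latency.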
Edge analytics can be performed cost-effectively by software running on local (and possibly ruggedized) routers or switches, by servers at distributed data centers or by cloud services running near where the data is generated. Other technology requirements may include integrated environments that provide virtualized computing, network and storage resources to remote or branch locations, ruggedized equipment for remote locations, and improved access to network and usage data to ensure these environments can handle the edge analytics load.
However you handle your perishable data, the most important step is to realize that much of the data streaming from the IoT will have an earlier “use by” date than traditional transactional data. Start now to learn how to recognize perishable data and to use edge analytics to harvest it for insights before it goes stale.