This is the fifth article in a series focusing on a technology that is rising in importance to enterprise use of big data – IoT Analytics, or the analytical component of the Internet-of-Things. In this segment, we’ll talk about the challenges of deploying IoT analytics.
Challenges of Deploying IoT Analytics
As important as IoT analytics has become to realizing the potential of connected devices and sensors for the corporate world, the new technology isn’t without substantial challenges. This article of the series provides a high-level perspective on the roadblocks that many companies may encounter when working toward the deployment of analytics in support of IoT.
IoT presents a potential big data preparation and integration problem. The addition of instrumentation to assets that were not previously monitored on a regular time scale contributes to the feeling that everything is becoming more connected and more digitized – equipment, customers, and processes – with decision makers looking for a competitive edge. Further, integrating data from real-time sensors with contextual data-at-rest is a big data integration challenge. Much of the sensor data is voluminous, and it lands in data lakes and Hadoop-style stores, while the contextual data is often found in a data warehouse. Those different sources have to be integrated, prepared, and cleaned to get the data ready for analytics.
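As a minimal sketch of that integration step, the snippet below joins hypothetical sensor readings (as they might land in a data lake) with a contextual dimension table (as it might sit in a warehouse), then uses the combined view to flag readings. The column names, devices, and limits are illustrative assumptions, not from any real deployment.

```python
import pandas as pd

# Hypothetical sensor readings as they might land in a data lake
sensor = pd.DataFrame({
    "device_id": ["d1", "d1", "d2"],
    "ts": pd.to_datetime(["2017-03-01 10:00", "2017-03-01 10:05", "2017-03-01 10:00"]),
    "temp_c": [71.2, 76.1, 68.9],
})

# Hypothetical contextual data from a warehouse dimension table
context = pd.DataFrame({
    "device_id": ["d1", "d2"],
    "site": ["plant_a", "plant_b"],
    "max_safe_temp_c": [75.0, 70.0],
})

# Integrate the two sources, then flag readings that exceed the safe limit
enriched = sensor.merge(context, on="device_id", how="left")
enriched["over_limit"] = enriched["temp_c"] > enriched["max_safe_temp_c"]
print(enriched[["device_id", "ts", "temp_c", "site", "over_limit"]])
```

Only once the sensor stream and the contextual table share a common view like this can an analytic rule be applied with any meaning.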
Being able to do the analytics is one thing, but pulling the sensor data off the instrument, providing context from data-at-rest systems, figuring out what the insights are, and then understanding how to turn those insights into action – there are some challenges here that are really quite subtle.
The technical challenge with IoT analytics is that a large volume of data flows in and you have to perform analytics on it, often in real time. That processing frequently requires enrichment with contextual data before the analytics become meaningful.
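The per-event version of that enrichment can be sketched as below: each incoming reading is joined against an in-memory context table before it is scored. The device names, fields, and thresholds are hypothetical stand-ins; a production system would source `CONTEXT` from a real store.

```python
# Contextual data-at-rest, pre-loaded into memory (illustrative values)
CONTEXT = {
    "pump-7": {"site": "refinery", "alert_rpm": 3200},
}

def enrich_and_score(event):
    """Attach context to a raw sensor event and decide if it needs action."""
    ctx = CONTEXT.get(event["device_id"], {})
    enriched = {**event, **ctx}
    # Without context there is no alert threshold, so default to "never alert"
    enriched["alert"] = event["rpm"] > ctx.get("alert_rpm", float("inf"))
    return enriched

# Simulated stream of readings arriving one at a time
stream = [{"device_id": "pump-7", "rpm": 3150}, {"device_id": "pump-7", "rpm": 3400}]
results = [enrich_and_score(e) for e in stream]
```

The point of the sketch is the ordering: enrichment happens before scoring, so the analytic decision always has its context available.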
Analytics at the Edge
As mentioned earlier, an important goal is to push analytics out to the edge (or at least “close to the edge”). Analytics at the edge can reduce the need to store and process large volumes of data at a central location. Intelligent analytics at the edge, near where the data is generated, also reduces networking communications overhead. Rather than sending every bit of data to a centralized location where it can be analyzed, edge analytics places another layer of intelligence and automation where the data resides.
It is still very early in this effort and there are some important caveats. For example, there is an inherent conflict here technologically. If you push the analytics out to the edge of IoT systems, you’ll find situations where you also need the contextual data. Now you have a round trip, or at least a trip to a central data store. You either have to push that reference data to the edge, or you have to access it from a central store. So the idea of edge computing in IoT is a critical concept, but it brings technical challenges of coordination – getting the context to wherever the analytics need it. You really have to think about the technology and architecture behind delivering an advanced IoT application; it is more than just gathering data.
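One common pattern for the "push the reference data to the edge" option is a local cache that refreshes from the central store on a time-to-live, so most readings avoid the round trip. The sketch below assumes a `fetch_fn` callback standing in for the call back to the central store; the class name and TTL are illustrative.

```python
import time

class EdgeContextCache:
    """Keeps a local copy of central reference data so edge analytics
    can run without a round trip on every reading (illustrative sketch)."""

    def __init__(self, fetch_fn, ttl_seconds=300):
        self._fetch = fetch_fn          # callback to the central store
        self._ttl = ttl_seconds
        self._data = {}
        self._fetched_at = None

    def get(self, key):
        now = time.monotonic()
        # Refresh from the center only when the local copy is missing or stale
        if self._fetched_at is None or now - self._fetched_at > self._ttl:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data.get(key)

# Stand-in for the central store: a lambda returning reference data
cache = EdgeContextCache(lambda: {"sensor-9": {"limit": 80.0}}, ttl_seconds=300)
print(cache.get("sensor-9"))
```

The trade-off this makes explicit: the edge works from context that may be up to `ttl_seconds` old, in exchange for removing the central store from the hot path.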
As an example, an edge device whose job is to collect temperature might not have the capacity to store temperature data across hours, days, weeks, months, or years. Likewise, it probably doesn’t have contextual data such as humidity, barometric pressure, wind speed, and direction to know whether a current temperature reading is good, bad, or otherwise. For IoT deployments to be beneficial, the data needs context, which usually comes from lengthy time periods and ordinarily from multiple devices. Knowing the temperature at a particular point in time might not be useful without further context.
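Even the "lengthy time periods" part of the context can be approximated on a constrained device with a bounded window of history, so a new reading can at least be judged against recent behavior. A minimal sketch, assuming a 24-hour window of 15-minute readings and a z-score test (both arbitrary choices):

```python
from collections import deque
import statistics

class RollingBaseline:
    """Maintains a bounded window of past readings so a new value can be
    judged against recent history (a sketch; window size is arbitrary)."""

    def __init__(self, window=96):  # e.g. 24h of 15-minute readings
        self.window = deque(maxlen=window)

    def is_anomalous(self, value, z_threshold=3.0):
        if len(self.window) >= 2:
            mean = statistics.fmean(self.window)
            stdev = statistics.stdev(self.window) or 1e-9  # guard: all-equal history
            anomalous = abs(value - mean) / stdev > z_threshold
        else:
            anomalous = False  # not enough history yet to judge
        self.window.append(value)
        return anomalous
```

This gives the device a notion of "good, bad, or otherwise" relative to its own recent past, though cross-device context (humidity, pressure, neighboring sensors) still has to come from elsewhere.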
It is also important to determine the time scale on which you’re operating – daily, hourly, or on a minute, second, or sub-second basis. If you’re seeking to optimize the maintenance on a piece of equipment and the elapsed time to get a repair person out is two hours, then you don’t need to do analytics at the edge; a central data store will suffice.
At this time, most edge sensors and devices do not have the resources – processing power, memory, and storage – to perform their own analytics. Today, the job of the edge device is to gather data, send it upstream where the analytics and processing take place, listen for a response from the upstream systems, and take appropriate action. The industry is working to take the current crop of edge devices, typically designed to be low power and inexpensive, and make them more capable without increasing their cost and complexity.
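That gather–send–listen–act cycle can be sketched as a simple loop. The four callables here (`read_sensor`, `send_upstream`, `await_response`, `actuate`) are hypothetical stand-ins for the device’s I/O; the point is the shape of the loop, not any particular hardware API.

```python
def run_once(read_sensor, send_upstream, await_response, actuate):
    """One iteration of a thin edge device's duty cycle (illustrative)."""
    reading = read_sensor()              # 1. gather data locally
    send_upstream(reading)               # 2. ship it to the central platform
    verdict = await_response()           # 3. listen for the analytic result
    if verdict.get("action"):            # 4. take the prescribed action, if any
        actuate(verdict["action"])
    return verdict

# Simulated round trip: the upstream platform tells the device to throttle
log = []
verdict = run_once(
    read_sensor=lambda: {"temp_c": 91.0},
    send_upstream=log.append,
    await_response=lambda: {"action": "throttle"},
    actuate=log.append,
)
```

Every decision lives upstream; the device itself holds no analytics, which is exactly the limitation the industry is working to relax.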
There are architectural challenges as well: deploying an IoT analytics solution isn’t a decision you make and then magically have all the technology in place. There are many steep challenges for organizations. As discussed above, there’s edge computing, there’s latency, and there’s the need for high-performing platforms. Spark Streaming, for example, processes data in micro-batches, which is great if you don’t need consistent, ultra-low latency; such an architecture may break down when you need low latency at scale, where true event-at-a-time streaming is more appropriate than a micro-batch solution. There are many technologies to sort through, and there can be a substantial lead time to getting that technology and infrastructure in place – not to mention that your development organization also has to acquire a new technology mindset. Think of how the typical application development team thinks, or even design teams for that matter: it is request and response. IoT is not request and response. IoT is streaming. The data comes at you fast – you don’t know when it’s coming, and you don’t know how fast. You have to think in IoT terms.
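The mindset shift can be made concrete: instead of a handler that is called once per request and returns an answer, streaming code registers handlers that react to events whenever and however fast they arrive. A minimal sketch, using a plain iterable to stand in for a message bus (the handler and threshold are illustrative):

```python
def consume(stream, handlers):
    """Dispatch each event to every registered handler as it arrives."""
    for event in stream:
        for handler in handlers:
            handler(event)

alerts = []

def alert_on_spike(event):
    # React to the event; nothing is "returned" to a caller
    if event["value"] > 100:
        alerts.append(event)

# Events arrive at their own pace and volume; the code just keeps up
consume(
    stream=[{"value": 42}, {"value": 150}, {"value": 7}],
    handlers=[alert_on_spike],
)
```

Nothing here asks for data; the data drives the program, which is the inversion a request/response team has to internalize.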
In the end, there is the challenge of having an execution platform for IoT analytics – a platform with analytics capabilities powerful enough to interpret what’s happening at the “moment of truth”: the opportunity for intervention, the anomalous sensor reading, the customer entering the store. Those moments of truth are when you want the enterprise platform to kick in, interpret what’s happening, and understand the implications with contextual information. That platform then needs the muscle, or automation, to initiate corrective action based on the patterns that have been recognized; the flexibility to assemble and compose new lightweight applications on the fly; and the ability to scale particularly well when demand spikes – on Cyber Monday, for example.
An important question to ask as you walk down the IoT analytics path is whether you can build an enterprise-ready platform that can scale elastically across on-premises and cloud environments. Some of this data is cloud-based, and some comes through on-premises systems. The platform also needs built-in collaboration, so that different players – marketing and IT, or engineering and product – can work together on developing the rules and models. All this digitization of the business happens on different time scales, but much of it is close to real time, and much of it involves bringing in contextual data to surround the event that occurred at the moment of truth.
All of these things – execution platform, type of analytics, time scales, data context – come together in more than just a few analytics and a bit of data in a data lake; it’s the whole platform for executing those analytics in action that is important.
There are also security concerns with IoT. The Internet-of-Things essentially expands the “attack surface,” leaving enterprises more vulnerable than ever. Anything connected to the Internet can be attacked. Poorly secured IoT devices and services can serve as entry points for cyberattack and can expose user data to theft by leaving data streams inadequately protected. It’s only a matter of time before the wearable device on your wrist or the thermostat connected to your WiFi is used as a starting point to penetrate corporate and government networks. IoT security will become a significant component of security budgets, and security concerns could give pause to IoT adoption.
With the impending arrival of torrents of data from IoT devices demanding immediate attention, organizations will need to address the challenge of assessing the “perishability” of their data. Perishable data is information that must be acted on within as little as milliseconds, or its business value will be reduced or even lost. Knowing which data is perishable, and how to handle it, is essential to reacting quickly enough to changing business needs while avoiding excessive data storage and transmission costs.
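A perishability check can be as simple as attaching a time-to-live per event type and dropping anything that has aged past its window before it is stored or forwarded. The event kinds and thresholds below are purely illustrative assumptions:

```python
# Time-to-live per event type, in milliseconds (illustrative values)
TTL_MS = {"fraud_signal": 50, "machine_alarm": 2_000, "usage_stat": 3_600_000}

def still_fresh(event, now_ms):
    """True if the event can still be acted on within its business window."""
    ttl = TTL_MS.get(event["kind"], 0)  # unknown kinds are treated as expired
    return (now_ms - event["ts_ms"]) <= ttl

events = [
    {"kind": "fraud_signal", "ts_ms": 10_000},   # 40 ms old: still actionable
    {"kind": "machine_alarm", "ts_ms": 7_000},   # 3,040 ms old: value lost
]
actionable = [e for e in events if still_fresh(e, now_ms=10_040)]
```

Filtering early like this is what keeps stale events from consuming the storage and transmission budget the paragraph above warns about.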
Contributed by Daniel D. Gutierrez, Managing Editor of insideBIGDATA. In addition to being a tech journalist, Daniel is also a practicing data scientist, author, and educator, and sits on a number of advisory boards for various start-up companies.