The data lake (see Part 1 of this two-part series) is a facilitator of another new movement in the big data industry – the Industrial Internet, a term coined by General Electric that refers to the integration of complex physical machinery with networked sensors and software. “The power of the data lake for the industrial internet harnesses the vast quantities of data from industrial equipment sensors in one location and acts on that data as a whole,” says Steve Wooledge, VP Product Marketing at MapR Technologies, a leading Hadoop distribution. “This is where Hadoop enters the picture. Hadoop is an ideal storage and analytics platform for the many data points you collect from many disparate data sources. A Hadoop-based data lake is an increasingly popular deployment for large-scale analytics techniques including machine learning, anomaly detection, and pattern matching. Specific advances in Hadoop technology support lake deployments that allow for real-time approaches to business critical decisions. These data lake deployments allow the industrial internet user to change the efficiencies of information storage and analytics to the point where real-time business model changes are occurring.”
Accenture estimates the Industrial Internet could add US$14.2 trillion to the global economy by 2030, with particularly significant gains in the real gross domestic product (GDP) of mature economies. Another recent report by GE and Accenture describes the Industrial Internet as a source of both operational efficiency and innovation, the result of a confluence of technology developments:
- Exponential growth in data volumes
- Internet-of-Things (IoT) providing even more data
- Growth in analytics technology capabilities
- Significant economic impact derived from an unparalleled ability to monitor
According to the report, “New technologies such as data lakes, combined with Industrial Internet capabilities, enable operators to funnel sensor data from various networked machines onto a single platform. From there, massively parallel processing capabilities analyze the data as a unified whole rather than as a billion separate bits of information, each with its own individual file path.”
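The idea in that passage, pooling readings from many machines and then analyzing them together, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the machine names, the sample readings, the `funnel` and `find_anomalies` helpers, and the simple z-score test, which stands in for the far more capable analytics a Hadoop-based lake would run at scale.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings from three networked machines.
machine_feeds = {
    "turbine-a": [70.1, 69.8, 70.4, 70.0],
    "turbine-b": [70.3, 69.9, 95.2, 70.2],
    "turbine-c": [69.7, 70.5, 70.1, 69.9],
}


def funnel(feeds):
    """Flatten per-machine feeds into one pool of (machine, reading) records."""
    return [(machine, reading)
            for machine, readings in feeds.items()
            for reading in readings]


def find_anomalies(records, threshold=2.5):
    """Flag readings whose z-score, computed over the whole pool, exceeds threshold."""
    values = [reading for _, reading in records]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [(machine, reading)
            for machine, reading in records
            if abs(reading - mu) / sigma > threshold]


if __name__ == "__main__":
    pooled = funnel(machine_feeds)
    print(find_anomalies(pooled))
```

Because the mean and standard deviation are computed over the pooled records rather than per machine, each reading is judged against the behavior of the whole fleet, which is the "unified whole" the report describes.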
Building data lake architectures in support of the Industrial Internet involves processes significantly different from how IT architectures were built in the past. Fortunately, many big data vendors are actively engaged in supplying the necessary products and tools. “Traditional IT data management required cleansing the data, formatting it to fit existing data models and then storing it for historical analysis,” said Sai Devulapalli, Director of Product Marketing at Pivotal. “The technology and business requirements have changed significantly. On the technology side, the advancements are in collocated compute+storage clusters and price/capacity of memory. On the business side, the requirements are around handling non-traditional data sources such as Mobile, IoT, clickstream, social data, etc., [which] don’t necessarily fit into pre-existing data models. They need to perform predictive analytics to take proactive data-driven business decisions and real-time actions at scale.”
“The Industrial Internet, as with many new technologies, is not interesting simply because it is possible – it becomes interesting to us when we’re able to do something with it,” observes Peter Schlampp, VP Products at Platfora. “The applications of the Industrial Internet are all powered by data and it is only now that we’re able to for once store, analyze and act on data at this scale.” Data lakes and the Industrial Internet are excellent examples of the rapid pace of innovation happening in the big data space today. Expect these technologies to mature over the next couple of years.
Daniel D. Gutierrez, Managing Editor – insideBIGDATA