This is the first entry in an insideBIGDATA series that explores the intelligent use of big data on an industrial scale. This series, compiled in a complete Guide, also covers the changing data landscape and realizing a scalable data lake, as well as offerings from HPE for big data analytics. The first entry is focused on the recent exponential growth of data.
The Intelligent Use of Big Data on an Industrial Scale
Several decades ago, Saudi Oil Minister Sheikh Yamani gained recognition for his insight into global development: “The Stone Age did not end for lack of stone, and the Oil Age will end long before the world runs out of oil.” Today, we live in what many call the Information Age, and we are in no danger of running out of information, particularly in data form. There is a general perception that we are overwhelmed with data, making the ability to store, process, analyze, interpret, consume, and act upon that data a primary concern. For large-scale, multinational organizations and those in heavily regulated industries (such as finance and healthcare, or those covering multiple industry verticals), the situation becomes even more complex and challenging. Data concerns are escalating in the Internet of Things (IoT) Age, in which the growth of data is exceeding the capacity of traditional computing. The question then becomes: how do we consume those data sources and transform them into actionable information?
The Exponential Growth of Data
Many sources predict exponential data growth toward 2020 and beyond, and they broadly agree that the size of the digital universe will at least double every two years, a 50-fold growth from 2010 to 2020. Human- and machine-generated data is growing about 10x faster overall than traditional business data, and machine data is increasing even more rapidly, at 50x the growth rate of traditional business data.
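The relationship between "doubling every two years" and "50-fold growth in a decade" can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes smooth compound growth, which the cited forecasts do not necessarily claim. It shows why the doubling figure is stated as a minimum: doubling exactly every two years yields 32-fold growth over ten years, so 50-fold growth implies a somewhat shorter doubling time.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Multiplicative growth after `years` when a quantity doubles every `doubling_time_years`."""
    return 2 ** (years / doubling_time_years)


def implied_doubling_time(years: float, factor: float) -> float:
    """Doubling time (in years) that produces `factor`-fold growth over `years`."""
    return years * math.log(2) / math.log(factor)


# Doubling exactly every two years from 2010 to 2020 (assumed smooth compounding).
decade_factor = growth_factor(10, 2)

# Doubling time needed to reach the 50-fold figure over the same decade.
doubling_for_50x = implied_doubling_time(10, 50)

print(f"Doubling every 2 years for 10 years: {decade_factor:.0f}x growth")
print(f"Doubling time implied by 50x in 10 years: {doubling_for_50x:.2f} years")
```

Under these assumptions, a 50-fold decade corresponds to a doubling time of a little under two years, which is consistent with the "at least" in the forecast.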
The acquisition and analysis of data and its subsequent transformation into actionable insight is a complex workflow that extends beyond data centers, to the edge, and into the cloud in a seamless hybrid environment. The use of edge devices, in-situ computation and analysis, centralized storage and analysis, and deep learning methodologies that accelerate data processing at scale requires a new technological approach. Historically, data processing and analytics systems had specialized features for business analytics and high-performance computing (HPC) workloads. Yet with the advent of big data and industry-standard x86-based computing, we are seeing a convergence of big compute, big data, and IoT for analytics. IDC research categorizes this convergence as high-performance data analytics (HPDA).
The HPDA market is at the center of big data analytics. The key factor driving the adoption of data-intensive computing is the need to rapidly analyze exploding volumes of data at the point of creation and at scale. An important consequence of this explosion is the need for users to adopt advanced data analytics technologies. Enterprises now have access to cheaper and more powerful computing platforms, and modern analytics software such as Hadoop and Spark enables real-time analytics for a wide range of use cases, including fraud and anomaly detection, business intelligence, affinity marketing, product design and development, process automation, and personalized medicine. In addition to these software frameworks, effective scaling depends on storage technologies, such as object storage and high-performance distributed file systems, that improve data flow, in-place analytics, and storage efficiency.
According to IDC’s survey on the most important digital transformation projects, respondents cited cloud transformation/transition (66%), IoT (32%), and big data/cognitive solutions (27%) as key initiatives for big data usage and development. The cloud provides scalability, and the IoT forms the foundation for investments in big data and cognitive computing. IDC predicts that by 2020, 50% of all business analytics software will incorporate prescriptive analytics built on cognitive computing technology, and the amount of high-value data will double, making 60% of information delivered to decision makers actionable.
Data Growth Challenges
The exploding volume and speed of data growth have introduced several challenges:
- System management and growing cluster complexity
- Data center power, cooling, and floor space limitations
- Storage, data movement, and management complexity
- Lack of support for heterogeneous environments and accelerators
- Significant shortage of skills to integrate and manage the big data ecosystem
Infrastructure Drives Improvement
Organizations are evaluating and implementing infrastructure to drive the following improvements:
- Manage growth and operational cost of big data infrastructure
- Provide elasticity and flexible capacity
- Ensure performance for diverse workloads
- Rapidly deploy and scale infrastructure
- Simplify management with Big Data as a Service (BDaaS)
In this document, our focus is on “industrializing” big data infrastructure: bringing operational maturity to the Hadoop data ecosystem, making it easier and more cost-effective to deploy at enterprise scale, and moving companies from the proof of concept (PoC) stage into production-ready deployments.
Over the next few weeks, this series on the use of big data on an industrial scale will cover the following additional topics:
- The Changing Data Landscape
- Realizing a Scalable Data Lake
- The HPE Elastic Platform for Big Data Analytics
- HPE Workload and Density Optimized System
- The Five Blocks of the HPE WDO Solution
You can also download the complete report, “insideBIGDATA Guide to the Intelligent Use of Big Data on an Industrial Scale,” courtesy of Hewlett Packard Enterprise.