
The Changing Data Landscape

This is the second entry in an insideBIGDATA series that explores the intelligent use of big data on an industrial scale. The series, also compiled in a complete Guide, covers the exponential growth of data, realizing a scalable data lake, and offerings from HPE for big data analytics. This second entry focuses on the changing data landscape.


Large enterprise customers have made huge investments in data warehousing technology over the past decade, and upgrading or adding new data warehouse licensing is cost prohibitive. Technologies like Hadoop and Spark let organizations achieve a greater ROI on existing data warehouse resources by providing a data processing platform to offload Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) workloads from the data warehouse, and by reducing the cost of storing and accessing less frequently used data.
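The ETL-versus-ELT distinction is about where the transformation runs: before loading (in a separate processing tier) or after loading (inside the engine that holds the data). As a minimal sketch of the ELT pattern, here Python's built-in sqlite3 stands in for the offload engine, and the sales records and table names are purely hypothetical:

```python
import sqlite3

# Hypothetical raw source records: (day, product, price-as-text)
raw_rows = [("2024-01-01", "widget", "19.99"),
            ("2024-01-02", "gadget", "5.50"),
            ("2024-01-02", "widget", "19.99")]

con = sqlite3.connect(":memory:")
cur = con.cursor()

# ELT step 1: load the raw data as-is into a staging table.
cur.execute("CREATE TABLE staging (day TEXT, product TEXT, price TEXT)")
cur.executemany("INSERT INTO staging VALUES (?, ?, ?)", raw_rows)

# ELT step 2: transform inside the engine (cast types), close to the data,
# producing the cleaned table that analytics queries will use.
cur.execute("""
    CREATE TABLE sales AS
    SELECT day, product, CAST(price AS REAL) AS price
    FROM staging
""")

total = cur.execute("SELECT ROUND(SUM(price), 2) FROM sales").fetchone()[0]
print(total)  # 45.48
```

In a Hadoop or Spark deployment, the staging table corresponds to raw files landed in the cluster and the transform step runs there rather than consuming data warehouse capacity.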

As organizations endeavor to extract insights from a combination of human, machine, and business data, Hadoop, a cost-effective computing and storage platform, becomes a more viable approach for harnessing operational data and big data analytics.

Initially, many organizations implement Hadoop to store data from multiple sources in its original form, with the intent to ELT data for downstream analytics. A large percentage of these deployments start as small-scale experiments within isolated business units rather than enterprise initiatives, either in the public cloud or as a standalone on-premises deployment serving a single use case. Data exploration and data warehouse optimization are the most common entry points.


Enterprises with executive sponsorship and a data-driven strategy quickly progress to other analytics use cases that deliver greater business value, including customer 360, predictive analytics, and data discovery. Data lakes are a logical foundation for these use cases: they centrally manage a variety of data from multiple sources, both processed and unprocessed (dark data), and enable enterprises to glean insight from that information.
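A "customer 360" view is essentially a merge of per-customer attributes from every source system into one profile. As a toy sketch (the source systems, keys, and attribute names here are hypothetical, not from the article):

```python
# Hypothetical records from two source systems, keyed by customer id.
crm = {"c1": {"name": "Alice"}, "c2": {"name": "Bob"}}
web = {"c1": {"visits": 5}, "c3": {"visits": 2}}

# Merge every source into one unified profile per customer.
profiles = {}
for source in (crm, web):
    for cid, attrs in source.items():
        profiles.setdefault(cid, {}).update(attrs)

print(profiles["c1"])  # {'name': 'Alice', 'visits': 5}
```

At data lake scale the same idea is expressed as joins across ingested datasets, but the principle is identical: one record per customer, assembled from all sources.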


Enterprises have not yet realized the true potential of data lakes for two key reasons:

  • Lack of a correct data model that unifies all data
  • Traditional Hadoop infrastructure is inefficient and rigid

While vendors promote the ease of storing and analyzing data on shared storage without an additional ingest step into Hadoop, the reality is that data required for analytics must be stored and organized in appropriate formats in order to accelerate different analytics workloads.
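One reason format matters: analytical queries typically scan one field across many records, which favors columnar layouts (the idea behind formats like Parquet and ORC) over row-oriented storage. A toy illustration in plain Python, with hypothetical field names:

```python
# Row-oriented layout: each record's fields are stored together.
# Good for point lookups ("show me everything about this user").
rows = [{"user": "a", "clicks": 3},
        {"user": "b", "clicks": 7}]

# Column-oriented layout: each field is stored contiguously.
# Good for analytic scans ("sum clicks across all users").
columns = {"user": ["a", "b"], "clicks": [3, 7]}

# An aggregate over the columnar layout touches only the column it needs,
# instead of reading (and skipping past) every field of every record.
total_clicks = sum(columns["clicks"])
print(total_clicks)  # 10
```

Real columnar formats add compression and encoding on top of this layout, but the access-pattern argument is the same.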


Conventional platforms are designed for batch workloads and do not scale efficiently for modern analytics workloads (e.g., machine learning, streaming, SQL, and NoSQL interactive analytics). The need for performance, and for scaling out linearly to accommodate software frameworks such as Spark and NoSQL that are more memory- and compute-intensive, requires flexible compute and storage resources capable of consolidating diverse workloads on top of the data lake.


Due to advancements in big data usage and management, many organizations realize that their existing infrastructure is underutilized or over provisioned, resulting in cluster and data sprawl. The inability to meet the needs of new analytics workloads is driving IT departments towards consolidation.

While data lakes consolidate information, cluster and workload consolidation towards a multi-tenant, elastic platform is a more complex undertaking. The advent of IoT and cloud analytics requires additional capabilities that enable more efficient and elastic mechanisms to ingest, store, and process data across remote locations.

Over the next few weeks, this series on the use of big data on an industrial scale will cover the following additional topics:

  • The Exponential Growth of Data
  • Realizing a Scalable Data Lake
  • The HPE Elastic Platform for Big Data Analytics
  • HPE Workload and Density Optimized System
  • The Five Blocks of the HPE WDO Solution

You can also download the complete report, “insideBIGDATA Guide to the Intelligent Use of Big Data on an Industrial Scale,” courtesy of Hewlett Packard Enterprise.
