Data and the way that data is used have changed, but data warehousing has not. Today’s on-premises data warehouses are built on technology that is, at its core, two decades old. To meet today’s demands and opportunities, data warehouses must fundamentally change.
It used to be that the data you needed to analyze came primarily from internal sources (e.g., transactional, ERP, and CRM systems) in well-defined, structured forms at a reasonably predictable rate and volume. Today data comes not only from those sources but, increasingly, from constantly evolving sources of machine-generated data outside your direct control, such as application logs, web interactions, mobile devices, sensors, and more. That data frequently arrives in flexible semi-structured formats such as JSON or Avro, and at highly variable rates and volumes.
Data used to flow through complex ETL pipelines into a data warehouse, where reporting queries ran periodically to update a known set of dashboards and reports, a process that often took days. Today a wide array of analysts need to explore and experiment with data as quickly as possible, even without knowing in advance where its value might lie. Beyond analysts, a growing number of applications also need immediate access to data to make decisions.
There are technologies available today that were not even on the radar when conventional data warehouses were designed. For example, cloud applications and cloud infrastructure have emerged to play a critical role in the IT strategy of organizations of all types and sizes.
Although data warehousing technology has ably served organizations for many years, the accumulated architectural baggage of several decades has taken its toll. Data warehouses have increasingly failed to adapt to these changes: they struggle to handle rapidly arriving and constantly evolving data, to provide the flexibility to scale rapidly to meet ever-changing demands, and to do so cost-effectively.
At the same time, newer “big data” offerings such as those based on Hadoop are not the answer either. They are useful tools for advanced data science, but were simply not designed for data warehousing: they require difficult-to-find new skills, are not fully compatible with the existing ecosystem of SQL-based tools, and fail to deliver interactive performance.