The Cloud Helped Expand ETL into Automated Data Infrastructure

Extract, Transform and Load (ETL) is the process of taking data from one system and changing it so that it can be integrated with other data in a new system. Because the process is automated, it lets teams work at scale in more agile ways. The acronym became well known in business software during the 1990s, driven by the focus on data warehouses. Many people in the industry tie ETL too closely to on-premises data warehousing and assume it has no place in the cloud world, but that assumption is incorrect. ETL is even more important today.
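
To make the three steps concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical source system whose rows arrive as strings with inconsistently cased country codes, and it uses an in-memory SQLite database as a stand-in for the target warehouse. The table and field names are illustrative, not taken from any real product.

```python
import sqlite3

# Hypothetical rows as they might come out of an operational source system:
# IDs and amounts arrive as strings, and country codes are inconsistently cased.
SOURCE_ROWS = [
    {"order_id": "1001", "amount_cents": "1999", "country": "us"},
    {"order_id": "1002", "amount_cents": "550", "country": "DE"},
    {"order_id": "1003", "amount_cents": "12000", "country": "Us"},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: convert types and normalize values to the target schema."""
    return [
        (int(r["order_id"]), int(r["amount_cents"]), r["country"].upper())
        for r in rows
    ]


def load(conn, rows):
    """Load: write the already-cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the pipeline in ETL order: transform happens before load."""
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    query = "SELECT country, SUM(amount_cents) FROM orders GROUP BY country ORDER BY country"
    for row in conn.execute(query):
        print(row)
```

The key design point is ordering: every consumer of the `orders` table sees data that was cleaned once, before it was stored.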

The first change from the data warehousing world was the growth of "data lakes" on platforms such as Hadoop. Those were sometimes on-premises and sometimes in the cloud. Think of a data lake as an operational data store (ODS) on steroids. Companies extracted operational data and moved it, unprocessed, straight into a large data store. IT then accessed the data lake through individual applications, which often had to transform the data for each use. Some people began calling this technique ELT (extract, load, transform), but the basic concept was the same as in data warehousing: taking data from disparate systems and moving it to a place where it can be accessed as the information needed for business insight and action. Because ETL was so closely tied to the data warehousing sector, and because it has become one component of automated data infrastructure (ADI), the industry began using ADI as the umbrella term for ETL and related tools.
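
The ELT pattern described above can be sketched the same way, again as a hypothetical example with an in-memory SQLite database standing in for the data lake. Raw records are loaded as-is into text columns, and each consuming application applies its own transformation at read time. The table name `raw_orders` and both consumer functions are illustrative assumptions.

```python
import sqlite3

# Hypothetical raw records, landed exactly as extracted (no cleanup).
RAW_ORDERS = [
    ("1001", "1999", "us"),
    ("1002", "550", "DE"),
    ("1003", "12000", "Us"),
]


def extract_and_load(conn):
    """Extract + Load: store the raw data untouched in a table of text columns."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount_cents TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    conn.commit()


def revenue_by_country(conn):
    """First consumer: casts and normalizes the raw data for its own report."""
    return conn.execute(
        """
        SELECT UPPER(country) AS country_code,
               SUM(CAST(amount_cents AS INTEGER)) AS total_cents
        FROM raw_orders
        GROUP BY UPPER(country)
        ORDER BY country_code
        """
    ).fetchall()


def order_count(conn):
    """Second consumer: applies its own, separate transformation to the same rows."""
    return conn.execute(
        "SELECT COUNT(DISTINCT CAST(order_id AS INTEGER)) FROM raw_orders"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn)
    print(revenue_by_country(conn))
    print(order_count(conn))
```

Compared with the ETL ordering, the stored data stays in its original form, and the transformation logic is repeated in each consumer, which mirrors the per-application transformations the paragraph describes.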

Data lakes grew largely because of the exponential increase in data created in the cloud by websites, IoT devices, and a growing number of cloud applications. It quickly became apparent that cloud vendors were multiplying the interfaces, called APIs, needed to extract data from systems and move it to data warehouses, data lakes, and analytic applications. While cloud applications are more open than classic on-premises applications, those APIs are complex and vary widely. For that reason, ADI offerings have become even more critical to businesses' ability to integrate data from a wide variety of sources.
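
One way to picture why those varied APIs call for connectors is a small, hypothetical sketch: two imaginary cloud sources return customer records in different shapes (different field names, nesting, and timestamp formats), and a per-source connector function maps each into one common schema. The payloads, source names, and schema below are invented for illustration and do not reflect any real vendor's API.

```python
from datetime import datetime, timezone

# Hypothetical payloads from two cloud sources describing the same kind of record.
CRM_PAYLOAD = {
    "results": [{"Id": "A-1", "Email": "Ana@Example.com", "CreatedDate": "2021-03-01T10:00:00Z"}]
}
BILLING_PAYLOAD = {
    "data": [{"customer": {"id": 7, "email": "bo@example.com"}, "created": 1614592800}]
}


def crm_connector(payload):
    """Map the hypothetical CRM shape (ISO 8601 string timestamps) to the common schema."""
    for rec in payload["results"]:
        yield {
            "source": "crm",
            "customer_id": rec["Id"],
            "email": rec["Email"].lower(),
            "created_at": datetime.strptime(
                rec["CreatedDate"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc),
        }


def billing_connector(payload):
    """Map the hypothetical billing shape (nested object, Unix epoch seconds)."""
    for rec in payload["data"]:
        yield {
            "source": "billing",
            "customer_id": str(rec["customer"]["id"]),
            "email": rec["customer"]["email"].lower(),
            "created_at": datetime.fromtimestamp(rec["created"], tz=timezone.utc),
        }


# Registry of connectors keyed by source name; adding a source means adding one function.
CONNECTORS = {"crm": crm_connector, "billing": billing_connector}


def ingest(payloads):
    """Run the matching connector for each source and collect uniform rows."""
    rows = []
    for source, payload in payloads.items():
        rows.extend(CONNECTORS[source](payload))
    return rows


if __name__ == "__main__":
    for row in ingest({"crm": CRM_PAYLOAD, "billing": BILLING_PAYLOAD}):
        print(row)
```

Keeping the source-specific parsing inside small connector functions isolates each API's quirks, so downstream loading code only ever sees one schema.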

In recent years, the growth of major cloud data warehouses on platforms such as AWS, Google Cloud, Microsoft Azure, and others has led many vendors to add support for those platforms to their software offerings. At the same time, the platforms are packaging services and third-party applications into more turnkey solutions that let businesses add software infrastructure quickly and affordably.

Data infrastructure therefore remains as important as ever for data warehousing. To provide those turnkey solutions, the cloud providers have begun to integrate automated data infrastructure into their offerings. One example is the recent announcement that Google has partnered with Fivetran to provide data connectors within Google Cloud that help integrate existing systems into BigQuery, Google's data warehouse offering.

Yes, we've come full circle. Many people claimed that the cloud and data lakes would kill data warehouses. Yet Google and other major cloud vendors understand that data warehouses, while not the be-all and end-all of analytics, are a key component of proper analytics. ETL hasn't disappeared. It has changed and expanded, but its core purpose remains. ADI exists to help enterprises work in the modern, hybrid environment, creating pipelines from siloed applications into modern analytics tools that provide business insight.

About the Author

Ben Bloch is CEO of Bloch Strategy. He is a Los Angeles-based serial entrepreneur and journalist who advises high-growth startups. Ben spent 14 years in corporate roles with IBM and Sungard AS focused on emerging business opportunities, software as a service, cloud computing and digital media, and another 8 years in the startup world, during which he served as CMO and CRO through three exits, including co-founding Econation, a grant- and private equity-funded clean-tech company. He completed the Business Insight Program at Harvard University and earned his undergraduate degree from UW-Madison.

Sign up for the free insideBIGDATA newsletter.
