The Data Engineering Cloud: Three Lessons for a New Era


Over the past two decades, we’ve learned a lot as we’ve evolved from big data (new challenges) to data science (new projects) to data engineering (new practices). We’ve arrived at the next evolutionary era, and there are new lessons to be learned.

This new era is open and inclusive. It requires a “big tent” approach that connects data experts and domain experts, hand-coders and no-coders, engineering discipline and business agility. It demands data engineering practices and technology platforms that bring together a diversity of stakeholders and harness the power of data in the cloud to transform businesses.

This is the era of the Data Engineering Cloud, and these are three of its most important lessons.

Lesson No. 1: Simplicity Empowers Everyone

The power of the Data Engineering Cloud is in its simplicity. Sometimes people equate simplicity with a lack of depth. Not here. In the Data Engineering Cloud, simplicity empowers everyone.

Emerging cloud platforms for data engineering can change the game for data experts as well as subject matter experts. For new, relatively nontechnical users, the cloud offers an easy onramp to work with data—a democratization of end-user data engineering tasks. New users can go from a business idea to delivery with minimal friction. For the hardcore “speeds and feeds” folks, the cloud invites new and lightweight ways to think about massive projects. For example, job latencies can be radically shortened by temporarily bursting to large numbers of resources in parallel; in that sense, “fast is free” in the cloud.
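The “fast is free” intuition follows from fine-grained cloud billing: under idealized linear pricing and perfectly parallelizable work, total cost depends only on total machine-hours consumed, not on how many machines run at once. A toy sketch makes the arithmetic concrete (the hourly rate and job size below are hypothetical, not real cloud prices):

```python
# Toy illustration of "fast is free" under idealized linear cloud pricing.
# Assumes perfectly parallelizable work and per-second billing; the hourly
# rate and job size are made-up numbers for illustration only.

RATE_PER_MACHINE_HOUR = 0.50  # hypothetical $/machine-hour


def cost_and_latency(total_machine_hours, num_machines):
    """Cost depends only on total work; latency shrinks with parallelism."""
    cost = RATE_PER_MACHINE_HOUR * total_machine_hours
    latency_hours = total_machine_hours / num_machines
    return cost, latency_hours


# A 100 machine-hour job, run serially vs. burst across 100 machines:
serial_cost, serial_latency = cost_and_latency(100, 1)
burst_cost, burst_latency = cost_and_latency(100, 100)
# Same cost either way, but the burst finishes 100x sooner.
```

Real clouds add caveats (startup overhead, imperfect parallelism, minimum billing increments), but the first-order economics are why bursting changes how teams think about latency.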

As you think about a new stack in the cloud, make sure your solutions keep it simple. Don’t glory in premature optimization or unnecessary complexity.

Lesson No. 2: Governance and Agility Aren’t Mutually Exclusive

Historically, real tension existed between the people who govern data and the people who use data. On one side of the architectural divide stood the limited-use data warehouse. On the other were the tools used by data analysts and data scientists. To avoid the enterprise data warehouse, teams squirreled away dozens of little departmental datasets, hidden in servers under people’s desks, out of sync and ungoverned.

Clouds don’t have walls. There’s no “IT side” or “business side” in the Data Engineering Cloud; there’s open space for everybody—just log in and get to work on data, both independently and via collaboration. Governance and agility are matters of policy and configuration, not accidents of architecture.

A Data Engineering Cloud serves as a single, attractive gathering place for data. It’s where data engineering, as a practice, can be pursued as a team sport across the organization. The data and ops experts who govern the data can keep an eye on everything in a single environment. Meanwhile, the domain experts who use the data for business purposes can get cracking on their own projects and take care of themselves—including sourcing and transforming data.

The Data Engineering Cloud is where all data stakeholders gather to work together with full transparency. You shouldn’t sacrifice agility for governance, or vice versa. You can have both if they work together.

Lesson No. 3: Embrace Continuous Evolution

As we know, software-as-a-service means everybody is always running the latest version. As the software gets better, the whole organization moves forward.

In cloud-centered companies, the flywheel of innovation spins fast and furious because the whole organization is constantly upgrading and upskilling its data practices. No constituencies are left behind on legacy technologies. Bottlenecks don’t form around the keymasters of old, arcane systems. And as new low-code technologies emerge that remove complexity, more players flood the data engineering field, taking on tasks like data transformation themselves without an assist from specialized technical resources.

The Data Engineering Cloud insulates us from stagnant complexity, regardless of whether that complexity comes from a third-party vendor or is hand-coded. It invites us to embrace continuous change and testing. Everybody is always upgraded, and everybody is always upskilling. This is good news for your employees and their resumes, and it’s good news for your business.

About the Authors

Joe Hellerstein, Co-Founder & CSO, Trifacta | Professor at University of California, Berkeley. Joe is Trifacta’s Chief Strategy Officer, Co-founder and Jim Gray Chair of Computer Science at UC Berkeley. His career in research and industry has focused on data-centric systems and the way they drive computing. Fortune Magazine included him in its list of the 50 smartest people in technology, and MIT’s Technology Review included his work on its TR10 list of the 10 technologies “most likely to change our world.”

Jeffrey Heer, Co-Founder & CXO, Trifacta | Computer Science Professor at University of Washington. Jeff is Trifacta’s Chief Experience Officer, Co-founder and a Professor of Computer Science at the University of Washington, where he directs the Interactive Data Lab. Jeff’s passion is the design of novel user interfaces for exploring, managing and communicating data. The data visualization tools developed by his lab (D3.js, Protovis, Prefuse) are used by thousands of data enthusiasts around the world. In 2009, Jeff was named to MIT Technology Review’s list of “Top Innovators under 35”.
