
Why Data Pipelines Desperately Need Orchestration

In this special guest feature, Sean Knapp, Founder and CEO of Ascend, discusses how automation can greatly reduce the time a data engineering team spends on orchestration by enabling data-centric pipelines. Prior to Ascend.io, Sean was a co-founder, CTO, and Chief Product Officer at Ooyala. At Ooyala, Sean played key roles in raising $120M, scaling the company to 500 employees, its $400M+ acquisition, and its subsequent acquisitions of Videoplaza and Nativ. He oversaw all Product, Engineering, and Solutions functions and defined Ooyala’s product vision for its award-winning analytics and video platform solutions. Before founding Ooyala, Sean worked at Google, where he was the technical lead for Google’s legendary Web Search Interface team, helping that team increase Google revenues by over $1B. Sean also developed and launched iGoogle, the company’s popular, customizable home page. Sean holds B.S. and M.S. degrees in Computer Science from Stanford University.

Despite technological innovations in business intelligence (BI), data processing, data warehousing, and other applications, a huge gap remains in how data pipelines connect these systems and orchestrate the movement of data between them. The current state of the art in orchestration still places the burden on data engineers to manually manage and maintain pipelines at every stage. Transitioning from an imperative model to a declarative one, however, brings intelligent orchestration to pipeline development, eliminating implementation and maintenance burdens across the entire data lifecycle.

The most powerful systems in the data management landscape are being controlled and orchestrated by some of the most rudimentary technologies. How data moves and is transformed is still dictated by manual, hard-coded triggers and rules, resulting in slow development cycles and brittle pipelines. Under an imperative model of pipeline development, data engineers effectively serve as task compilers, spending a disproportionate amount of time combing through code and logs and constantly tuning parameters just to keep things running. Even the most well-thought-out pipeline implementation is likely to fail as data changes, dependencies grow, and the interconnectedness among systems becomes increasingly complex.
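To make the imperative pattern concrete, here is a minimal sketch of a hand-coded, task-based pipeline. All function, field, and source names are hypothetical; the point is that the call order *is* the dependency graph, and every change to data or schema means editing this script by hand.

```python
# Illustrative imperative pipeline: each step, its ordering, and its
# wiring are hand-coded. Any schema or dependency change means editing
# this script. All names here are hypothetical.

def extract(source: str) -> list[dict]:
    # Stand-in for reading raw records from a source system.
    return [{"id": 1, "amount": "42.5"}, {"id": 2, "amount": "7.0"}]

def transform(rows: list[dict]) -> list[dict]:
    # Hard-coded field logic: adding or renaming a field means
    # editing this function and every downstream consumer.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]

def load(rows: list[dict], target: dict) -> None:
    for r in rows:
        target[r["id"]] = r["amount"]

warehouse: dict[int, float] = {}

# The engineer acts as the "task compiler": the order of these calls
# encodes the dependencies, and nothing verifies that order for us.
load(transform(extract("orders_db")), warehouse)
print(warehouse)  # {1: 42.5, 2: 7.0}
```

Nothing in this script knows *why* the steps run in this order, which is exactly what makes such pipelines brittle as they grow.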

Moving away from an imperative, task-based design to declarative, data-centric automation is central to creating high-performance, automated pipelines that accelerate software delivery from code to production. Declarative systems lower design and maintenance costs while improving the quality and reliability of the resulting data pipelines. Because less code is needed, pipelines are less brittle, harbor fewer potential errors, and are easier to maintain over time. The approach also lets businesses build more pipelines and move more data through their systems, delivering faster time to value.
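The declarative alternative can be sketched as follows: instead of hand-ordering tasks, we declare each dataset and its upstream dependencies, and a small resolver derives the execution order from the dependency graph. This is an illustrative sketch, not any real orchestrator's API; all dataset and function names are hypothetical.

```python
# Illustrative declarative sketch: we declare *what* each dataset is and
# what it depends on; a resolver works out the execution order by
# topologically sorting the dependency graph.

from graphlib import TopologicalSorter

# Declarative spec: dataset name -> (upstream datasets, derivation fn).
pipeline = {
    "raw_orders": ((), lambda: [{"id": 1, "amount": "42.5"}]),
    "clean_orders": (("raw_orders",),
                     lambda raw: [{"id": r["id"], "amount": float(r["amount"])}
                                  for r in raw]),
    "order_totals": (("clean_orders",),
                     lambda clean: sum(r["amount"] for r in clean)),
}

def materialize(spec):
    """Run every declared dataset exactly once, in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in spec.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = spec[name]
        results[name] = fn(*(results[d] for d in deps))
    return results

out = materialize(pipeline)
print(out["order_totals"])  # 42.5
```

Adding a new dataset means adding one entry to the spec; the resolver, not the engineer, re-derives the ordering, which is the maintenance saving the declarative model promises.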

Furthermore, combining declarative configurations with automation to power intelligent orchestration allows data engineering teams to focus on architecting instead of just plumbing. Without intelligent orchestration, data engineers are stuck repairing failures, updating the system by adding or deleting fields, or adjusting the schema to the changing needs of the business. Intelligent orchestration removes the need for these expensive, time-consuming, and error-prone manual tasks, enabling data engineers to be more productive in the development, deployment, and ongoing operation of data pipelines.

There’s been great innovation in data-driven applications, such as BI, data processing and data warehousing, but orchestration has been largely overlooked, particularly in data pipelines. These pipelines have emerged as the backbone of modern data systems, but the current standard of development and manual orchestration falls short of what businesses need. By moving away from an imperative model of orchestration to a more data-centric approach using a declarative system, data engineering teams can now build sophisticated data pipelines with the speed, ease, and flexibility required by the business.

Sign up for the free insideBIGDATA newsletter.
