End User Data Preparation is More Than Data Wrangling

In this special guest feature, Pete Aven, Director of Sales Engineering for FactGem, offers a discussion that strives to open the eyes of enterprise decision makers to the opportunities for end user data prep in a context beyond simplified data wrangling. Pete has over 17 years of experience in software engineering and system architecture with an emphasis on delivering large-scale, data-driven applications. He is author of the O’Reilly book, Building on Multi-Model Databases. Learn how to get the answers to hard questions and create a business data model with the FactGem Data Fabric.

Any large company is composed of a collection of organizations with a set of common goals. These include internal and external groups, which all work together to help manage the data, applications and supporting technologies for the business. These companies have grown and evolved over time and continue to grow and develop with the advances of technology, increases in volumes of data and business growth. Therefore, their data and technological landscapes are quite complex, as are their business requirements for managing their data for themselves and their customers.

Any such enterprise that is successful is going to have an architectural capability. They will have a group of people who help manage the job of integrating the people, processes and technologies required to meet the needs of the business. This group will often be employing TOGAF, DoDAF, the Zachman Framework or some other enterprise architecture framework that helps define business goals and aligns them to the architecture objectives and realities that are appropriate for enterprise software development.

These frameworks have many elements in common: each provides a methodology for gathering requirements, for defining the baseline and target architectures that meet those requirements, and for describing architectures, requirements and timelines at levels of abstraction suited to multiple audiences. Disparate groups within a business each have a distinct perspective and a different set of communication skills, depending on their function. The frameworks help bridge the gaps between these groups so that everyone can work together toward a common goal.

Yet, despite all the rigor and analysis that is invested into every IT project, it is not uncommon to find a business or mission three years into a two-year project and nowhere near the target goal they set out for themselves at the start.

There are many reasons for this, but a major one is the disconnect in communication that exists between the business and IT. Business users think of their data at a higher level of abstraction. They think in terms of the key entities and relationships that define their business as a whole, and the rules the business applies to those entities and relationships in order to operate.

However, the data for business entities and relationships gets mapped to multiple, complex, underlying storage systems. The technical users who manage this data have a different understanding of its requirements than the business users do. IT is bound by the limitations of its functional area’s data resources (a silo) and the constraints of the systems that manage that data. The terms entity and relationship are overloaded and can have different meanings to business and IT.

Now, a significant portion of business requirements is centered on the unification of data. The process has taken on a name of its own, “data engineering,” and skilled data engineers have emerged as heroes to organizations with multiple complex systems containing complex data and complex requirements for data integration, governance, security and reporting. Data engineering takes significant time and effort, though, and the business wants its unified data faster than data engineers can provide it.

To democratize data and enable business users to work with their own data in a self-service model, data preparation tools have been created. The end user data prep market has grown up around us. These applications allow non-technical users to “wrangle” their data for themselves, without having to write code. Through drag-and-drop visual interfaces, users can create workflows and pipelines that take data from one set of systems and centralize it in a uniform format in another single system. From there, users can plug in BI tools to deliver the answers they require. This can be very effective within a silo, but these applications can be much more than just a response to a business requirement. They can also help drive the requirements.

One of the first key deliverables for any architectural framework is understanding the business and mission requirements. For any target architecture to be successfully met, requirements must come from the business and mission leaders. These requirements provide the foundation of the requirements for data, applications, and their supporting technologies.

Many data prep tools are coupled to the underlying storage systems. This is a bottom-up approach to data integration. It allows non-technical users to essentially play a game of drag-and-drop Tetris to generate a report, which can be very useful for delivering information to a department in the short term. But it doesn’t necessarily address the comprehensive long-term goals of the business. It also requires knowledge of the underlying systems, which may be limited by a user’s functional area and the data they have access to.

The right data prep tool allows non-technical users to model their data, decoupled from the underlying sources. This top-down approach to data integration allows business leaders and IT leadership, together, to draw out the key entities and relationships for an organization as they are understood across the enterprise, not just within any silo. Decoupled from the constraints of the underlying systems, business and IT can better communicate their needs and constraints to one another. They can then reach a shared understanding of what the baseline reality is, what the target should be, and what it will take to get there.

A data prep tool of this kind decouples the data model from the underlying storage, then allows non-technical users to map data sources to the conceptual data model they have created, which makes that model physical. This closes a significant communication gap in requirements gathering across a variety of stakeholders, and it significantly aids the execution needed to reach the target architectures.
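To make the idea concrete, here is a minimal Python sketch of the top-down pattern: the entity is defined first, with no reference to storage, and each source is mapped onto it afterwards. Everything in it is a hypothetical illustration (the `Entity` and `SourceMapping` classes, the `unify` helper, the `crm` and `billing` sources and their column names); it is not a description of how FactGem or any other product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A business-level entity, defined without reference to any storage system."""
    name: str
    key: str            # attribute that identifies one instance of the entity
    attributes: tuple   # attribute names in the business vocabulary


@dataclass
class SourceMapping:
    """Maps one source's column names onto an entity's attributes (hypothetical)."""
    source: str
    entity: Entity
    columns: dict       # source column name -> entity attribute name

    def apply(self, row: dict) -> dict:
        # Keep only the mapped columns, renamed to the entity's vocabulary.
        return {attr: row[col] for col, attr in self.columns.items() if col in row}


def unify(entity: Entity, mapped_rows: list) -> dict:
    """Merge records from several sources that share the same entity key."""
    merged = {}
    for record in mapped_rows:
        merged.setdefault(record[entity.key], {}).update(record)
    return merged


# Step 1: business and IT agree on the conceptual model.
customer = Entity("Customer", key="customer_id",
                  attributes=("customer_id", "name", "email"))

# Step 2: each silo's data is mapped to that model afterwards.
crm = SourceMapping("crm", customer,
                    {"cust_no": "customer_id", "full_name": "name"})
billing = SourceMapping("billing", customer,
                        {"acct_id": "customer_id", "contact_email": "email"})

# Made-up sample rows standing in for two separate systems.
crm_rows = [{"cust_no": 1, "full_name": "Ada Lovelace", "region": "EMEA"}]
billing_rows = [{"acct_id": 1, "contact_email": "ada@example.com"}]

records = [crm.apply(r) for r in crm_rows] + [billing.apply(r) for r in billing_rows]
customers = unify(customer, records)
print(customers)
```

The point of the design is that `customer` never mentions a table or column from either system. Adding a third source means adding one more mapping, while the model that business and IT agreed on stays unchanged.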

In the context of digital transformation, a data prep tool that is decoupled from the underlying sources and used in a top-down approach allows teams to be aspirational with their target data architectures, rather than held back by the limits of the existing data, its availability, or its current shape (schema).

There’s much more to the discussion, but hopefully this article helps to open your eyes to the opportunities for end user data prep in a context beyond simplified data wrangling.

