Paxata & TDWI Survey Provides Insight into Factors Driving Self-Service Data Preparation Growth

Paxata, provider of the enterprise-grade self-service data preparation platform, announced the results of an industry study published by TDWI and sponsored by Paxata about the self-service data preparation market and its emerging role in accelerating the transformation of data to information.

The survey drew responses from 411 business and IT executives, including VPs and directors of BI, analytics, and data warehousing; business and data analysts; and line-of-business and departmental directors and managers. Respondents said they are under pressure to reduce the time required to achieve business insight. The research found that over 80% are evaluating self-service data preparation as a key process and infrastructure change to accelerate information-driven decision-making.

“It is not surprising to me that people in the study indicated dissatisfaction with their ability to find relevant data and understand how to use it appropriately for BI and analytics. While companies are drowning in data, business teams are thirsting for information they can actually use,” said Prakash Nanduri, CEO and Co-founder of Paxata. “Since we pioneered the self-service data preparation space, our singular mission is to eliminate that very challenge, and truly unlock the value of BI investments. This important research indicates that the strong demand we have seen in the last three years for our enterprise-grade self-service data preparation platform is just the tip of the iceberg.”

The report shows that difficulty working with data has become a major deterrent to providing business insight. The study showed:

  • Data quality continues to plague organizations. 86% of the respondents were not fully satisfied with the quality of their data, and 94% of research participants are not very satisfied with processes for addressing data duplication.
  • Manual data preparation burdens analyst resources. The majority of research respondents said that 61-80% of analysts’ time is spent on manual data preparation processes. This has an impact on associated headcount costs by reducing total capacity for analytic projects.
  • Reliance on IT creates a drag on business responsiveness. Nearly 50% of analysts rely on IT for the first step of data preparation tasks. The largest percentage said that IT takes two to six days to fulfill a data preparation request; 18% said it takes one to two weeks; and the same percentage said it takes three to four weeks. Weeks lost to data preparation backlogs cause insights to come too late for organizations to achieve data-driven status.
  • Too much time spent on data preparation re-work. Most research participants report that they do ad hoc data preparation every time. Only 4% said data preparation is entirely productionalized. This ad hoc preparation taps analyst resources and prevents other analysts from understanding the data. Ultimately, massive value leaks occur as only small portions of data can be explored and analyzed, or inconsistencies in data prep muddy insight.
  • IT data management services are insufficient for rich insight. Almost half (45%) said IT only creates and maintains metadata models. These models tend to orient around integration and definitions that scale for common views. However, exploration and discovery performed by analysts requires an extended set of metadata and alternative data views to gain deeper business insight.

“To compete effectively, organizations need faster time to insight; our research shows that most users are unhappy with how much time they spend doing data preparation themselves or waiting for IT to do it,” said David Stodder, senior director of business intelligence research at TDWI and the author of the report. “The report reveals strong interest in improving data preparation and increasing self-service capabilities so that business users and analysts can do more on their own and do it faster, while freeing IT to be more productive and less bogged down with repetitive tasks.”

The report cites a global consulting firm that has partnered with Paxata to develop solutions to help their mutual customers improve data quality for risk and compliance analytics. Paxata’s data preparation capabilities, which apply machine learning and are built to run on Hadoop systems, enable banks to move beyond traditional sampling of perhaps one tenth of financial transactions to look at all the transactions. A consultant at the firm said, “Before we can do any analytics, we need to make sure that the data is of sufficient quality to be trusted so that when the regulators see the analytics, they have faith in the numbers. Very quickly, based on a few filters and attributes, we can use Paxata to get a much more accurate and complete picture of a bank’s risk.”

The survey also revealed the growing need to adopt a modern data infrastructure, as well as a connected information layer, to replace aging ETL systems. The report showed:

  • Self-service data preparation transforms data integration strategies. While 66% responded that they are very reliant or reliant on their existing ETL systems, this number is expected to decrease as more companies adopt self-service data preparation.
  • Data preparation lets businesses see beyond their four walls. One third of research participants are either somewhat dissatisfied or not satisfied with their organization’s ability to integrate non-corporate data with corporate data for use in BI and analytics projects. From Paxata’s own research, 60% of data comes from outside of corporate systems. According to IT, 85% of the data being used is company generated and stored in their systems. Conversely, business users indicate that a large percentage of the data they need to enrich and add context comes from personal, public, and premium third-party sources and is not stored in corporate systems.
  • Data preparation is the stepping-stone to mastering data variety. The largest percentages for sources that participants plan to enable for integration or blending are live or real-time streaming data (32%), geospatial data (31%), and social media data (29%). These sources differ from data within internal systems in format and context, and often reside in modern NoSQL and Hadoop systems. Data preparation allows analysts to easily make sense of increased data variety through inspection, machine learning assistance, and data quality improvement.

These survey findings underscore the growing need for self-service data preparation solutions built for everyone within the enterprise who relies on data.

