Why Data Quality is So Elusive

In this special guest feature Jake Freivald, Vice President of Corporate Marketing for Information Builders, asserts that having clean, error-free data helps drive growth and enhance critical business processes, but unfortunately this goal is hard to achieve without the proper data strategies. Jake is a vice president and expert in BI and analytics for Information Builders. He is responsible for helping guide branding, communications, marketing, and events for the company. He graduated from Cornell University with a Bachelor of Science in Electrical Engineering in 1991.

Having clean, error-free data helps drive growth and enhance critical business processes.

The advantages of high data quality manifest in different ways in different organizations, but we can put many of them into four major categories:

  • Better relationships. When your customer information is accurate, customers trust you more. Every time a customer notices that your data is wrong – especially if it inconveniences them – they think about leaving for the competition.
  • Better strategies. An accurate picture of your business helps you know what you should do to improve results. Inaccurate views of customers, suppliers, processes, and KPIs tend to skew strategic decisions in the wrong directions.
  • Better spending. To see where you should spend money, you need an accurate picture of your business. To see where you are spending money, you need an accurate picture of your suppliers and outflows. If you don’t have both, you end up spending incorrectly, perhaps solving the wrong problems or not getting discounts you really should.
  • Better accountability. If someone at your company can blame bad choices on bad data, it’s hard to hold them accountable. That’s true from the executive suite to the mailroom. If the business situation is clear and accurately portrayed, though, you can more easily determine whether the problem is their data or their choices.

But it’s hard to achieve a high level of data quality. The growing volume and variety of information has created bad data that lives deep within every kind of system – and even a small amount of invalid or “dirty” data can create countless problems, wreaking havoc as it flows during the course of business processes, permeating many different information sources.

For example, an incorrect client address in a CRM application can cause incorrect order and invoice routing, confusion on the phone with customer support, rework by field service, badly targeted marketing, and improper promotions. Or a missing part number in a materials management system can hinder the efficiency of purchasing and procurement activities, and eat away at sales and profits by slowing down manufacturing processes.

To ensure they’re using sufficiently high-quality data, companies have to develop comprehensive rules, policies, and procedures to eliminate errors, mistakes, duplications, and inconsistencies in all sorts of systems: back-end systems, customer-facing applications, data warehouses, and B2B systems. They need to implement cutting-edge technology to facilitate the ongoing execution and enforcement of those guidelines. Organizations of all sizes, across all industries, struggle to achieve and maintain enterprise data quality.

To make things worse, there’s a proliferation of self-service data visualization tools that cause even more confusion. Each analyst works on her own data set, “fixing” data that doesn’t get propagated back to the source. They also don’t use metadata. Then people wonder why they’re getting different results. It’s Excel Hell all over again, with prettier graphics.

So what to do?

It would be best to have a data strategy. Since it’s strategic, a data strategy includes the typical strategic business notions of people, process, and technology; since it’s a strategic approach to data, it relates the way people, processes, and technology address the organization, governance, and usage of data.

I’ve written extensively about data strategy elsewhere, but for now I’ll oversimplify and talk only about data quality.

People. Ask, who is responsible for organizing the data? Who has sufficient insight to know how it should be governed? With whom should it be shared to benefit the business?

These questions help identify requirements for data quality. They might show you executive sponsors you hadn’t thought of, but who would benefit from improved data quality. They might identify knowledge workers who could champion the cause of higher data quality. In addition, they might help identify data stewards: people, usually on the business side, who understand the value of specific data and who can be accountable for its quality.

Processes. Consider the processes where information is generated or modified, and the differences in requirements for quality, latency (real-time, daily, and monthly are common), and remediation.

Business process owners should discuss which data requires the highest level of accuracy and which data can tolerate a wider margin of error: The difference in cost and effort between a 3% error rate and a 0.3% error rate can be phenomenal. Ideally, there would be data quality metrics in processes where specific data is consumed, so people would understand what the quality of their data is and how the data quality program has benefited them. Similarly, processes for remediation of bad data, and the connection between the business and IT, should be a special focus of effort.
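As a rough illustration of what a per-field quality metric with a tolerance might look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the field names (postal_code, part_number), the "non-empty" validity rule, the synthetic records, and the choice of 3% and 0.3% as tolerances. A real program would use rules and thresholds agreed on by the business process owners.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FieldCheck:
    name: str                                  # field to inspect (hypothetical names below)
    tolerance: float                           # maximum acceptable error rate, e.g. 0.003 for 0.3%
    is_valid: Callable[[Optional[str]], bool]  # rule deciding whether a single value is acceptable


def non_empty(value: Optional[str]) -> bool:
    """Illustrative rule: a value is valid if it is present and not blank."""
    return value is not None and str(value).strip() != ""


def error_rate(records: list, check: FieldCheck) -> float:
    """Fraction of records whose field fails the check."""
    if not records:
        return 0.0
    failures = sum(1 for r in records if not check.is_valid(r.get(check.name)))
    return failures / len(records)


def quality_report(records: list, checks: list) -> dict:
    """Error rate per field, plus whether it stays within the agreed tolerance."""
    report = {}
    for check in checks:
        rate = error_rate(records, check)
        report[check.name] = {
            "error_rate": rate,
            "within_tolerance": rate <= check.tolerance,
        }
    return report


# Synthetic data: 1,000 records, 20 with a blank postal code, 5 with no part number.
records = [
    {
        "postal_code": "" if i < 20 else "10001",
        "part_number": None if i < 5 else f"P-{i}",
    }
    for i in range(1000)
]

checks = [
    FieldCheck("postal_code", tolerance=0.03, is_valid=non_empty),   # 3% tolerated
    FieldCheck("part_number", tolerance=0.003, is_valid=non_empty),  # 0.3% tolerated
]

report = quality_report(records, checks)
for field_name, result in report.items():
    print(field_name, result)
```

In this made-up data set the postal code field has a 2% error rate and stays inside its 3% tolerance, while the part number field has a lower 0.5% error rate but still breaches its stricter 0.3% tolerance. That is the kind of distinction a per-process metric is meant to surface.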

Technology. Specific technologies should be the last consideration: Never implement a technology without a business case for it.

Having said that, many data quality programs have similar requirements. A typical program requires near-constant profiling of data and a set of quality metrics. A common data quality implementation includes an array of capabilities – from validation and cleansing through enrichment and data governance – to make information as timely and trusted as possible while still being manageable and flexible. The tools needed for uncovering and remediating invalid or incorrect information are important, too, as is the ability to work with data in any system, database, application, document, or message.
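To make "profiling" a little more concrete, the sketch below computes three simple measures per field: completeness, number of distinct values, and which values appear more than once. It is not any particular vendor's profiling method. The function name profile, the field names, and the sample customer records are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def profile(records: list, fields: list) -> dict:
    """Return completeness, distinct-value count, and repeated values per field."""
    total = len(records)
    result = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and empty strings as missing (an assumption for this sketch).
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        result[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(counts),
            "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
        }
    return result


# Hypothetical customer records with one repeated ID and two missing emails.
customers = [
    {"customer_id": "C1", "email": "ann@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "bob@example.com"},
    {"customer_id": "C3", "email": None},
]

summary = profile(customers, ["customer_id", "email"])
for field_name, stats in summary.items():
    print(field_name, stats)
```

Run on a schedule against each source system, a summary like this could feed the quality metrics mentioned above, with the repeated customer_id flagged for remediation by a data steward.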

There’s no magic bullet. Data quality is elusive because it requires a focus on business benefits, a partnership between businesspeople and IT, and a continual rediscovery of the ways in which people, processes, and technology help a company organize, govern, and share data. The issues are specific to each organization. But by taking a methodical approach, companies can come together to nail down the specific areas that need help, and the type of improvements needed.

