In this special guest feature, Joe Pasqua, Executive Vice President of Products at MarkLogic, discusses how effective data governance is more important, and more elusive, than ever before. It enables companies to frame data according to their strategic initiatives, regulatory mandates and core principles. It reduces risk and increases a company’s ability to make the most of the data in its possession. Joe brings over three decades of experience as both an engineer and a leader. He has personally contributed to several game-changing initiatives, including the first personal computer at Xerox, the rise of RDBMS in the early days of Oracle, and the desktop publishing revolution at Adobe. He has over 10 issued patents with others pending. Joe earned simultaneous Bachelor of Science degrees in Computer Science and Mathematics in 1981 from California Polytechnic State University, San Luis Obispo, where he is a member of the Computer Science Advisory Board.
Organizations have historically viewed data governance as a tax, and most still do. It’s something they have to do for compliance or regulatory reasons, but it’s not viewed as adding value to the business. They are concerned that governance will become even more of a burden when sweeping regulations like the EU GDPR go into effect in 2018. The General Data Protection Regulation (GDPR) will bring cascading privacy demands that will require a renewed focus on data privacy for companies that offer goods and services to individuals in the EU. Businesses that don’t comply with GDPR face fines as high as 4% of the company’s global annual revenue.
In contrast, some organizations have realized that governance is actually crucial to driving business value. These are the organizations that understand the value of their information assets and are spending enormous amounts of time and money to harness them. Many people refer to this as the Big Data movement. These organizations know there is tremendous value to be had, but many of them aren’t actually getting it despite their investment. Gartner says: “Through 2018, 80% of data lakes will not include effective metadata management capabilities, making them inefficient.”
There are two main reasons why companies are failing. First, they don’t have the lineage and provenance of the data they’re analyzing. When they put bad or misleading data into their analysis, they’re going to get unreliable results back out. They will also be unable to defend or validate the results they produce. That’s a lack of data governance.
Second, and even more problematic, organizations are afraid to share the data they’ve gone to great expense to create. They can’t answer fundamental questions such as: Under what agreements was the data collected? Which pieces are personal information? Who’s allowed to see it? In which geographies? With what redistribution rights? If they can’t answer these fundamental questions, they can’t share the data without significant exposure to regulatory fines, brand damage, leaked customer or employee information, or lost intellectual property. To prevent the exposure, they are walling off their data lakes. They are wasting millions of dollars and months or years of effort. This is another failure of governance.
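One way to picture what answering these questions looks like in practice is a governance record that travels with each dataset. The following is only a minimal Python sketch under stated assumptions, not a description of any particular product: `GovernanceRecord`, `can_share`, `shareable_view`, and every field name are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GovernanceRecord:
    """Hypothetical metadata answering the sharing questions for one dataset."""
    collection_agreement: str                                      # under what agreement was it collected?
    personal_fields: frozenset = field(default_factory=frozenset)  # which pieces are personal information?
    allowed_roles: frozenset = field(default_factory=frozenset)    # who is allowed to see it?
    allowed_regions: frozenset = field(default_factory=frozenset)  # in which geographies?
    redistributable: bool = False                                  # with what redistribution rights?


def can_share(record, role, region, redistribute=False):
    """Return True only when role, region, and redistribution are all permitted."""
    if role not in record.allowed_roles:
        return False
    if region not in record.allowed_regions:
        return False
    if redistribute and not record.redistributable:
        return False
    return True


def shareable_view(record, row):
    """Drop the fields flagged as personal information before a row is shared."""
    return {k: v for k, v in row.items() if k not in record.personal_fields}
```

In this sketch the defaults are deliberately restrictive: a dataset with an empty record cannot be shared with anyone, which mirrors the point that data without governance answers ends up walled off.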
Cutting-edge organizations are realizing that governance gives them the highest-quality results, which can be safely shared with the right audiences and drive the greatest business value. It doesn’t have to be a pure expense. If done properly, it can be a value driver.
But what does it mean to do it properly? First, organizations need to be careful about diffusion of responsibility. A well-meaning saying like “data governance is everyone’s responsibility” can lead to disastrous results. When something is everyone’s responsibility it can become no one’s responsibility. The same is true at the technology level.
You don’t want every application that generates or consumes data to re-interpret and re-implement data governance policies and practices. You need experts to interpret the policies, and the results of that interpretation should be embodied in as few places as possible – not in every app, every connector, and every web interface. When a rule, policy, or practice changes, the change should be made in one place – not many. If it has to happen in many places, you are doomed before you begin. This is particularly true in a world of microservices.
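The idea of embodying a rule in one place can be sketched in a few lines of Python. This is an illustrative assumption rather than a prescribed design: `PolicyRegistry`, `reporting_app`, and `export_service` are hypothetical names, and a real deployment would enforce the rules inside the data layer rather than in a module-level object.

```python
from typing import Callable, Dict

# A policy takes a request context and returns True when access is allowed.
Policy = Callable[[Dict[str, str]], bool]


class PolicyRegistry:
    """Hypothetical single home for governance rules shared by all consumers."""

    def __init__(self) -> None:
        self._policies: Dict[str, Policy] = {}

    def set_policy(self, name: str, policy: Policy) -> None:
        # Registering under an existing name replaces the rule in one place.
        self._policies[name] = policy

    def is_allowed(self, context: Dict[str, str]) -> bool:
        # Access is granted only if every registered policy agrees.
        # Note: with no policies registered, all() returns True.
        return all(policy(context) for policy in self._policies.values())


registry = PolicyRegistry()
registry.set_policy("region", lambda ctx: ctx.get("region") == "EU")


def reporting_app(context: Dict[str, str]) -> str:
    # The app asks the registry; it does not re-implement the rule.
    return "report" if registry.is_allowed(context) else "denied"


def export_service(context: Dict[str, str]) -> str:
    # A second consumer relies on the same single source of rules.
    return "export" if registry.is_allowed(context) else "denied"
```

If the region rule later changes, a single call such as `registry.set_policy("region", lambda ctx: ctx.get("region") in {"EU", "US"})` updates the behavior of both consumers at once; neither `reporting_app` nor `export_service` needs to be edited.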
This approach has important implications for your data fabric, including:
- It must work equally well for data and metadata.
- It must be able to embody and execute rules, not just store the data.
- It must have powerful security and privacy capabilities for data, metadata, and code.
- It must have extensible interfaces and the ability to expose new data services.
- It must be able to express the relationships between data, not just the data itself.
- It should have strong temporal and auditing capabilities.
Organizations that embrace data governance as an enabler and apply the right technologies will have a strong competitive advantage over those that don’t. They will implement projects faster, with less risk and lower ongoing cost. They’ll also be able to use and share their data assets while competitors won’t. Doug Robinson, the Executive Director for The National Association of State CIOs, said it well: “Without good data governance, organizations are spending more to be less efficient and less effective.”