The Need for Cataloging, Governance, and Collaboration in the Age of the Data Swamp


In this special guest feature, Josh Howard, Director of New Products at Alteryx, Inc., points out that for companies to remain competitive, no matter the industry, analysts across business units will need to make sense of all the data sets. He also offers 3 ways that companies can ensure that the right people are activating the right data to better the business, instead of leaving it to drown in the swamp. For the past 15+ years, Josh has been helping software companies define their go-to-market and product strategies across a wide variety of data management technologies – from advanced analytics, business intelligence and data integration tools, to database management, data architecture, and data center services. He received a Bachelor’s Degree in Business Administration, Marketing from Stephen F. Austin State University.

There is no question that big data has transformed, and will continue to transform, every aspect of business. The landscape has become so complex and the amount of data so vast – some estimates put the amount of data produced annually worldwide at 163 zettabytes by 2025 – that we’re dealing not only with streams of data, but with lakes of data. Without proper structures for vetting and governance in place, and with ever-growing unstructured data sources flowing in, it is easy for these lakes to become mucky, stagnant swamps. In fact, per IDC, 90% of an organization’s unstructured data is never analyzed; this is referred to as “dark data.” In theory, much of the tens of millions of dollars spent in the big data industry can go to waste, as many organizations are still wondering how to activate the insights festering at the bottom of these swamps.

People are clearly trying to make sense of all of this data: big data management solutions are expected to grow 12.8% by 2021. The data lakes owe their creation in part to the influx of social data, data collected through connected devices, and machine learning from a multitude of sources, like self-service analytics, cloud data, device-driven data, financial data and everything in between. As businesses ingest data from a growing number of sources, unstructured data sets accumulate, go untouched, and create the swamps. But data collection is only one piece of the puzzle.

For companies to remain competitive, no matter the industry, analysts across business units will need to make sense of all the data sets. But how can companies ensure that the right people are activating the right data to better the business, instead of leaving it to drown in the swamp?

  • Cataloging – if data sets are stored in various places and not easily accessible, those working with the data will spend more time searching for specific assets, creating inefficiencies and slow turnaround. By cataloging the data, those accessing and activating it can work with the most up-to-date information. From there, a single version of a data analysis is created, providing the same data-driven insights for similar questions asked in various units, instead of multiple versions that share the same theme but are supported by different data sets.
  • Governance – in the data swamp, all data is treated equally. But in reality, certain data points can be considered “more important” than others based on the sensitivity of the content or the level at which the data is used. There needs to be a differentiation between trusted, official data of record and data that is available for general consumption. When authorized analysts discuss the data, determine who produced it, and know who is responsible for updating it, the data is more likely to be approved and agreed upon. Through proper governance, companies can ensure that the data being used across business units is accurate and vetted.
  • Collaboration – a company’s entire data collection may be the effort of many lines of business, but rarely do the multiple units discuss the insights derived from analyzing the data. Larger conversations around what each unit discovers thanks to the data can break down silos. It is similar to a tourist in a foreign city. Upon arrival, they turn to Yelp for thoughts on local restaurants from those who have previously visited, and make decisions based on those reviews. Someone else has already solved the issue you are facing, and has saved you the time of figuring it out yourself.

Data analysts want to point to data to resolve issues. But without proper cataloging, governance, and collaboration, analysts are unable to gain a deep understanding of what data they have, what data is being used, and who is using it. Taking the time to manage data appropriately allows the good data to rise to the top of the data lakes, accelerating the delivery of the data-driven insights that are essential to a company’s competitive advantage.

