Dark Data is Only the Tip of the Iceberg

Dark data, a term coined by Gartner, is data that is generated or collected by businesses as part of their routine processes without being used for analysis or other purposes. From the budgetary perspective, dark data, whether on a physical or digital carrier, is a clear-cut liability as long as it remains dark. Businesses pay for its storage and get no returns for doing so.

Up to 73% of the data being collected by companies is estimated to be dark. Not all of this data holds valuable insights, and the value of its exploration is often less than clear-cut. Yet companies are increasingly looking to delve deeper into the dark. The truth is, the very existence of dark data points to a larger issue—companies do not understand the mindset needed for succeeding and extracting value from data.

Knowns and unknowns

To borrow a phrase from the late U.S. Secretary of Defense Donald Rumsfeld, one of the most prominent political figures in recent U.S. history, dark data is a clear-cut known unknown. The concept is used in intelligence circles to refer to variables whose value is unknown but that are nevertheless on the radar. In our case, the company knows the old reports are sitting in the archives, but not necessarily what insights they may hold, if any.

While delivering his famous DoD briefing on February 12, 2002, Rumsfeld rightly pointed out that the unknowns we are aware of are not as worrisome as those we don’t know exist. You can account for a mystery as long as you know it is there. If you’re not aware that a variable exists in the first place, it will not be part of your equation.

For businesses, this means that there is more data out there to think of than whatever sits in their data warehouses. To become dark, the data has to be collected in the first place. But so much potentially useful data is not being captured at all. Mining companies, for example, are experimenting with wearable technology for worker safety. It’s a relatively new trend and is not being used as widely as it could. This showcases the problem: The workers’ positioning data is useful and easy to collect, but until you do so, it is not even considered your dark data. 

Similarly, the mall industry woke up years ago to the power of customer data collected via WiFi. In the future, if legislation allows it, we could see AI-enhanced security cameras providing businesses with data on customer behavior, which could then be used to deliver personalized recommendations and promotional offers to shoppers automatically.

In essence, any field or industry can benefit from introducing data collection processes and machine learning. In animal farming, research shows that sensor-collected data can be used to analyze animal behavior, predict diseases, and optimize nutrition. Whether it is optimizing bookings for barbershops, predicting machinery failures in manufacturing, or enhancing seismic analysis for oil exploration, if you can connect the data at hand with the right business question, the answer will bring your company new value.

Struggling to keep the pace

The problem with finding the answer is simply that it takes time. A modern enterprise has to tap developer teams, data engineers, and analysts to collect and process data, which inevitably costs a great deal of time and money. Once in place, the never-ending data collection cycle can spin out of control, hauling in more data than the team can analyze within a reasonable timeframe.

Flooded with data, insights teams are forced to prioritize urgent, tactical-level tasks, while more substantial and strategic research remains untouched. At the same time, the data that they never get around to analyzing effectively goes dark, bringing the company no tangible value. 

This approach can often leave data teams with no resources or time to get creative, experiment, and work on models that would explore unknown unknowns instead of reinforcing the business models that produced the available data. During my military service, I encountered a similar issue: The race to keep track of the clandestine communications methods used by hostile groups left us with no time or mental resources to realize that the data we already had could be used to detect them far more effectively. By reshuffling our priorities and investing some manpower in creative experiments, we got it right. It all started with reviewing the challenge before us from a more strategic perspective and finding a better solution with the data at hand.

Instead of hoarding more and more dark data, companies must learn to manage their priorities and follow procedures that let them develop novel business processes and models in step with evolving business needs. They must learn to strategize with data the right way, where data collection is a means to an end, not an end in itself.

The gold sitting in the dark may indeed hold valuable insights, but the future belongs to those who innovate sooner rather than later. With data, innovation must flow as an efficient process of asking a question and sorting the available data to find the subsets that generate more business value quickly and cost-effectively. From my experience, success always comes from the willingness to never stop trying. Internalizing this simple formula and backing it with the right procedures is crucial to getting data transformation right.

About the Author

Ronen Korman is a Founder and Co-CEO at Metrolink.ai, a dataflow management solutions company. A technology leader with 30 years of experience in R&D, he served as the commander of the Israel Defense Force (IDF) elite technological Unit 81 with the rank of Colonel and was Head of Technology for Operation and Cyber division (General equivalent) at the Israeli Prime Minister’s Office. He has been awarded the prestigious Israel Defense Prize and takes pride in being a seasoned technology geek.
