Self-Service Data Prep: Illuminating Dark Data for Better Business Decisions

In this special guest feature, Jon Pilkington, Chief Product Officer at Datawatch, examines how self-service data prep can help companies illuminate dark data for better business decisions. As Chief Product Officer, Jon brings more than two decades of business analytics experience to Datawatch, including 18 years in the business intelligence market. He joins Datawatch from Sonian Systems, a public cloud email archiving vendor, where he served as vice president of marketing and product management. Jon helped that company raise more than $20 million in venture funding while establishing Sonian as a global leader in public cloud email archiving. Jon holds a B.S. in Management Information Systems from Bryant University and is the recipient of several industry awards, including the Massachusetts Technology Leadership Council 2008 “CXO of the Year.”

Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” When it comes to analytics, this dark data can provide immense value, and it’s a critical requirement in ensuring an accurate and holistic view of the business. So, why are data analysts and business users ignoring it?

Data that provides the most analytical value has traditionally been locked away in multi-structured or unstructured documents, such as text reports, web pages, PDFs, JSON and log files. The only way for the average business user, and even data scientists, to use this information has been to manually rekey and reconcile the data, which are time-intensive and error-prone processes. Because the data is often deemed inaccessible, or not accessible in an acceptable timeframe, these information sources are too frequently “written off” or simply forgotten altogether – and hence, dark data is born.

Instead, business users and analysts turn to easily accessible data, such as relational, CSV and other standard, structured data, for analysis, but it only gives them part of the picture. In fact, it’s estimated that only 12 percent of enterprise data is used today to make decisions (see note 1).

The good news is that self-service technologies are taking data out of the dark ages, and transforming how analysts and everyday business users can acquire, manipulate and blend it. Data preparation (prep) solutions are enabling users to tap information from virtually any source, including dark data housed in previously inaccessible multi-structured and unstructured documents. Users can gain fast access to not only the right data, but all of the data, crucial to getting a holistic view of the business. This means more time can be spent on performing analysis that yields actionable business insights.

Here’s a look at three “dark data” repositories that can instantly become accessible (and invaluable) with a self-service data prep solution.

1 – Enterprise Content Management Systems

Businesses are challenged with making strategic business decisions using clean, reliable and trusted data located in disparate areas across their organization, and there’s a wealth of untapped information living in the documents housed in enterprise content management systems. With a self-service data prep solution, users can mine these systems for meaningful data sets contained within the content. Data prep turns content management systems into new sources of data – transforming static reports into tabular data that can be filtered, sorted and aggregated with other content for effective research, analysis and planning. Content management systems become a “search-to-action” tool based on the secure, accurate and official data contained in line of business reports, invoices, statements and other official documents.
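To make the idea of turning a static report into tabular data concrete, here is a minimal Python sketch. It is only an illustration, not how any particular data prep product works: the report text, its layout, and the `report_to_rows` helper are all assumptions invented for this example.

```python
import re

# Hypothetical text report; the layout is an assumption made for illustration.
REPORT = """\
REGIONAL SALES REPORT            Page 1
Region: East
  INV-1001   2016-03-01     1,250.00
  INV-1002   2016-03-02       310.50
Region: West
  INV-2001   2016-03-01       980.00
"""

# A detail line is indented and holds: invoice id, ISO date, amount.
DETAIL = re.compile(r"^\s+(INV-\d+)\s+(\d{4}-\d{2}-\d{2})\s+([\d,]+\.\d{2})\s*$")


def report_to_rows(text):
    """Flatten the report into one dict per detail line.

    The region printed once in a group header is carried down onto
    every detail row beneath it, so each row stands on its own.
    """
    rows, region = [], None
    for line in text.splitlines():
        if line.startswith("Region:"):
            region = line.split(":", 1)[1].strip()
            continue
        match = DETAIL.match(line)
        if match:
            invoice, date, amount = match.groups()
            rows.append({
                "region": region,
                "invoice": invoice,
                "date": date,
                "amount": float(amount.replace(",", "")),
            })
    return rows


rows = report_to_rows(REPORT)
for row in rows:
    print(row)
```

Once the rows are in this shape, they can be filtered, sorted and aggregated like any other structured data set.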

2 – Web-based Applications

It’s no secret that organizations are increasingly moving all kinds of corporate data to the cloud. While it’s easy to access the actual applications, retrieving the right data for analysis is a whole different matter – one that can actually be quite complex. Web applications typically provide APIs, so programmers and IT professionals can move information in and out of the systems in question. But, because the everyday business user isn’t versed in programming, they rely on built-in reporting mechanisms to access the data they need. Self-service data prep solutions can capture data from a URL or even from screen scraping.

The other problem is, even if reports are run daily, they are static, point-in-time representations that don’t provide greater insight into the business. So, if you want to understand how the sales pipeline has changed over the past week, month or year, it is an impossible task. Self-service data prep can be used in an automated way to pull table extracts or snapshots of data from web applications on a daily basis. The ability to extract, aggregate and assemble these snapshots provides users with a historical view, as well as an all-inclusive look at the business, so they can easily identify anomalies, behaviors, problems, opportunities and trends over time.
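The snapshot idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `add_snapshot`, `pipeline_total` and `pipeline_change` helpers, the in-memory `history` store, and the sample deals are all hypothetical, and a real deployment would pull each extract from the web application on a schedule and persist it.

```python
from datetime import date


def add_snapshot(history, snapshot_date, rows):
    """Keep a point-in-time copy of the extract, keyed by capture date."""
    history[snapshot_date] = [dict(row) for row in rows]


def pipeline_total(history, snapshot_date):
    """Total open pipeline value as it stood on the given date."""
    return sum(row["amount"] for row in history[snapshot_date])


def pipeline_change(history, start, end):
    """Change in total pipeline value between two stored snapshots."""
    return pipeline_total(history, end) - pipeline_total(history, start)


# Hypothetical daily extracts, one week apart.
history = {}
add_snapshot(history, date(2016, 3, 1), [
    {"deal": "A", "amount": 5000},
    {"deal": "B", "amount": 2000},
])
add_snapshot(history, date(2016, 3, 8), [
    {"deal": "A", "amount": 5000},
    {"deal": "C", "amount": 4000},
])

change = pipeline_change(history, date(2016, 3, 1), date(2016, 3, 8))
print(change)
```

Because every snapshot is kept rather than overwritten, a question such as "how did the pipeline move this week?" becomes a simple comparison between two stored dates.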

3 – Financial and Operational Processes

Data resulting from financial and operational processes can be extremely beneficial not only for analytics purposes, but for improving company policies and procedures as well. But this information has traditionally been inaccessible to many companies, causing it to quickly turn into dark data.

Data reconciliation is one of the most common operational use cases we see for self-service data prep; many organizations are still comparing data sets and documents manually, line by line. Not only is this time-consuming, but it’s costly, especially when you think of the man hours it takes to complete.
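As a rough illustration of what automating a line-by-line comparison involves, the Python sketch below reconciles two small data sets on a shared key. The `reconcile` function, the field names, and the sample records are assumptions made for this example, not a description of any product's internals.

```python
def reconcile(ledger, statement, key="invoice", field="amount"):
    """Compare two record sets on a shared key.

    Returns the keys missing from either side and the keys present
    in both whose compared field disagrees.
    """
    left = {row[key]: row[field] for row in ledger}
    right = {row[key]: row[field] for row in statement}
    shared = left.keys() & right.keys()
    return {
        "missing_in_statement": sorted(left.keys() - right.keys()),
        "missing_in_ledger": sorted(right.keys() - left.keys()),
        "mismatched": sorted(k for k in shared if left[k] != right[k]),
    }


# Hypothetical records; amounts are in cents to avoid rounding surprises.
ledger = [
    {"invoice": "INV-1001", "amount": 125000},
    {"invoice": "INV-1002", "amount": 31050},
    {"invoice": "INV-1003", "amount": 98000},
]
statement = [
    {"invoice": "INV-1001", "amount": 125000},
    {"invoice": "INV-1002", "amount": 31500},
    {"invoice": "INV-1004", "amount": 4200},
]

result = reconcile(ledger, statement)
print(result)
```

Only the exceptions are returned, so a reviewer looks at the handful of records that disagree instead of reading both documents in full.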

One of our customers, a well-known retail giant, is a great example. The company was seeking to expedite supply-chain management by liberating its team from paper-based inventory reconciliation and invoice processing. Its supply-chain process and inventory management systems had grown in multiple directions over the years, and invoices arrived in a wide range of formats. This necessitated extensive manual intervention, much of it by paper and hand-written adjustments. With our “point and click” data prep solution, the retailer was able to create a more efficient and technically advanced set of supply-chain and store inventory management processes, and standardize them across regions.

Through an automated, Excel-like process flow, the retailer has joined data from multiple suppliers and vendors into one system, automating dozens of workflows and practically eliminating paper altogether. The company has reduced labor costs for manual processes by 50 percent and is saving an astounding $6.5 million in annual printing costs! With reporting time now cut in half, the company also has a much more accurate, timely view of inventory supply and demand, and is better equipped to support its supply-chain objectives.

Increased Agility Empowers Ordinary Users

There’s no question that one of the greatest benefits of self-service data prep is increased agility for its users. And with this newfound agility, ordinary users can illuminate dark data within their organizations and transform it into actionable intelligence that drives timely, more informed decisions and delivers greater business value.

1 The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014:

