Is Your Big Data Approach Complete?


The past few years have witnessed dramatic advances in how organizations use their internal data. No longer just a back-office function for after-the-fact operational analysis and future planning, business intelligence (BI) and data analytics have evolved into Big Data, enabling enterprises to respond more proactively and accurately to a constantly changing business environment.

One area where Big Data is really making its mark is in improving the customer experience. Consider Westpac Banking Corporation: for the last few years, the Australian bank’s “KnowMe” program has captured and centralized activity, such as ATM usage and call center interactions, from approximately 12 million customers. Based on behavioral analysis, Westpac can match customers with new programs and offerings, and in only nine months the program enabled Westpac to grow its customer engagement from one percent to 25 percent.

While examples like this abound, internal data alone makes up only half of the information picture. When internal Big Data insights are combined with external data sets, the benefits are substantially amplified.

Consider the simple yet brilliantly successful example of a fast food chain that “trained” its drive-through cameras to determine which items to display on digital menu boards, based on line length (an external data point). When the lines were longer, the menu would feature products that could be served quickly; when the lines were shorter, the menu would feature higher-margin items that took longer to prepare. A contrasting example is the hand sanitizer manufacturer that, in spite of extensive planning and forecasting, left retail store shelves depleted and missed major sales opportunities because it failed to identify real-time regional flu outbreaks (another external data point) and adjust distribution accordingly.
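The menu-board rule in the fast food example can be sketched in a few lines of Python. This is only an illustration of the idea, not the chain's actual system: the `MenuItem` fields, the `choose_menu` function, the threshold of five cars and the sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MenuItem:
    name: str
    prep_seconds: int  # hypothetical preparation time
    margin: float      # hypothetical profit per item


def choose_menu(items, cars_in_line, long_line_threshold=5, top_n=2):
    """Pick which items to feature based on the current line length.

    Long line: feature the items that are fastest to prepare.
    Short line: feature the items with the highest margin.
    The threshold is an assumed value, not taken from the article.
    """
    if cars_in_line >= long_line_threshold:
        ranked = sorted(items, key=lambda item: item.prep_seconds)
    else:
        ranked = sorted(items, key=lambda item: item.margin, reverse=True)
    return [item.name for item in ranked[:top_n]]


# Made-up sample menu for illustration only.
MENU = [
    MenuItem("fries", prep_seconds=40, margin=0.5),
    MenuItem("soda", prep_seconds=15, margin=0.8),
    MenuItem("deluxe burger", prep_seconds=240, margin=2.1),
    MenuItem("salad", prep_seconds=180, margin=1.6),
]

busy_menu = choose_menu(MENU, cars_in_line=8)
quiet_menu = choose_menu(MENU, cars_in_line=1)
```

With this sample data, a line of eight cars features the soda and fries, while a line of one car features the deluxe burger and salad. The point is that the external signal (line length) decides which internally derived ranking gets used.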

These examples underscore how important external data is to a comprehensive “sense and respond” strategy. In the case of the fast food chain, all the data analysis in the world that went into creating the most appealing menu would be useless if customers were turned away or disgruntled by a long wait. However, businesses have historically paid much more attention to internal company data, and we see a strong, ongoing imbalance between internal and external data in most companies. Here are the primary barriers preventing companies from uniting the two datasets and getting the most comprehensive, actionable picture:

  • Cost – Internal data is free for businesses in the sense that they produce it themselves, do not need to pay for access to it, and can make it available to all employees. External data typically comes with a cost and restrictions. Bloomberg terminals are expensive, and only a few people in a company may have access to them. Many businesses do recognize the importance of having access to the most relevant and up-to-date external industry data and pay big bucks for it, but there are no tools for democratizing this access. In contrast, there are numerous free, open source tools available for analyzing internal data.
  • Tools – Many external data publishers provide archaic and unfriendly tools for accessing their data, which makes the data difficult to use. The industry badly needs new methods of finding, accessing and including external data in the analytics effort. Given the massive amounts of data that are becoming available, it is not always practical to move all relevant data within an enterprise’s four walls, and APIs are needed to support easy and fast transport and augmentation with internal data.
  • Complexity – Proprietary data designed for corporate consumption is often spread across dozens of external systems (examples include Experian and Dun & Bradstreet credit reports and Acxiom demographic data). Even when these data sources are within the scope of traditional enterprise search engines, those search solutions are document-oriented and have weak support, if any, for searching data or time series (a time series is a sequence of data points indexed, listed or graphed in time order). Typical documents are mostly text and contain thousands of words, while a typical data record has very little textual metadata available. Also, billions of time series represent a scale orders of magnitude higher than traditional document-based search solutions were designed to handle. Both of those factors pose unique challenges that only a specialized, data-first search engine can handle. Data marketplaces that amalgamate all of this data are a start, but they need highly precise, AI-powered search capabilities specifically tailored to unearthing data.
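To make “augmentation with internal data” concrete, here is a minimal sketch based on the hand sanitizer example above. It assumes the external data arrives as a simple per-region flu-activity index, where 1.0 means a normal week; the `augment_forecast` function, the region names and all of the numbers are hypothetical and do not describe any real data feed.

```python
def augment_forecast(internal_forecast, external_flu_index, baseline=1.0):
    """Scale each region's internal unit forecast by an external index.

    internal_forecast: region -> forecast units (internal data).
    external_flu_index: region -> flu activity relative to `baseline`
        (external data; the shape of this feed is an assumption).
    Regions with no external reading keep their internal forecast.
    """
    adjusted = {}
    for region, units in internal_forecast.items():
        index = external_flu_index.get(region, baseline)
        adjusted[region] = round(units * index / baseline)
    return adjusted


# Made-up numbers for illustration only.
internal = {"north": 1000, "south": 800}
external = {"north": 1.5}  # flu activity 50% above normal in the north

plan = augment_forecast(internal, external)
```

In this sketch the northern region's shipment plan rises from 1,000 to 1,500 units, while the southern region, which has no external reading, keeps its internal forecast of 800. A production system would need a far more careful model, but the join between an internal forecast and an external signal is the essential step.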

The bottom line is that, in the Big Data era, using internal data alone is no longer enough. The ability to supplement internally derived data insights with those harvested from external data will become a strong weapon for all types of businesses. Achieving this will require a level of external data democratization similar to what we increasingly see with internal data, focused on eradicating the barriers of cost, tools and complexity.

About the Author

Vladimir Bugay is CEO and Founder of Knoema Corporation and has been the key technical architect at Knoema since its foundation. He’s responsible for client delivery and manages technical teams in Russia and India.




