ClearStory Data Announces Advancements in Automating Data Preparation and Data Blending via New Machine-based Approach

ClearStory Data, the company bringing business-oriented Data Intelligence to everyone through fast-cycle, disparate data analysis, today announced a breakthrough advancement to its industry-first data inference and Intelligent Data Harmonization™ capabilities called Infinite Data Overlap Detection, internally code-named IDOD. With this R&D innovation and the research behind it, ClearStory’s Spark-based, business-ready analytics solution now detects and infers data patterns and customer-specific data types for all values, across all data types, in every source that a user connects to as part of an analysis.

Earlier this month, Gartner named ClearStory a Visionary in its debut in the Magic Quadrant for 2016 Business Intelligence and Analytics Platforms. As the MQ report notes: “Market awareness and adoption of smart data discovery will extend data discovery to a wider range of users, increasing the reach and impact of analytics. These emerging capabilities facilitate discovery of hidden patterns in large, complex and increasingly multi-structured datasets, without building models or writing algorithms or queries.”

The new IDOD advance addresses a growing market need to blend and harmonize multiple complex, highly dimensional “categorical value” data sources. This kind of analysis complexity, and the diversity of sources the data originates from, is prevalent across all Global 2000 organizations and is the root cause of the biggest delays and challenges in speeding business insights. The benefit to all organizations is faster, more precise insights on large, complex data sources, including ones with a high degree of customer-specific information, which is common in almost all companies across industries and contributes to a rise in data analysis complexity.

ClearStory’s new, large-scale IDOD capability is used to determine how complex data from multiple sources should be blended, viewed, and visualized on the fly. IDOD plays the role of data modeling advisor to the business user, enabling them to blend data together and discover insights quickly, without data modeling expertise or days or weeks of manual effort. ClearStory’s approach replaces traditional methods of manually matching data or column headers or sampling data, all of which are labor-intensive and error-prone.

In primary research conducted in October 2015, nearly 70 percent of companies polled report they need access to refreshed data insights either hourly or daily. Eighty-six percent of them struggle with this challenge on a regular basis where four or more data sources and file formats are involved for analysis. A majority of respondents (68 percent) report they experienced “data blindness” at least once per week because they could not spot “what’s happening now, and why” soon enough, impacting their ability to make smart decisions and perform their jobs well.

The most difficult part of this problem has always been the customer-specific attributes and the distinct values and nuances of data such as product names, category names, distinct phone numbers, product codes, and brand attributes. Such data and attributes have traditionally required heavy manual data wrangling to inspect and reconcile many thousands to millions of unique values with integrity and consistency.

Take one of these data sources and add more such sources that need to be blended together, and the result is a long, painful, and error-prone process. As the data sources update, the repeated headache of preparing and modeling all the data relationships becomes unsustainable for even sophisticated data stewards. Without ClearStory’s new smart machine approach, the business impact is major delays in reaching insights.

Highlights of the new Infinite Data Overlap Detection capability include:

  • Smarter Data Inference: Detects and infers the overlap of categorical values for all data types across hundreds, millions, or even billions of unique values for attributes across all the source data being analyzed;
  • Infinite Types: No limits on how many unique custom data types, custom dimensions, or values can be recognized in each source for data inference and data harmonization;
  • Extensibility: New data types can be easily patterned and plugged into the capability for increased automation of vertical industries’ custom data types. This brings a powerful way to address vertical-specific and customer-specific data nuances and complexities;
  • Granular Data Scoring and Data Relationships: Detailed granular scores are calculated for each custom data type, and the values within are used to determine more sophisticated ways to automatically blend data sources together into a holistic, harmonized view. Even data sources with hundreds of millions of unique values per attribute can be intelligently inferred and automatically scored and matched to enable users to reach fast, meaningful insights;
  • Simple User Experience: As in all areas of the ClearStory Data solution, ease of use and an intuitive user experience are of the utmost importance. With IDOD, ClearStory extends its data inference and Intelligent Data Harmonization™ user interface and experience to surface the power of the new, advanced processing engine in a simple, user-friendly way so users can be self-sufficient on even complex data sources.
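
The announcement does not describe how IDOD computes its scores. As a rough illustration only of the general idea of scoring categorical value overlap between columns from different sources, here is a minimal Python sketch. It assumes two simple measures (Jaccard similarity and containment) that stand in for whatever ClearStory actually uses, and the function name, source names, column names, and values are all hypothetical.

```python
from itertools import product


def overlap_scores(left, right):
    """Score every (left column, right column) pair by shared distinct values.

    Illustrative only: Jaccard and containment are assumed measures,
    not ClearStory's published scoring method.
    """
    scores = []
    for (lname, lvals), (rname, rvals) in product(left.items(), right.items()):
        lset, rset = set(lvals), set(rvals)
        shared = lset & rset
        union = lset | rset
        # Jaccard: shared distinct values over all distinct values.
        jaccard = len(shared) / len(union) if union else 0.0
        # Containment: share of the smaller column's distinct values
        # that also appear in the other column.
        smaller = min(len(lset), len(rset))
        containment = len(shared) / smaller if smaller else 0.0
        scores.append((lname, rname, jaccard, containment))
    # Rank by containment first, then by Jaccard as a tie-breaker.
    return sorted(scores, key=lambda s: (s[3], s[2]), reverse=True)


# Hypothetical sources: each maps a column name to its raw values.
sales = {
    "sku": ["A-100", "A-200", "B-300", "B-300"],
    "region": ["West", "East", "West", "South"],
}
catalog = {
    "product_code": ["A-100", "A-200", "B-300", "C-400"],
    "brand": ["Acme", "Acme", "Bolt", "Crest"],
}

# The top-ranked pair is the strongest candidate for blending the sources.
best = overlap_scores(sales, catalog)[0]
print(best)
```

In this toy data the "sku" and "product_code" columns share values even though their headers differ, which is the kind of relationship that header matching alone would miss. Comparing full sets of distinct values this way would need a distributed or approximate approach at the scale the announcement mentions.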

“ClearStory’s introduction of new automated, machine-based advancements in data preparation, discovery, and data harmonization continues to build on its Spark-based core IP to process large-scale data at high speeds,” said Dr. Tim Howes, CTO of ClearStory Data. “By adding the advanced IDOD capability to automatically recognize infinite categories, values and granularities in data sources, we speed the cycle of data to insights by addressing a significant pain point that enterprises across all industries face today: the intricate, tedious task and massive time sink caused by manual data wrangling on large, complex data.”

ClearStory Data’s new capabilities are offered as a core part of the ClearStory solution, and customers can use them as part of their standard offering. For more advanced users, the data extensibility feature can also be made available as a premium API-based service.

 
