Has the Accessibility of Data Visualization Overshadowed the Value of Data Prep?


In this special guest feature, Mark Marinelli of Lavastorm Analytics argues that while many people imagine beautiful visual dashboards in Tableau or Qlik when they think of ‘big data,’ the big data world is much bigger than visualization technology. Mark is the Chief Technology Officer of Lavastorm Analytics, with over 17 years of experience in the software industry and a software development background in multidimensional databases and analytics.

The explosive growth and adoption of interactive data visualization products from vendors such as Qlik and Tableau has led organizations and executives to demand more analysis of more of their data. While the intuitive visual nature of these tools has enabled more people to easily access more individual data sources, the full potential of these offerings is still limited by the end user’s ability to combine and enrich many data sources to build a robust and trusted analytical foundation for their visualizations. This limitation is only made more acute when working with true Big Data workloads, where data volume and variety can overwhelm both the tools and the users’ technical skill sets.

Data visualization tools have allowed data to be viewed in an entirely new and engaging way, but those tools themselves do not comprise the full analytical tool set. Visualizations are often the end result of a much longer analytic process, one that requires deep knowledge of the data to avoid asking good questions of bad data and ultimately drawing the wrong conclusions. As such, users need to place as much value on the data preparation stage of analysis as they have on the visualization of results. Sometimes this is easier said than done.

There are five common data visualization challenges that I often hear from the market. All five can typically be solved by understanding the complementary, and often essential, role that self-service data preparation and advanced analytics tools play in the overall analytical process. Here’s a quick rundown of the problems and how they can be solved, followed by a short code sketch that makes a few of the items concrete.

  1. You’re limited in the data you can access. Data prep tools allow you to access data from disparate sources and formats, not only from spreadsheets and simple queries. The more data you can access and include, the more comprehensive your analysis will be.
  2. You’re wasting time on manual data wrangling. Data prep tools allow you to cleanse and combine data quickly, a process that is typically labor-intensive when done manually. The less time you spend on managing extracts and rebuilding pivot tables, the more time you can spend actually analyzing the data.
  3. You’re stuck with a basic analysis. You often have two choices: either use the simplified analytical operations available in your visualization tools, or rely on someone with a technical skill set to build you something more sophisticated. Self-service analytics tools, by definition, expose a richer set of diagnostic and predictive analytics without reliance on a coder to make them work.
  4. The data are not transparent. Data prep tools allow users to trace data to each source and to view every step of the analytical process as a logical flow. Contrast this with spreadsheet calculations or inscrutable code, neither of which presents a lineage of what has been done with the data prior to the current step. Showing the answer set isn’t sufficient to establish trust in the results – you need to show the work that got you there.
  5. You need to start all over again, and again. Data prep tools enable data preparation steps to be reused across projects and shared with others. So the next time you want to build a visualization, you can start at step 10 instead of step 1, reusing data that you’ve previously prepared.
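To make a few of these points concrete, here is a minimal sketch of a scripted, reusable prep step, using Python and pandas purely for illustration. The file names, column names, and cleansing rules are hypothetical stand-ins; self-service data prep tools express the same flow as a visual pipeline rather than code, but the steps are the same: access, cleanse, combine, enrich, reuse.

    import pandas as pd

    def prepare_sales_data(orders_csv: str, customers_json: str) -> pd.DataFrame:
        """Load, cleanse, and combine two disparate sources into one
        analysis-ready table. Each step is explicit, so the lineage of
        the final data set can be read top to bottom (point 4)."""
        # Access (point 1): pull from more than spreadsheets -- here, CSV and JSON.
        orders = pd.read_csv(orders_csv, parse_dates=["order_date"])
        customers = pd.read_json(customers_json)

        # Cleanse (point 2): the wrangling that otherwise eats analyst time.
        orders = orders.dropna(subset=["customer_id", "amount"])
        orders["amount"] = orders["amount"].clip(lower=0)  # no negative sales
        customers["region"] = customers["region"].str.strip().str.title()

        # Combine: enrich orders with customer attributes.
        combined = orders.merge(customers, on="customer_id", how="left")

        # Enrich: a derived column the visualization layer can group by.
        combined["order_month"] = combined["order_date"].dt.to_period("M")
        return combined

    # Reuse (point 5): the next project starts at step 10, not step 1.
    df = prepare_sales_data("orders.csv", "customers.json")

Because the whole flow lives in one place, anyone reviewing the resulting visualization can trace exactly what was done to the data before it arrived, which is the “show your work” transparency of point 4.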

Data visualization products can be extremely powerful, and have allowed a broader user base to access data, but these analyses only become more powerful when combined with the right set of self-service data preparation and analysis tools.




Comments

  1. I think Mark has nailed it with this article. I have supported dozens of customer solution deployments, DWBI stand-ups and rollouts, and app development cycles, and ‘data prep’ is one of the toughest and most overlooked parts of any deployment. There are few things that frustrate an end-user team more than delays … but ‘Bad Data’ tops the list. The cliché “Bad data in …” holds true, especially for Big Data projects.

    As Big Data is still in its infancy, we have seen 20 projects in the New England region over the last year that have stalled due to the data integration challenges that Mark describes in his article. Data access and data prep are two of the big obstacles standing in the way of tangible progress and value. Popular concepts like Data Lakes offer a great opportunity to automate some of the data prep at low cost and low risk, to figure out the right treatments, cleansing, and enhancements necessary to create the ‘pristine data set.’