Hypothesis-led data exploration is failing you …

In this special guest feature, Aakash Indurkhya, Co-Head of AI at Virtualitics, suggests that you should set your assumptions aside and start looking at your data through the lens of AI. Aakash graduated from Caltech (Computer Science, 2016) with a focus on AI, ML, and Systems Engineering. During his time at Caltech, he founded and taught a course on big data frameworks (Spark/Hadoop). He has contributed to research at Caltech and Duke University. At Virtualitics, Aakash manages the development of new machine learning capabilities in the Virtualitics AI Platform (VAIP) and client solutions. He holds several patents in data visualization, artificial intelligence, network science, and telecommunications.

Yesterday’s BI tools aren’t suited for exploring today’s breadth and depth of data

Companies today have an overwhelming amount of data at their fingertips. They’re making huge investments in resources to store, manage, and organize that data from every part of the organization, in the hopes that it can be mined for future use and give the business a leg up. 

But that hope isn’t being realized. We’ve been talking about data being the new oil, but very few companies are striking it rich. Why? Because the tools we’re using to try to make sense of our data simply can’t handle the complexity that lies within. Just as crude oil has no value until it’s been refined, data has no value until it’s understood.

Ten years ago, an analyst would use a standard spreadsheet to understand their data, maybe applying a pivot table to visualize it. But the explosion of complex data has made this method completely insufficient. A data set can now have a seemingly endless number of columns, and the number of possible relationships among those columns grows far faster than the column count itself. No amount of pivot tables will help you spot the connections that matter.
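To put rough numbers on how quickly relationships outpace columns, here is a minimal sketch in Python. The function name `relationship_counts` and the column counts are illustrative assumptions, not figures from any real data set. It counts the column pairs, the column triples, and all groups of two or more columns that an analyst could in principle examine.

```python
from math import comb


def relationship_counts(n_columns: int) -> dict:
    """Count the candidate column groupings for a table with n_columns columns.

    Hypothetical helper for illustration only.
    """
    return {
        # Every unordered pair of columns is a possible two-way relationship.
        "pairs": comb(n_columns, 2),
        # Every unordered triple is a possible three-way relationship.
        "triples": comb(n_columns, 3),
        # All subsets of size two or more: 2^n minus the empty set and singletons.
        "column_subsets": 2 ** n_columns - n_columns - 1,
    }


if __name__ == "__main__":
    for n in (10, 50, 200):
        counts = relationship_counts(n)
        print(f"{n} columns: {counts['pairs']} pairs, {counts['triples']} triples")
```

Under these assumptions, 10 columns give 45 pairs, while 200 columns give 19,900 pairs and more than 1.3 million triples, which is far more than anyone can check one pivot table at a time.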

If we continue to explore the old way, hoping that we can parse out the connections across multiple analyses, we’re going to fail. We’re going to miss relationships and change something without any thought to upstream or downstream impacts. Or we’re going to see relationships that aren’t really there, or that aren’t as impactful as we think.

Why do the exploration methods we apply to our data matter so much? Because we’re building entire business strategies on the results of our exploration. Executives are making decisions based on the insight we put in front of them. Business lines are implementing changes based on the reports and recommendations analysts provide. We’re building and deploying AI models to target business problems that we think are important, and taking into account the drivers that we found. 

If we’re not looking holistically at all of the data we have at our disposal, then we’re building everything that comes after on a shaky foundation.

The risk of letting your hypotheses lead your exploration

Since it’s been impossible to explore all of the data relating to an area of the business, analysts have had to pick and choose what to include and what can be safely left out. The guiding principle is the hypothesis that they’re investigating. Where does that hypothesis come from? The same human observations that identified the business problem being investigated.

For example, a business has noted that orders are frequently arriving later than needed. Confident that the inventory management system is operating as it should, with orders placed in plenty of time, the business leadership is fairly certain that there’s a problem on the supply chain side. This late order problem, and the associated hypothesis of supply chain issues, is taken to the analysts to investigate.

The analysis team may want to cast a wider net to check for other causes or relationships, but given time and resources, only data relevant to the supply chain hypothesis is gathered for exploration. It’s quite possible that there will be some interesting insights in this set of data suggesting that there is room for improvement, and that information is fed back to the business. A solution is discussed and agreed to, and the project moves on to the data science team to execute.

But just because something was found while exploring the hypothesis doesn’t mean it was the right insight to act on. Addressing a small problem could leave a bigger problem to fester, meaning that you’re not having the big impact on the business that you want. And if you’re addressing a symptom of a problem and not its source, then you’re not fixing anything at all.

The risks of incomplete exploration are significant. In addition to producing underwhelming analytics and AI that don’t address the real issues, you’re missing out on the opportunities that would pack a powerful punch. You’re also introducing bias: by relying on your hypothesis, you aim your attention in one direction and leave out data that could change everything about how you approach a problem. You could also be leaving in data that has an outsized impact on your results.

Whether the AI model you end up creating from this hypothesis-led exploration isn’t strong enough or focuses on the wrong issue altogether, the result is the same: valuable time and resources have been wasted, and additional risk has been added to the business. It’s no wonder that almost half of AI models never make it out of development.

AI-powered data exploration can give your business an advantage

Although most people consider AI to be the outcome of data exploration work, it’s actually a tool that you should use from the start of your exploration. Where before you had to limit your exploration to the narrow data set you thought was most relevant, AI lets you explore all the relevant data. AI-powered data exploration doesn’t just look at individual drivers or significant values; it spots the connections between dimensions.

Analyzing huge sets of data for meaning is one of the things that AI does best. If we make it a part of every project then we can finally start to get some of that “big oil” value from our big data. When we see clearly how the pieces of the puzzle connect, the real drivers of the business become apparent and lay the foundation for everything that comes next. 

If we revisit the supply chain example from earlier and start with AI-guided exploration (we call it Intelligent Exploration), everything changes. We no longer have to keep the scope of our exploration under control by relying on an observation that inventory management probably isn’t a cause. Instead, we can include that data, as well as anything else that could be a factor.

AI created for exploration can handle the volume and complexity of the data and take everything into account before quickly surfacing the drivers of late shipments. Perhaps some of those drivers lie on the inventory side. Maybe it’s a combination of factors under certain circumstances: when working with a particular supplier under particular conditions, orders need to be managed differently. The real cause could be a subtle relationship that would never have been observed and brought forward for consideration with old, limited data exploration techniques.
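To make the idea of a driver that only appears as a combination of factors concrete, here is a small Python sketch. It is not the Intelligent Exploration method, just a brute-force illustration: the order data is synthetic, and the column names (`supplier`, `season`, `late`) and helper functions are hypothetical. It scores every segment defined by one or two columns and ranks them by late rate.

```python
from collections import defaultdict
from itertools import combinations


def late_rates(orders, columns):
    """Late rate for each combination of values in the given columns."""
    totals = defaultdict(int)
    lates = defaultdict(int)
    for order in orders:
        key = tuple(order[c] for c in columns)
        totals[key] += 1
        lates[key] += order["late"]
    return {key: lates[key] / totals[key] for key in totals}


def rank_segments(orders, feature_columns, max_width=2):
    """Score every segment built from up to max_width columns, worst first."""
    results = []
    for width in range(1, max_width + 1):
        for cols in combinations(feature_columns, width):
            for values, rate in late_rates(orders, cols).items():
                results.append((rate, cols, values))
    return sorted(results, key=lambda r: r[0], reverse=True)


def make_orders():
    """Synthetic data (an assumption): 100 orders per supplier/season cell."""
    late_per_cell = {
        ("A", "off"): 10, ("A", "peak"): 10,
        ("B", "off"): 10, ("B", "peak"): 80,
    }
    orders = []
    for (supplier, season), n_late in late_per_cell.items():
        for i in range(100):
            orders.append({"supplier": supplier, "season": season,
                           "late": 1 if i < n_late else 0})
    return orders


orders = make_orders()
ranked = rank_segments(orders, ["supplier", "season"])

if __name__ == "__main__":
    for rate, cols, values in ranked[:3]:
        print(cols, values, round(rate, 2))
```

In this made-up data, supplier B and peak season each look only moderately bad on their own, while the combination of the two stands out sharply. An exhaustive scan like this stops being practical as columns multiply, which is the scale problem the article argues AI-guided exploration is meant to address.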

Looking at the data differently, more holistically, shakes out the things that matter and provides a solid foundation for impactful AI. When you do get to building a model you’re building one that’s targeting a real problem, not a shadow of one.

Multidimensional exploration needs 3D visualizations 

We are visual creatures and much has been said about the importance of visualizations in data storytelling. Multidimensional data needs next-level 3D visualizations to really illustrate the captured insight. 3D visuals have actually been proven to boost understanding and engagement, which is critical when you’re working with stakeholders to create business-changing AI models. 

AI reveals more complicated data relationships than the business users get from their dashboards and reports. Many of today’s well publicized AI ‘misses’ could have been avoided if business stakeholders had better understood how the proposed model was supposed to work, how different data was weighted, and so on. It’s critical that we ensure that our findings are clearly illustrated for the business so that we can get their input and support. 

How data visualization can help lead to more effective drug repurposing 

Take healthcare, for instance. The amount of data in this industry is exploding, making it difficult to manage and explore given the limited time and resources of IT staff. When it comes to drug development, it is much cheaper to repurpose an existing drug and accelerate the clinical trials than it is to develop a new drug from scratch. Using intelligent data exploration, healthcare teams can figure out which drugs to repurpose, what those drugs are already used for, and what their potential interactions with other drugs might be.

AI and 3D data visualization can both help in this process. AI is a great asset for pinpointing the drugs that are most viable for repurposing and for helping determine the recommended treatments for the diseases those drugs are designed to treat. It can also help teams better understand the relationships between the various drug candidates and the diseases they are being used for.

Using 3D data visualization, healthcare teams can present this data in a way that is easy to understand without losing the rigor of the data. AI can help teams understand the relevant relationships in the data, and visualization can help communicate those findings to the people involved in the drug repurposing process. The two work hand in hand to make it easier to explore and communicate data, which can lead to more precise outcomes.

Change your data exploration, change your business

If you keep looking at the data the same way you always have (the same way your competitors do), you’re going to keep finding the same insight, making the same decisions, and seeing the same rate of AI project failure. With all the complicated data that businesses must sort through and make sense of, the conventional method of hypothesis-driven data exploration just won’t cut it anymore.

Set your assumptions aside and start looking at your data through the lens of AI. Cut through the noise, surface significant insight, and take aim at the real issues. Forget data as oil: data is gold, and Intelligent Exploration is the sophisticated tool that’s going to help you get at it.
