Why the Hype Around Big Data Misses the Point – and How Small Data Gets It Right

For years we’ve been sold a story about Big Data and AI: that AI fueled by Big Data would power 21st-century businesses, helping them survive and thrive in a world dominated by digital transformation and user experiences. Everything from healthcare and financial services to housing and climate change was cast as a problem to be worked on and solved by a combination of AI, more computing power, and Big Data.

It’s an appealing story, because it validates how so many businesses have operated for years: accumulating massive troves of data to feed into their algorithms to create desired user experiences and outcomes. But the truth is that much of the Big Data amassed by these businesses – banks, manufacturing plants, mortgage lenders, insurance providers – is still effectively useless. The hype around Big Data really only applies to the tech giants that have the power, budget, resources and, most importantly, a digital-first business model from their founding to make the most of this approach; other businesses haven’t historically had that luxury. In fact, far from being the fuel that powers AI, Big Data actually undermines and dilutes AI’s potential to transform society for the better.

We need to flip the script. To make data useful and enable AI to create the tangible, real-world positive outcomes we expect from our data, it’s time for businesses to pivot away from Big Data toward Small Data. Small Data provides the precise, relevant, and impactful results we were all told Big Data could provide, while maintaining a level of flexibility and scalability that Big Data, by virtue of its enormous volume and expense, simply cannot.

Big Data has failed AI

AI is supposed to be the defining, game-changing technology of our generation. Whether it’s mobile banking and online shopping or far more impactful and existential challenges, such as tackling climate change or transforming the financial system, AI is supposed to be the enabling technology that makes these problems solvable and their outcomes better. The flip side of that idea, though, has been the notion that to optimize AI and produce those better outcomes, it needs to be fed a lot of data: a constant and unending supply of it. Surely good data means a good AI model, and more data means a better model?

It’s easy to think that simply feeding more training data to an AI makes its algorithms sharper, more precise, and more insightful. For a small number of companies across a few industries, this method of throwing enormous volumes of data at a neural network to get useful insights into product development, customer service, or user experience has worked. But the benefits are limited, and the outcomes are often too niche to be repeated elsewhere. This is in large part because the previous generation of AI technology cannot handle the wide variety of enterprise data that organizations need to make useful. Big Data may be good at some things, but it’s not good at helping organizations tackle the 80–90% of enterprise data that is effectively trapped and useless.

Plus, even if all the data going into the neural network were of top-notch quality, and all the outputs coming out of it were precise, impactful, and transformative, there’s another, even more critical problem: very few organizations actually have a big enough mountain of curated training data to work with in the first place.

To have tens of thousands of data sets to train your AI on, you need to be a massive enterprise that not only has that data but also has the tools and processes to generate and collect large volumes of it. That immediately rules out all but the largest organizations. How can AI be a game-changer across businesses and industries if a reliance on Big Data makes it a tool exclusively for a select number of Fortune 100 companies and no one else? In this way, Big Data has undercut the full potential of AI, reserving its capabilities and benefits for a select few businesses. And considering that roughly 80% of this data is typically unstructured information locked inside unstandardized files, having a treasure trove of data is nothing to brag about either. If anything, mass collections of Big Data reveal just how little value a company is getting from its data at all.

Making data usable with Small Data

Big Data is an albatross around AI’s potential. Small Data unleashes that potential by making data usable.

Unlike Big Data, a Small Data approach means taking a more practical, focused, and specialized tack to feeding information into an AI model. Instead of training AI on huge volumes of data, Small Data means feeding specific, higher-quality inputs into the model and, consequently, getting more precise, relevant, and impactful insights out. Small Data is flexible and scalable in a way that Big Data, by design, is not. For instance, training AI on smaller, better-defined data sets opens new opportunities to easily tweak models to respond to specific queries rather than give generalized outputs.

This has enormous implications for a wide range of industries: healthcare, manufacturing, financial services, housing, agriculture, climate action. If AI can be trained on Small Data sets to produce real-world actions, it suddenly becomes viable for anyone and everyone, not just the biggest enterprises with the biggest pools of data and the largest tech teams. And while Big Data favors larger organizations with larger data volumes over smaller ones, Small Data benefits both ends of the spectrum. A big enterprise sitting on a massive, untapped well of data has nothing to brag about: letting that much data accumulate without delivering value is a waste of time, money, and effort. A Small Data approach cuts away all that fat by unleashing the usable, valuable data sitting inside your organization’s databases.

Big Data gets a lot of market hype, but the truth is that Big Data is slow, cumbersome, and only benefits certain types of large organization. It doesn’t fuel AI; it boxes AI in and severely limits its functions and applications. Where Big Data is slow and costly, Small Data is fast, efficient, and targeted to deliver value in real, tangible ways. If we want a world where AI can actually deliver on the promises that have been made about it, we need to divorce ourselves from the narrative and hype around Big Data. Lowering the barrier to entry for AI and unleashing its full potential means pivoting away from Big Data and making data usable through Small Data.

About the Author

Dr Lewis Z. Liu is Co-founder & CEO of Eigen Technologies. Having started his career as a consultant at McKinsey & Company in London, he went on to found and lead the Quantitative Finance & Strategies Division for Aleron Partners LLP, a boutique private equity advisory firm. He is also a former Senior Advisor to Linklaters LLP, where he co-founded the Tactical Opportunities Group, a deal origination team. Lewis holds a Doctorate in Atomic & Laser Physics from the University of Oxford. During his studies at Oxford, Lewis invented a new class of X-ray laser, and the mathematics behind this invention was later abstracted into Eigen’s core technology. Lewis received Harvard’s first joint bachelor’s degree in Fine Arts and Physics, as well as a master’s in Theoretical Physics, during which he conducted antihydrogen research at CERN.

