How Organizations Can Avoid Data Bias in the Age of AI


Artificial intelligence is an increasingly prominent part of our lives, in areas you may not even think about. 

Chances are you’ve had a travel problem in the last year or two, caused by the many disruptions the COVID pandemic has wrought on the industry. When you messaged your airline’s Facebook page, did you encounter a bot? That’s artificial intelligence at work. 

I bet your school-age children ask the smart speaker at home 1,000,000 questions per day, or ask it to play 46,789 songs per day. Hello, AI! 

I bet many of you reading this have applied for a job during the pandemic, when the job market has very much favored job seekers. That online application tool? Powered by artificial intelligence, which compares application content against keywords identified by hiring managers to weed out unqualified candidates in an initial screen. 

Bottom line: AI is only growing as a part of our lives – in a recent PwC survey, more than half of respondents accelerated AI efforts due to COVID, with nearly 90 percent indicating they view AI as a mainstream technology. Similarly, an IDC report shows that AI system spending will grow by 140 percent by 2025 – on top of the massive growth the technology has already experienced. 

With that trend comes danger if the technology is not built properly with the appropriate safeguards in place to avoid data bias. How to do it? A few simple steps can make the difference between a useful, fair data model and one that introduces, consciously or subconsciously, bias. 

Ensure checks and balances are present: The need for neutrality from those humans building AI models is clear, and those involved with that process take pains to ensure that neutrality. 

The fact is, though, that no matter how neutral humans attempt to be in setting parameters and filtering and curating data, biases can come into play. 

Those models rely on massive amounts of data, and it’s imperative that technologists abide by the parameters put in place when building those algorithms, to avoid introducing bias as much as possible. Humans are involved every step of the way: They create the models, feed data into them and train them to interpret the resulting data – all steps in which the information being used may be unwittingly influenced by those technologists’ beliefs, backgrounds or other environmental factors. 

So: How to avoid it? 

In models where humans play a strong role in collecting and interpreting data, it’s key to ensure those humans have received training in recognizing bias. Additionally, using the right training data will help ensure success: Training data should replicate real-world scenarios, with proper demographic representation, without introducing human predispositions. 
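
One way to put that into practice is a simple representation check before training. The sketch below is a minimal illustration, not a prescribed method; the column name, group labels and reference proportions are hypothetical, and in practice the reference shares might come from census data or the population the model will actually serve.

```python
# A minimal sketch of checking whether training data reflect real-world
# demographics. Column names, group labels, and reference proportions
# are hypothetical placeholders.
import pandas as pd

def check_representation(df: pd.DataFrame, column: str,
                         reference: dict, tolerance: float = 0.05) -> dict:
    """Flag groups whose share of the training data drifts more than
    `tolerance` from their real-world (reference) share."""
    observed = df[column].value_counts(normalize=True)
    flags = {}
    for group, expected in reference.items():
        actual = observed.get(group, 0.0)
        if abs(actual - expected) > tolerance:
            flags[group] = {"expected": expected, "observed": round(actual, 3)}
    return flags

# Example usage with made-up numbers: 80 percent of the training records
# come from urban areas, but the real-world split is closer to 60/40.
training = pd.DataFrame({"region": ["urban"] * 800 + ["rural"] * 200})
print(check_representation(training, "region",
                           reference={"urban": 0.6, "rural": 0.4}))
```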

Machine learning is only as good as its training data. Consider college applications: If the training data are not reflective of real-world dynamics, bias can produce unequal acceptance or matriculation outcomes. It’s also important to monitor models to ensure they reflect real-world performance, so they can be tweaked if biases are detected. 
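
Monitoring can be just as lightweight. Continuing the admissions example, the sketch below compares acceptance rates across groups and flags any group that falls well below the best-performing one; the field names, synthetic decisions and the 0.8 threshold are illustrative assumptions, not figures from the article.

```python
# A minimal sketch of monitoring a deployed model for biased outcomes.
# The 0.8 ratio threshold and all field names are illustrative assumptions.
import pandas as pd

def acceptance_rate_gap(outcomes: pd.DataFrame, group_col: str,
                        decision_col: str, threshold: float = 0.8) -> pd.Series:
    """Compare each group's acceptance rate to the highest-accepted group;
    ratios below `threshold` suggest the model needs a closer look."""
    rates = outcomes.groupby(group_col)[decision_col].mean()
    ratios = rates / rates.max()
    return ratios[ratios < threshold]

# Example usage with synthetic decisions (1 = accepted, 0 = rejected):
# public-school applicants are accepted at 40%, private at 70%.
decisions = pd.DataFrame({
    "school_type": ["public"] * 100 + ["private"] * 100,
    "accepted":    [1] * 40 + [0] * 60 + [1] * 70 + [0] * 30,
})
print(acceptance_rate_gap(decisions, "school_type", "accepted"))
```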

Use timely, relevant data: The examples mentioned earlier are, largely, trivial in the grand scheme of things – travel chatbots, smart speakers and online job applications are the AI most visible to the average consumer, but they aren’t life and death. 

When it comes to healthcare providers and government officials who are developing machine learning models that impact daily life, though, things get a little more serious.

In all instances, but even more so in those serious scenarios, model testing is absolutely crucial. It can help you avoid costly mistakes with life-changing impacts, such as the 2020 incident in the UK in which thousands of COVID case records were excluded from modeling data, or Zillow’s widely publicized (and mocked) algorithm error, which led to miscalculations in home purchase prices and subsequent mass layoffs at the housing giant. 
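
A basic completeness check can catch the first kind of failure before a model ever trains. The sketch below is a hedged illustration in the spirit of the UK example, not the approach used in either incident: it simply verifies that records have not silently dropped out of the pipeline, with the source counts made up for the example.

```python
# A minimal sketch of a pre-modeling sanity check: verify that records
# have not silently fallen out of the pipeline. Counts are hypothetical.
import pandas as pd

def assert_no_dropped_records(raw_count: int, modeled: pd.DataFrame,
                              allowed_loss: float = 0.01) -> None:
    """Raise if more than `allowed_loss` of the source records are missing
    from the dataset that will feed the model."""
    loss = 1 - len(modeled) / raw_count
    if loss > allowed_loss:
        raise ValueError(
            f"{loss:.1%} of source records missing from modeling data "
            f"({raw_count - len(modeled)} of {raw_count}); investigate before training."
        )

# Example usage: the upstream system reported 16,000 case records,
# but only 15,000 survived ingestion.
cases = pd.DataFrame({"case_id": range(15_000)})
try:
    assert_no_dropped_records(raw_count=16_000, modeled=cases)
except ValueError as err:
    print(err)
```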

Add in the rapidly changing environment we all live in these days, and timely data are key: Relevant data strongly impact the ethics of AI models and practices, and help reduce bias. As anyone reading this knows, what was relevant a year ago, a month ago or a week ago may no longer be relevant to an AI model. 

A real-life example: Anyone building a post-COVID model to identify warning signs for mental health concerns would be ill-advised to use data from 2019 to create and train the model. Those warning signs and leading indicators have changed dramatically since the onset of COVID; the world is different, and so too are the factors impacting mental health and the myriad other areas that rely on and incorporate AI. 
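
A simple freshness guard can enforce that principle mechanically. The sketch below is one possible illustration, not a prescribed practice; the cutoff date, column names and indicator values are assumptions made up for the example.

```python
# A minimal sketch of a data freshness guard: drop records collected before
# a relevance cutoff and report how much of the dataset was stale.
import pandas as pd

def filter_stale_records(df: pd.DataFrame, date_col: str,
                         cutoff: str) -> pd.DataFrame:
    """Keep only records collected on or after `cutoff`."""
    dates = pd.to_datetime(df[date_col])
    fresh = df[dates >= pd.Timestamp(cutoff)]
    print(f"Dropped {len(df) - len(fresh)} of {len(df)} records as stale.")
    return fresh

# Example usage: a hypothetical mental-health indicators dataset where
# pre-2020 records no longer reflect post-COVID dynamics.
records = pd.DataFrame({
    "collected_on": ["2019-06-01", "2021-03-15", "2022-01-10"],
    "indicator":    [0.42, 0.77, 0.81],
})
recent = filter_stale_records(records, "collected_on", cutoff="2020-03-01")
```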

In summary: If data are old or irrelevant, the model will not achieve its desired outcome – the results may be useless or ineffective at best and life-altering at worst. Keeping this guidance in mind will help reduce bias in AI models, improve data relevance, increase transparency, bolster trust and ultimately lead you down the path to more ethical AI.

About the Author

Ken Payne is Hyland’s Product Manager for Automation. Ken leverages his more than 20 years of industry experience to drive Hyland’s product vision and strategy, helping customers meet their automation and digital transformation goals.
