How Can Big Data and AI Help to Tackle Fake News and Mis(Dis)information


Fake news and disinformation have become a global threat to information integrity and are driving distrust towards individuals, communities, and governments worldwide. We are overwhelmed with disinformation on a daily basis through news reports, images, videos, and memes.

Twisting facts to further an agenda is not a new problem. However, the explosive growth of social media, combined with the emerging power of artificial intelligence to generate content, has added new dimensions to the problem and greatly magnified it, resulting in the current “fake news” epidemic and information crisis.

It’s clear that human fact-checkers working by themselves cannot keep pace with the sheer volume of misinformation shared every day. Many have therefore turned to advanced artificial intelligence for effective solutions to combat problematic content at scale, but this is not without its own challenges.

Linguistic cues such as word patterns, syntactic constructs, and readability features need to be modeled to reliably discriminate between human- and machine-generated content. State-of-the-art natural language processing (NLP) techniques are needed to represent words and documents in ways that effectively capture their contextual meaning.
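
As a hedged illustration of this idea, the sketch below combines a few hand-crafted stylistic features with TF-IDF word representations to train a simple human-vs-machine text classifier in Python. The corpus, labels, and feature set are toy placeholders rather than a production pipeline.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def stylistic_features(text: str) -> list[float]:
    """Simple readability-style cues: word length, sentence length, lexical diversity."""
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    avg_word_len = float(np.mean([len(w) for w in words])) if words else 0.0
    avg_sent_len = len(words) / max(len(sentences), 1)
    type_token_ratio = len(set(words)) / max(len(words), 1)
    return [avg_word_len, avg_sent_len, type_token_ratio]

# Toy corpus: label 0 = human-written, 1 = machine-generated.
texts = [
    "Officials confirmed the flood warnings late on Tuesday evening.",
    "The flood, which was a flood, flooded the flooded area with flood.",
]
labels = [0, 1]

tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_words = tfidf.fit_transform(texts)                       # word-pattern features
X_style = csr_matrix([stylistic_features(t) for t in texts])
X = hstack([X_words, X_style])                             # combined representation

clf = LogisticRegression().fit(X, labels)
```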

Furthermore, knowledge graphs and advanced graph-based NLP algorithms are needed to better model the interplay between the different aspects of a piece of textual content, and to map the underlying themes of a document onto higher-level abstractions.
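
As one way to picture the knowledge-graph side, the following sketch builds a lightweight entity co-occurrence graph from a document using networkx. The capitalization-based entity extraction is a deliberately naive stand-in for a real named-entity-recognition and relation-extraction model.

```python
import itertools
import networkx as nx

def naive_entities(sentence: str) -> list[str]:
    """Placeholder 'NER': treat capitalized tokens as entity mentions."""
    return [tok.strip(".,") for tok in sentence.split() if tok[:1].isupper()]

document = "Acme Corp denied the claim. Jane Doe said Acme Corp funded the campaign."

graph = nx.Graph()
for sentence in document.split("."):
    entities = naive_entities(sentence)
    graph.add_nodes_from(entities)
    # Link every pair of entities mentioned in the same sentence.
    graph.add_edges_from(itertools.combinations(entities, 2))

# Centrality measures can surface the key actors in a narrative.
print(nx.degree_centrality(graph))
```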

In the case of visual content, advances in photo-editing and video-manipulation tools have made it significantly easier to create fake imagery and videos. However, automatically identifying manipulated visual content at scale is challenging and computationally expensive. It requires cutting-edge compute infrastructure, together with state-of-the-art computer vision, speech recognition, and multimedia analysis, to model visual artifacts at multiple levels and surface aspects such as pixel- and region-level inconsistencies, plagiarism, splicing, and spectrogram analytics.
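
One classic, widely documented heuristic in this space is error level analysis (ELA) for JPEG images: re-save the image at a known quality and look for regions whose recompression error stands out. The sketch below is illustrative only, with an assumed quality setting, and is far from a complete forensic pipeline.

```python
import io
from PIL import Image, ImageChops

def error_level_analysis(image_path: str, quality: int = 90) -> Image.Image:
    """Highlight regions whose JPEG recompression error differs from the rest."""
    original = Image.open(image_path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)  # controlled recompression
    recompressed = Image.open(buffer)
    # Spliced or edited regions often show a distinctly different error level.
    return ImageChops.difference(original, recompressed)

# Usage (assuming a local file):
# ela_map = error_level_analysis("suspect_photo.jpg")
# ela_map.save("ela_map.png")
```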

In addition, the popularity of generative adversarial networks (GANs), and the easy availability of tools that implement them, have accelerated the production of deceptive multimedia that mimics the verbal and physiological actions of individuals.

Countering the generation and spread of deceptive multimedia requires advanced AI models that are effective at both detecting and generating synthetic media. The self-learning side of this type of AI, through continual re-training, demands multimedia data at massive scale and cutting-edge compute power to improve automated solutions for visual content understanding and verification.
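
The continual re-training loop described above might look roughly like the sketch below, where the tiny detector network, the toy frames, and the labels are all placeholders for a real deepfake-detection model and its freshly verified training data.

```python
import torch
import torch.nn as nn

# Stand-in for a real CNN/transformer deepfake detector over video frames.
detector = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 2),  # two classes: authentic vs. synthetic
)
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def retrain(batches):
    """Fine-tune the detector on freshly verified (frames, labels) batches."""
    detector.train()
    for frames, labels in batches:
        optimizer.zero_grad()
        loss = loss_fn(detector(frames), labels)
        loss.backward()
        optimizer.step()

# One synthetic batch of 64x64 RGB frames with toy labels, for illustration.
frames = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))
retrain([(frames, labels)])
```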

However, important recent advances have been made which can alleviate some of these challenges.

Advances in big data processing and sampling offer clever, reliable ways to extract smaller yet representative data samples that preserve the critical patterns and signals the AI needs to extract powerful insights, at far lower computational cost.
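
Reservoir sampling is one well-known technique in this family: it keeps a uniform random sample of fixed size from a stream of unknown length, so downstream models can work with a representative subset. A minimal sketch:

```python
import random

def reservoir_sample(stream, k: int) -> list:
    """Keep k items sampled uniformly from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1), preserving uniformity.
            j = random.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: a representative sample of 5 posts from a simulated stream of a million.
sample = reservoir_sample((f"post-{i}" for i in range(1_000_000)), k=5)
print(sample)
```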

Model compression and knowledge distillation strategies have shown that an AI model's complexity, size, and inference costs can be significantly reduced while retaining nearly the same accuracy as the original model.
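
The standard knowledge-distillation objective blends the teacher's softened output distribution with the usual hard-label loss. The sketch below uses random logits purely for illustration; the temperature and blending weight are conventional but arbitrary choices.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the softened teacher distribution with the ordinary hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients stay comparable to the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy example with random logits for a 3-class task.
student, teacher = torch.randn(4, 3), torch.randn(4, 3)
targets = torch.randint(0, 3, (4,))
print(distillation_loss(student, teacher, targets))
```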

These breakthroughs, along with machine learning techniques such as few-shot learning, have massively reduced compute costs on cloud infrastructure, making AI-based big data analytics affordable for solving real-world problems such as misinformation.
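
As a hedged sketch of one common few-shot approach, nearest-class-prototype classification over pretrained embeddings, the example below substitutes random vectors for real encoder outputs.

```python
import numpy as np

def prototypes(support: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Average a handful of labelled embeddings per class into one prototype each."""
    return {label: embeddings.mean(axis=0) for label, embeddings in support.items()}

def classify(query: np.ndarray, protos: dict[str, np.ndarray]) -> str:
    """Assign the class whose prototype is nearest in embedding space."""
    return min(protos, key=lambda label: np.linalg.norm(query - protos[label]))

# Five examples per class; random vectors stand in for real encoder outputs.
support = {
    "credible": np.random.randn(5, 128),
    "misleading": np.random.randn(5, 128),
}
print(classify(np.random.randn(128), prototypes(support)))
```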

Furthermore, it is now possible to build and maintain advanced ensemble AI systems that ingest and process continuous streams of data to extract actionable insights about information source authenticity, content credibility, information veracity, and the social network dynamics of disinformation, including its reach and the influencers behind it.
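
How such an ensemble fuses its separate signals is system-specific; the sketch below merely illustrates one possible weighted fusion of per-aspect risk scores, where the signal names, values, and weights are all assumptions.

```python
def misinformation_risk(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted fusion of per-aspect risk scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(signals[name] * weight for name, weight in weights.items()) / total_weight

# Hypothetical per-aspect risk scores produced by separate component models.
signals = {"source_authenticity": 0.2, "content_credibility": 0.3, "network_spread": 0.9}
weights = {"source_authenticity": 1.0, "content_credibility": 2.0, "network_spread": 1.5}
print(f"risk = {misinformation_risk(signals, weights):.2f}")
```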

However, AI on its own can only do so much. Even the most accurate AI models can only be maintained through reinforcement and training by human intelligence and expertise. And while AI is reliable at extracting advanced insights about misinformation, it needs to be paired with human analysts and domain experts (human-in-the-loop AI) to turn those insights into highly interpretable, actionable outcomes.
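
In practice, this human-in-the-loop pairing often takes the shape of a triage loop: the model handles its confident calls automatically, while low-confidence items are queued for analyst review, whose verdicts then feed back into training data. The threshold and function names below are illustrative assumptions.

```python
def triage(items, predict, threshold: float = 0.85):
    """Route confident model calls automatically; queue the rest for analysts."""
    auto_decisions, review_queue = [], []
    for item in items:
        label, confidence = predict(item)
        if confidence >= threshold:
            auto_decisions.append((item, label))  # the machine decides
        else:
            review_queue.append(item)             # a human analyst decides
    return auto_decisions, review_queue

# Toy predictor returning (label, confidence); in a real deployment the
# analyst verdicts from the review queue would be added to the training set.
fake_predict = lambda item: ("misleading", 0.6 if "rumor" in item else 0.95)
auto, review = triage(["verified report", "unsourced rumor"], fake_predict)
print(auto, review)
```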

In addition, mitigating the risks and damage caused by the viral spread of mis- and disinformation requires timely, proactive countermeasures, such as disseminating credible, verified information and producing analytical reporting on the different aspects of a mis/disinformation narrative (e.g., key actors and campaign origins). This is only possible with extended (human + AI) intelligence that can optimally harness the power of big data, human-in-the-loop AI, and advanced computing.

Humans and AI are both responsible for the problem of misinformation. To solve it, we need to change human behavior to suit our new role as consumers of information at scale, and to value the authenticity of information as much as we value satisfying our information needs. This is a gradual process, but until then, AI can lessen the risks and act as a catalyst for change.

About the Author

Dr. Anil Bandhakavi is Head of Data Science at Logically. Anil has 10 years of experience in the field of artificial intelligence, including a Ph.D. in NLP. He joined Logically in 2018 to lead the development of effective AI solutions that power Logically's tools and products. He believes strongly in extended (artificial + human) intelligence as a way to create impactful and interpretable solutions to societal problems.

