The Rise of Synthetic Data to Help Developers Create and Train AI Algorithms Quickly and Affordably


When Facebook unveiled its plan last week to open two new AI labs and create an AI safety net for its users, the company also announced, for the first time, that it will use a new technology to protect the more than 2 billion users of the social network: synthetic data.

Data scientist Sergey Nikolenko hailed the announcement from Facebook and hopes it leads to the mainstream adoption of synthetic data as a powerful tool that helps developers create and train AI algorithms quickly and affordably without compromising privacy.  Nikolenko’s company, Neuromation, a leading provider of synthetic data, was featured last week in WIRED magazine.

“While fake news has caused problems for Facebook, fake data will help fix those problems,” said Nikolenko. “In a computing powerhouse like Facebook, where reams of data are generated every day, you want a solution in place that will help you quickly train different AI algorithms to perform different tasks, even if all the training data is.  That’s where synthetic data gets the job done!”

While much of the news coverage about Facebook’s announcement and the AI industry in general has focused on algorithms and the tasks they’ll perform, data scientists like Nikolenko agree that a bigger story is the growing shortage of high-quality, task-specific data needed to train AI algorithms. To meet this demand, companies like Neuromation have entered the marketplace with synthetic data, which mathematically recreates or mimics real data so the algorithms behind AI-powered computers can finish their training.
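To make the idea of "mathematically mimicking" real data concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by Facebook or Neuromation: it assumes the real data is a single numeric column that is roughly bell-shaped, estimates its mean and standard deviation, and then draws new values from a normal distribution with those parameters. The function names and the sample measurements are hypothetical.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(params, n, seed=0):
    """Draw n synthetic values from a normal distribution with the fitted parameters."""
    mean, stdev = params
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Hypothetical "real" measurements (made-up numbers, for illustration only).
real_values = [4.9, 5.1, 5.0, 5.3, 4.8, 5.2, 5.0, 4.7, 5.4, 5.1]

params = fit_gaussian(real_values)
synthetic_values = sample_synthetic(params, n=1000)

# The synthetic column contains none of the original records,
# but its summary statistics stay close to those of the real data.
print("real mean/stdev:     ", params)
print("synthetic mean/stdev:", fit_gaussian(synthetic_values))
```

A model trained on `synthetic_values` never sees an original record, which is the privacy argument in miniature. Production systems would presumably rely on far richer generators than a single fitted distribution.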

How will Facebook use AI and synthetic data?  While the company’s long-term goal is to leverage AI to improve its various networking tools and apps, its immediate goal is to fight fake news, online harassment, and political propaganda from foreign governments, like the campaign mounted by Russia during the 2016 election.

One particularly fascinating goal revealed at last week’s developers conference is Facebook’s plan to use synthetic data to do more than train algorithms to detect bullying language on its platform. Using a process called adversarial training, the algorithms will also learn how to generate online insults. While Facebook will presumably keep outbound insults offline, by using synthetic data and adversarial training, algorithms will learn faster and, over time, detect a broader range of insults.

Facebook plans to use synthetic data to fight online harassment while protecting user privacy, and healthcare companies are already using it to help doctors carry out medical research while ensuring patient confidentiality. The technology has also been used in political research, recently giving data scientists at the Center for American Progress in Washington, D.C., insight into shifting demographics and voting patterns likely to affect the presidential election in 2020.




