Data without Borders: How Synthesis Enables Ethical Sharing

In this special guest feature, Mike Capps, CEO of Diveplane, discusses why organizations should consider synthesizing their data, what that process entails, what the benefits and use cases of synthesized data are and why he believes synthesized data is the key to not only a successful data/AI project, but also to a successful data-driven future. Before co-founding Diveplane, Mike had a legendary career in the video game industry as president of Epic Games, makers of blockbusters Fortnite and Gears of War. His tenure included a hundred game-of-the-year awards, dozens of conference keynotes, a lifetime achievement award, and a successful free-speech defense of video games in the U.S. Supreme Court. He’s survived trending on Twitter, iPhone launch events with Steve Jobs, and raising two toddlers. Michael began his career with post-graduate degrees at UNC, MIT, and the Naval Postgraduate School; for his research in VR, he was featured in SIGGRAPH’s historical documentary on computer graphics. He remains a regular host of multiple television series on the Discovery and Science Channels.

An incredible amount of data is being collected by the second, and with it comes a world of opportunity for digital transformation and exploration – from advancing critical medical research to streamlining monotonous administrative work. However, the laws and limitations that exist around data sharing can be incredibly restrictive and act as a major roadblock, preventing companies from achieving peak efficiency and breaking new ground. Violating these rules can be devastating, often resulting in huge fines and even lawsuits. These concerns keep companies from using their own data to train machine learning systems and perform analyses that could evolve business models and expand their capabilities. GDPR in the EU and UK and regulations within the United States (such as HIPAA and the California Consumer Privacy Act), although necessary to maintain the ethical management of personal information, hinder organizations from moving their own data, even across departments, making full analysis virtually impossible.

Fortunately, there is a solution to this ethical dilemma. Data synthesis is the process by which data containing sensitive or private information is consumed by a machine learning algorithm, which then generates an entirely new data set with similar patterns and statistical properties. This isn’t just simple de-identification, or removal of private fields. The resulting data contains no real personal information, eliminating ethical concerns entirely. Once the data becomes truly anonymous, all the risks associated with data sharing – especially in sensitive industries such as healthcare, banking and the military – are no longer an issue.
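To make the idea of "a new data set with similar patterns and statistical properties" concrete, here is a minimal Python sketch. It is a toy stand-in, not the method any particular vendor uses: it assumes two numeric columns, invents a small "real" table of (age, systolic blood pressure) pairs, fits a simple bivariate Gaussian to it, and samples fresh rows from that fit. The names `fit_model` and `synthesize`, the columns, and all the numbers are hypothetical. Whether a given generator leaks details of individual records is a separate question that this sketch does not address.

```python
import math
import random
import statistics


def fit_model(rows):
    """Summarize two numeric columns by their means, spreads, and correlation."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return {"mx": mx, "my": my, "sx": sx, "sy": sy, "rho": cov / (sx * sy)}


def synthesize(model, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted statistics."""
    rho = model["rho"]
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding past 1
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mx"] + model["sx"] * z1
        y = model["my"] + model["sy"] * (rho * z1 + resid * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Hypothetical "real" records: (age, systolic blood pressure). Made up for the demo.
real = []
for _ in range(500):
    age = rng.gauss(50, 12)
    real.append((age, 100 + 0.5 * age + rng.gauss(0, 8)))

model = fit_model(real)                   # learn summary statistics from real rows
synthetic = synthesize(model, 1000, rng)  # generate rows that are not real records
check = fit_model(synthetic)              # re-measure the same statistics

print("real:     ", {k: round(v, 2) for k, v in model.items()})
print("synthetic:", {k: round(v, 2) for k, v in check.items()})
```

The synthetic rows are drawn from the fitted model rather than copied from the original table, yet their means, spreads, and correlation should land close to those of the original. A production system would need a far richer model to capture mixed data types and non-linear relationships.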

Let’s dig deeper and look specifically at the healthcare industry. Data synthesis has the potential to completely revolutionize the way that healthcare organizations interact with the information they’re collecting on a near-constant basis. Organizations are often barred from sharing patient data, even between departments. Consider a hospital that wants to share data with a university for research purposes – privacy and protection would likely be an enormous barrier. Providing data from a synthetic patient population, however, could allow for a more streamlined use of resources – from optimizing insurance underwriting and automating billing procedures to simply increasing the efficiency of standard diagnostic medicine. Once the synthesis process becomes more widely utilized, the healthcare field can expect to see an influx of new possibilities for simplifying administrative tasks, new research and medical breakthroughs.

While it may not be quite the life or death situation of healthcare, banking is another industry that is heavily restricted by data privacy regulations. Banks learn even more about their customers with each swipe of a credit or debit card. By tracking spending, banks gather insights into trends and patterns that could train algorithms to identify abnormalities and detect fraudulent activity more quickly. For example, Capital One’s Eno program personalizes the user experience by analyzing customers’ spending patterns to pick up on fraud as soon as it occurs. However, tricky financial record laws, such as the Gramm-Leach-Bliley Act or Germany’s BDSG, prevent financial institutions from moving data around, even internally – and limit technological advances in detecting suspicious activity. When it comes to banking data, it’s generally impossible to determine which records belong to high-powered, wealthier individuals, who could be at a higher risk of being targeted by cybercriminals if a breach or leak were to occur – so all data must be handled extra carefully, with special gloves, just to be safe. The technology that could detect internal and external threats of fraud exists; it just needs to be fed the necessary information that allows it to learn. With access to synthetic data that accurately replicates the statistical components of real customers’ data, banks will be able to crack down on crime and more effectively protect their customers’ finances.

A particularly critical industry for data synthesis, however, is the military. The defense space works closely with highly classified telemetry data, which has extraordinary potential for training AI algorithms. The sensitive nature of the data currently prevents it from being fully utilized – but synthesizing it would unlock a greater scope of opportunity for AI in military strategy. Utilizing synthetic data in building defense strategy – covering everything from training to infrastructure management – is a visionary answer to potential security concerns.

Synthesized data, in addition to providing opportunities for growth, also eliminates a company’s vulnerability to hacking. When data is provided to a third party, the main concern has always been toeing lines and being careful to avoid breaking laws or violating restrictive policies. Breaches are too often an afterthought, although if one occurs through a third party, the data owner is still liable. As we’ve seen just this year with third-party breaches that have affected Ascension, Humana and Quest Diagnostics, leaking personal information can be devastating. A fresh set of synthesized data that contains no real, identifiable records has the potential to make breach concerns a thing of the past.

Data synthesis techniques are still in their earliest stages, so the next step is fine-tuning them for use on a larger scale. In military terms, the synthesis process must be “battle-hardened” before it can be deployed. Synthetic data provides the flexibility and room for growth that is currently stifled by privacy laws, as well as peace of mind in knowing that personal information is safe and secure.
