GDPR Challenge: Finding the Data That Needs to be Forgotten


In this special guest feature, Amnon Drori, Co-founder and CEO of Octopai, discusses the GDPR challenge: finding the data that needs to be forgotten. Amnon has over 20 years of leadership experience in technology companies. Before co-founding Octopai he led sales efforts at companies like Panaya (Acquired by Infosys), Zend Technologies (Acquired by Rogue Wave Software), ModusNovo and Alvarion, and also served as the Chief Revenue Officer at CoolaData, a big data behavioral analytics platform. Amnon studied Management and Computer Science at the Open University of Tel Aviv.

Whether they’re ready or not, companies around the world have a new data challenge, one they must meet if they want to avoid heavy fines and penalties. Among its many rules, the GDPR, Europe’s new data privacy and security regime, gives Europeans the “right to be forgotten”: companies must delete personal information on European residents without undue delay, and generally within one month of being asked to do so. Failure to comply can cost a company up to €20 million or 4% of its annual worldwide turnover, whichever is higher.

The question for many organizations isn’t just “what system do we have in place to remove user data?” It’s “where do we find the data we need to remove?” Over the years, with the implementation of new databases, new data recording regimes, new administrator policies, and new marketing programs, personal data on users has likely ended up in many locations: on servers, in backups, on social media channels, and more.

Even worse, the metadata describing the same data may vary, depending on how the data was stored and structured in a given database or backup. A simple search isn’t going to find everything, and if companies can’t prove they can find everything, they may find themselves penalized. Considering that a typical organization likely has billions of pieces of data, finding and isolating a specific piece of data is going to be a major challenge, far too big and complicated for even a whole IT team to handle manually. The only way to do this accurately is to automate the process, implementing a smart system that can quickly parse even vast amounts of data and track down the specific data required.
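To illustrate why a search keyed on field names falls short, here is a minimal Python sketch that matches on normalized values instead. The source names, field names, and sample records are all hypothetical, and a real system would have to handle far more formats than the simple trimming and lower-casing shown here.

```python
def normalize_email(value: str) -> str:
    """Trim whitespace and lower-case so formatting differences do not hide a match."""
    return value.strip().lower()


def find_email(records, target):
    """Return (source, field) pairs whose stored value matches the target address.

    Every string field is checked, because the field holding the address
    may be named differently in each source.
    """
    wanted = normalize_email(target)
    hits = []
    for source, row in records:
        for field, value in row.items():
            if isinstance(value, str) and normalize_email(value) == wanted:
                hits.append((source, field))
    return hits


# Hypothetical records: the same address stored under three different field names.
records = [
    ("crm.customers", {"id": 1, "Email": "Ana@Example.com"}),
    ("newsletter.subscribers", {"contact_addr": " ana@example.com "}),
    ("backup_2017.users", {"usr_mail": "bob@example.com"}),
]

hits = find_email(records, "ana@example.com")
for source, field in hits:
    print(f"{source}: {field}")
```

A lookup restricted to a column called "email" would miss both matching records in this sample, which is the gap the paragraph above describes.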

The GDPR went into effect in May 2018, but as of just a few weeks ago, some 80% of large companies surveyed worldwide, and nearly 90% of those surveyed in the U.S., were not yet GDPR compliant. While the EU seems to be taking a tolerant approach to companies that are not yet compliant, given that the transition has proven to be a challenge for many firms, eventually full compliance will be expected, and full enforcement will be imposed.

Among the important goals of the GDPR is to give European Union residents control over “their” data – the personal information companies have collected about them. The EU guidelines on what is expected from firms are clear; in order to comply, companies need to be able to track down all the data they have on EU residents and have it readily available for processing.

Seems simple enough, but actually it isn’t. The challenge of tracking down data, determining its provenance, and ensuring that it has been eliminated throughout the data chain, is proving to be a very difficult task for many companies. Personally identifiable information (PII), which GDPR requires be deleted on demand, can be found in databases, backups, removable media that is in storage, employee devices, etc. Some of the data could be duplicated or propagated down the line, and be stored in dozens of places.

Locating data in an organization generally falls to the business intelligence (BI) team, which typically maps out the structure of data in an organization, and traces it through the various BI systems in place. In order to track down specific PII, the BI team must find an occurrence of the data (for example, an individual’s e-mail address) and trace its flow through the organization’s data storage areas.
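The tracing step described above can be sketched as a walk over a lineage map. This is only an illustration in Python, assuming the organization already has (or can build) a map of which storage location feeds which; the location names below are hypothetical.

```python
from collections import deque

# Hypothetical lineage map: each location lists the locations it feeds.
LINEAGE = {
    "crm.customers.email": ["etl.staging.email", "backup.crm_dump.email"],
    "etl.staging.email": ["warehouse.dim_customer.email"],
    "warehouse.dim_customer.email": ["reports.churn.email"],
    "backup.crm_dump.email": [],
    "reports.churn.email": [],
}


def trace_downstream(lineage, start):
    """Breadth-first walk returning every location reachable from `start`."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in lineage.get(node, []):
            if nxt not in seen:  # guard against cycles and duplicate paths
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


locations = trace_downstream(LINEAGE, "crm.customers.email")
for location in locations:
    print(location)
```

Starting from one known occurrence of the e-mail address, the walk lists every downstream copy that would also need attention. The hard part in practice is building the lineage map itself, which this sketch takes as given.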

In order to get GDPR-ready, companies have been (or should have been) performing this activity for all data elements GDPR would require organizations to track. This is a monumental task, and likely an important reason why companies report that they are not GDPR-compliant.

Given the stakes and the danger, organizations really can’t take a chance that their BI team will be able to process all the data in time. Instead, what they need is an automated system that will find the data for them. A smart automated BI detection system will parse through all the data in an organization’s systems, determine where each piece of data is located, and find where it was propagated. The smart automated system categorizes data according to its type, indexes its location so that it can easily be found, and determines the dependencies and relationships of that data so that all other data associated with it can be deleted quickly and accurately.
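As a rough idea of what categorizing and indexing could look like, the Python sketch below labels values by type with simple patterns and records which columns hold each type. The patterns, table names, and column names are assumptions made for illustration; they are far cruder than what a production tool would use, and they are not a description of any particular product.

```python
import re
from collections import defaultdict

# Deliberately simple, illustrative patterns; real detection needs much more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,}$"),
}


def classify(value):
    """Return the PII label for a value, or None if no pattern matches."""
    if not isinstance(value, str):
        return None
    text = value.strip()
    for label, pattern in PATTERNS.items():
        if pattern.match(text):
            return label
    return None


def build_index(tables):
    """Map each PII label to the sorted list of table.column locations holding it."""
    index = defaultdict(set)
    for table, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                label = classify(value)
                if label:
                    index[label].add(f"{table}.{column}")
    return {label: sorted(locs) for label, locs in index.items()}


# Hypothetical sample data.
tables = {
    "crm.customers": [
        {"name": "Ana", "addr": "ana@example.com", "tel": "+44 20 7946 0958"},
    ],
    "support.tickets": [
        {"reporter": "ana@example.com", "body": "Please delete my account"},
    ],
}

index = build_index(tables)
for label, locs in index.items():
    print(label, locs)
```

Once such an index exists, an erasure request for an e-mail address only needs to consult the columns listed under the "email" label rather than scanning every system again.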

Thus, when a request comes in from an EU resident that their e-mail or other PII be erased from an organization’s system, and EU enforcers inquire months later whether that request was fulfilled, organizations will be able to show that they carried out their obligations under the GDPR, and prove their compliance.

For many organizations, GDPR may be the biggest data challenge they have ever faced – but it also provides organizations with an opportunity to truly own their data. By implementing a smart system that will ensure that they are able to find the data in their systems at will, organizations will ensure that they are GDPR-compliant – and have the opportunity to utilize all their data to help their organizations run more efficiently and profitably.






  1. Richard Moore says

    If I understand the law correctly, and I am not a lawyer so please check appropriately, there are also heavy fines for deleting a customer’s data if it is later needed for a criminal investigation. So to fully meet the needs of the GDPR, you must find the data requested to be deleted, make it unavailable to any and all corporate functional abilities while also sequestering it away so that if it is later needed by government agencies it can be recovered.

    I very much welcome responses to this comment from legal experts who can either confirm or alter this interpretation.