
Three Steps to Data Protection – And How They Differ for Structured vs Unstructured Data

In this special guest feature, Scott Lucas, Head of Marketing at Concentric, notes that compliance is a complex topic and that this article only scratches the surface of what you’ll need for your particular data and regulatory environment. Having a clear understanding of how to discover, assess and protect structured and unstructured data, and their differences, gives you the foundation you need for an effective and manageable program to protect the PII you manage. Concentric is a new startup that uses deep learning to autonomously secure millions of documents containing unstructured data in the enterprise. Concentric’s Semantic Intelligence solution eliminates the need for complex rules or unreliable user input to make unstructured data security accurate, actionable, and continuous.

Because I am both a cheapskate and own some screwdrivers, I’ve repaired my share of major appliances. Not long ago, in the midst of a more-urgent-than-usual repair, I needed a part. Abandoning Amazon in favor of more immediate satisfaction from a local appliance parts shop, I made my way to a strip-mall storefront not far from my house.

If unstructured data could be photographed, it would look like that shop: parts for every type of appliance haphazardly piled across shelves extending deep into the back recesses of the building, and not a part number label or barcode reader in sight. The old hand at the counter asked what I needed, disappeared into the back for a minute or two, and emerged with a shiny new match for the broken part I held in my hand.

If only unstructured data discovery were that easy. For IT teams grappling with privacy mandates, data discovery is a real problem – for both unstructured and structured data. Regulations such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) outline expectations for handling personally identifiable information (PII). Compliance and data protection are the goals, but the tactics you’ll use for millions of end-user files versus the millions of records in your databases are quite different.

Step 1: You Can’t Protect It If You Can’t Find It

PII protection starts with PII discovery. For databases, discovery might be a one-time task to locate PII across an organization’s collection of structured data. For unstructured data, discovery is an ongoing process. Either way, discovery is a step that can’t be skipped.

It’s easy to understand why it’s hard to find PII in unstructured data. A typical organization manages more than 10M files containing everything from marketing information to customer contracts to company picnic invitations. Discovering PII in unstructured files remains one of the toughest data security challenges out there.

It’s harder to understand why structured data discovery can be tough. Structured data should provide an easy map to PII, but database designs often predate modern privacy regulations and, as a result, few databases were designed with privacy in mind. Sensitive information is often scattered across different databases, in different tables and in different fields. Sometimes PII is duplicated across tables or databases. Finding it all can be tougher than you might think.
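
As a rough illustration of why scattered PII is hard to map, here is a minimal sketch that walks a SQLite database's schema and flags columns whose names hint at personal data. The hint list, the table names and the `find_pii_columns` helper are all assumptions made for this example. Name matching is only a first pass: it will miss PII stored in a column called `notes` or `field_7`.

```python
import re
import sqlite3

# Hypothetical name hints; a real program would tune these to its own schemas.
PII_NAME_HINTS = re.compile(
    r"(email|phone|ssn|birth|address|first_name|last_name)", re.IGNORECASE
)

def find_pii_columns(conn):
    """Return sorted (table, column) pairs whose column names hint at PII."""
    hits = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            if PII_NAME_HINTS.search(row[1]):
                hits.append((table, row[1]))
    return sorted(hits)

if __name__ == "__main__":
    # Two made-up tables that both hold an email address under different names.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT, city TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, ship_address TEXT, contact_email TEXT)"
    )
    for table, column in find_pii_columns(conn):
        print(f"{table}.{column}")
```

Even in this toy schema, the same kind of data shows up in two tables under two different names, which is the duplication problem described above.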

In both cases, automated PII discovery can help IT professionals make sure they’ve found the PII data they need to protect. In the unstructured data world, rules and end-user classification programs have long been used in an attempt to identify PII – but they haven’t been effective or manageable. Recent artificial intelligence innovations show promise in automating the data-discovery task for both types of data.
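
To make the limits of rule-based discovery concrete, the following minimal sketch scans free text with two regular expressions. The patterns and the `scan_text` helper are assumptions for illustration, not any vendor's method, and they are deliberately simple.

```python
import re

# Illustrative patterns only; real-world PII takes many more forms than these.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_text(text):
    """Count how many times each pattern matches in the given text."""
    return {label: len(pattern.findall(text)) for label, pattern in PATTERNS.items()}

if __name__ == "__main__":
    samples = [
        "Contact jane.doe@example.com, SSN 123-45-6789.",  # both rules fire
        "Order ref 123-45-6789",                           # SSN rule fires on an order number
        "SSN: 123 45 6789",                                # real SSN format variant is missed
    ]
    for sample in samples:
        print(sample, "->", scan_text(sample))
```

The second sample is a false positive and the third is a miss, which is the kind of behavior that makes rule sets hard to keep effective and manageable at scale.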

Step 2: Once You’ve Found Private Data, You Have to Assess It

Understanding what’s at risk starts with a clear and complete assessment of who can access PII. Again, the differences are stark when assessing risk in structured and unstructured data. Here are some things to keep in mind when evaluating the “who and how” of PII access in a structured database.

  • Large-scale databases supporting web applications – such as those supporting ecommerce operations – typically connect those applications to data using a handful of service accounts. Tracing who has access isn’t usually a problem.
  • Increasingly, API connections to databases extend access, sometimes outside the organization itself. It goes without saying that these connections need careful oversight.
  • PII can “escape” from the structured to the unstructured world when users create reports containing data from a database. This is an often-overlooked avenue of data exposure.

Assessing unstructured data for risk is far more difficult. Fortunately, if you’ve successfully discovered which documents contain PII, risk assessment is more manageable. Once you know where PII is, you’ll want to look for the following indicators of risk:

  • Inappropriate sharing with external or personal emails
  • Link sharing, especially unprotected or non-expiring links
  • Files stored outside of designated locations
  • Unclassified files that may slip by data loss prevention services

This can be a daunting task. Again, recent innovations in AI can lend a huge helping hand to your team as they establish access control for your end users’ files.

Step 3: Once You’ve Assessed Private Data, It’s Time to Protect It

As with the tasks of discovery and assessment, tactics for protecting structured and unstructured data are quite different. Here’s some advice for structured data risk mitigation:

  • Refactor your database to eliminate duplication, clarify data structure and make PII discovery easier for whoever has to do the job once you’re gone.
  • Tokenize and/or encrypt sensitive fields to add an extra layer of security on top of your access control best practices.
  • Delete what you don’t need. A major PII spill of unneeded years-old data is, to be blunt, an unforced error. Don’t be that guy.
  • Explore emerging technologies for API security and granular database access control. Most service accounts currently have very broad access, and poor API design or implementation can be a weak link. See what you can do to tighten things up.
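
As one way to picture the tokenization advice above, here is a minimal sketch that replaces sensitive fields with keyed-hash tokens. Note the substitution: this uses HMAC-SHA256 pseudonymization from Python's standard library rather than a vault-based tokenization service, and the `tokenize` and `tokenize_record` helpers, the field names and the demo key are assumptions for illustration.

```python
import hashlib
import hmac

def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic keyed-hash token for a sensitive value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def tokenize_record(record, sensitive_fields, key):
    """Return a copy of the record with the named fields replaced by tokens."""
    return {
        name: tokenize(str(value), key) if name in sensitive_fields else value
        for name, value in record.items()
    }

if __name__ == "__main__":
    # Demo key only; in practice the key would come from a secrets manager.
    key = b"demo-key-not-for-production"
    row = {"id": 7, "email": "jane@example.com", "city": "Austin"}
    print(tokenize_record(row, {"email"}, key))
```

Because the same input and key always yield the same token, tokenized columns can still be joined or deduplicated. The tokens cannot be reversed without a separate lookup, so this approach fits cases where the original value is not needed downstream.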

And on the unstructured side of things, there are emerging tactics to consider as well:

  • Strive for least-privilege access control at the file level for all business-critical data. Folder-level security isn’t good enough.
  • Continuously monitor the situation. Users create thousands of new files each year and a one-time audit is not going to cut it.
  • Look for ways to enlist your entire security stack in the PII risk management effort. For example, you can now autonomously assess risk and automatically tag files as sensitive. Those tags help data loss prevention solutions do a faster, more accurate job.
  • Be careful about how you communicate the situation. Flooding your end users with security bulletins will create alert fatigue and defeat the purpose. You need high-fidelity, actionable information.

Wrapping Up: Meeting Compliance Mandates

Compliance is a complex topic, and this article just scratches the surface of what you’ll need for your particular data and regulatory environment. Having a clear understanding of how to discover, assess and protect structured and unstructured data, and their differences, gives you the foundation you need for an effective and manageable program to protect the PII you manage.
