Why Securing Big Data is a Big Deal


In this special guest feature, Terry Ray, Chief Product Strategist for Imperva, Inc., discusses the many reasons that make big data security a “big deal.” He also offers a short list of criteria for selecting a big data security solution. During his 14 years at Imperva, he has deployed hundreds of data security solutions to meet the requirements of customers and regulators from every industry. Terry is a frequent speaker for RSA, ISSA, OWASP, ISACA, Gartner, IANS and other professional security and audit organizations in the Americas and abroad.

Companies are collecting more information than ever, and both the massive volume of that data and the desire to make better use of it have made big data a “big deal.” In fact, industry research firms such as IDC predict continued double-digit growth for big data and business analytics through 2020. If you think you don’t need to pay attention to securing big data, think again.

Sensitive data often makes its way into these systems, intentionally or inadvertently, so it is important to address some fundamental security and incident response questions up front: Who has access to the data within your big data environment? Is sensitive data present within the environment and, if you think not, can you prove it? Are the environment and the data vulnerable to cyber threats? Are there compliance considerations? Big data deployments are subject to the same compliance mandates (e.g., GDPR, HIPAA, PCI, and SOX) and require the same protection against breaches as traditional databases and their associated applications and infrastructure.
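To make the “can you prove it?” question concrete, here is a minimal sketch of pattern-based classification in Python. The pattern set, the labels, and the `classify_record` helper are illustrative assumptions rather than any product’s method; a real classifier would use far richer rules and validation.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def classify_record(record):
    """Return {field: sorted labels} for every field whose value looks sensitive."""
    findings = {}
    for field, value in record.items():
        text = str(value)
        labels = sorted(
            name for name, pattern in PATTERNS.items() if pattern.search(text)
        )
        if labels:
            findings[field] = labels
    return findings


# Made-up sample record; only the first two fields should be flagged.
sample = {"note": "contact jane@example.com", "ref": "123-45-6789", "amount": 42}
print(classify_record(sample))
```

Running a scan like this over a sample of records gives evidence, rather than an assumption, about whether sensitive values are present in a data set.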

Same Security Requirements, Different Challenges

All the best practices for data security still apply to big data environments. The most critical ones are:

  • Classification
  • Access control
  • Activity monitoring
  • Alerting

The problem is how to achieve security and compliance for big data environments given the unique challenges they present.

Securing the Data Itself

Much of the challenge of securing big data lies in the nature of the data itself. Consider the impact on security of the well-known three V’s of big data:

  • Volume: Enormous volumes of data require security solutions built to handle them. This means solutions that scale, at a minimum, an order of magnitude beyond what traditional data environments require.
  • Velocity: Your security solutions must be able to keep up with big data speeds. You’ll need to focus on data parsing and collection throughput, the degree of automation that is available, and the ability to deliver real-time visibility of policy violations and other events.
  • Variety: Mixing multiple sources and types of data with different access permissions compounds classification and policy-setting challenges, elevating the need for robust audit capabilities.

Securing the Environment

It’s not necessarily the infrastructure and technology within big data environments that make them more challenging to secure; it’s the multiplicity, which dramatically increases complexity:

  • Multiple layers: For example, the open source Hadoop framework has different layers of the stack serving a variety of purposes, from distributed storage at the bottom, to table and schema management, distributed programming, and querying/interface options at the middle tiers, and a wide range of management tools along the top. There is no single logical point of entry or resource to guard, but many different ones, each with an independent lifecycle.
  • Multiple technologies: Big data environments often use multiple technologies for data storage and retrieval. For example, it’s not uncommon for an implementation to include relational stores and query tools for analytical workloads, non-relational (NoSQL) technologies for real-time, interactive workloads, or both.
  • Multiple instances: Many big data environments include multiple instances or versions of the same core building blocks, but from different vendors, such as different Hadoop distributions and NoSQL offerings. This means more diversity and complexity for security tools and staff to address.
  • Multiple, dispersed data stores: Big data deployments typically have a multitude of geographically distributed data stores and, therefore, numerous physical nodes requiring protection. This inherently increases the potential for inconsistent security policies and practices, suggesting the need for solutions that feature strong, centralized administration capabilities.
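
One way to picture what centralized administration protects against is a drift check that compares each node’s settings with a single baseline. The sketch below is an assumption for illustration only: the policy keys, node names, and the `find_policy_drift` helper are hypothetical and not taken from any particular platform.

```python
def find_policy_drift(baseline, node_policies):
    """Return {node: {setting: (expected, actual)}} for nodes that differ from baseline."""
    drift = {}
    for node, policy in node_policies.items():
        diffs = {
            key: (expected, policy.get(key))
            for key, expected in baseline.items()
            if policy.get(key) != expected
        }
        if diffs:
            drift[node] = diffs
    return drift


# Hypothetical baseline and per-node settings.
baseline = {"encryption_at_rest": True, "audit_logging": True}
nodes = {
    "node-us-east": {"encryption_at_rest": True, "audit_logging": True},
    "node-eu-west": {"encryption_at_rest": True, "audit_logging": False},
    "node-ap-south": {"encryption_at_rest": False},
}
print(find_policy_drift(baseline, nodes))
```

A setting that is missing on a node is reported with an actual value of `None`, so an unconfigured node is treated the same as a misconfigured one.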

Ensuring the People are Secure

Finally, there’s the challenge presented by the lack of security knowledge and understanding in the people working most closely with the data: data scientists and developers. Data scientists, with their skills and experience working with structured and unstructured data to deliver new insights, don’t necessarily think about the security of the data. It’s not surprising given that new technologies have encouraged data scientists to view big data as a giant sandbox where they are the owners and can decide how the data will be used.

Consider the recent attacks against online big data platforms including CouchDB, MongoDB, Hadoop and Elasticsearch, where attackers stole or deleted data from poorly configured systems. In most cases the “attack” was less an attack than a crime of opportunity, since the thefts were primarily due to a lack of configured administrative credentials or the use of very easily guessed credentials. What do these real incidents say about where security configuration and best practices currently stand in many big data environments?
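
The weaknesses described above (no administrative credentials, or trivially guessable ones) are the kind of thing a simple configuration audit can catch. The following sketch assumes a made-up configuration dictionary; the keys `auth_enabled`, `admin_password`, and `bind_address` are hypothetical and do not correspond to the real settings of CouchDB, MongoDB, Hadoop, or Elasticsearch.

```python
# A tiny, assumed list of weak passwords; real audits use much larger lists.
WEAK_PASSWORDS = {"", "admin", "password", "123456", "changeme"}


def audit_instance(config):
    """Return a list of human-readable issues found in a (hypothetical) config dict."""
    issues = []
    if not config.get("auth_enabled", False):
        issues.append("authentication disabled")
    password = config.get("admin_password")
    if password is None or password.lower() in WEAK_PASSWORDS:
        issues.append("missing or easily guessed admin password")
    if config.get("bind_address") == "0.0.0.0":
        issues.append("listening on all network interfaces")
    return issues


# An instance left close to an insecure default state (made-up values).
exposed = {"auth_enabled": False, "admin_password": "admin", "bind_address": "0.0.0.0"}
for issue in audit_instance(exposed):
    print(issue)
```

Running a check like this before a system is exposed to the network would flag the conditions that made those thefts crimes of opportunity.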

While most development projects rely on non-sensitive test data instead of live production data, big data application development by its nature often falls outside the more secure processes set up within IT. And with higher access privileges than many others in the organization, developers also present a greater security risk, whether through accident or malicious intent.

The Appropriate Security Solution for Big Data

There’s no time to waste when it comes to rethinking security for big data environments. The number and breadth of data breaches continue to grow unabated, with the Identity Theft Resource Center reporting a 40% increase in data breaches in 2016. Everyone from the CIO on down needs to understand and prioritize implementing better security for big data. After all, the last thing you want to hear is that there’s been a big breach in your big data.

Look for a big data security solution that lets you:

  • Classify data within the environment
  • Continuously monitor and audit all access to sensitive data
  • Uncover unauthorized access and fraudulent activity
  • Alert and respond to attacks and unauthorized activities in real time
  • Accelerate incident response and forensic investigations with advanced techniques for visualizing and analyzing detected events
  • Automate reporting and compliance activities across both traditional and big data environments, since most big data deployments operate within a heterogeneous database platform environment
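
As a rough illustration of how monitoring, auditing, and alerting fit together, the sketch below logs every access to a sensitive data set and raises an alert when the access is not explicitly allowed. The event shape, data set names, users, and the `monitor` helper are assumptions made for this example, not a description of any vendor’s solution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user: str
    dataset: str
    action: str


# Hypothetical classification result and access policy.
SENSITIVE_DATASETS = {"customers_pii"}
ALLOWED = {("analyst_a", "customers_pii", "read")}


def monitor(events):
    """Audit all access to sensitive data sets and alert on anything not allowed."""
    audit_log = []
    alerts = []
    for event in events:
        if event.dataset not in SENSITIVE_DATASETS:
            continue  # only sensitive data is audited in this sketch
        audit_log.append(event)
        if (event.user, event.dataset, event.action) not in ALLOWED:
            alerts.append(
                f"ALERT: {event.user} attempted {event.action} on {event.dataset}"
            )
    return audit_log, alerts


events = [
    AccessEvent("analyst_a", "customers_pii", "read"),
    AccessEvent("dev_b", "customers_pii", "delete"),
    AccessEvent("dev_b", "clickstream", "read"),
]
audit_log, alerts = monitor(events)
print(alerts)
```

In practice the events would arrive as a stream and the alerts would feed an incident response workflow; the point here is only that classification decides what gets watched and policy decides what gets flagged.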




