When Hadoop was originally conceived, it was all about sharing, and as a result security was not built in. But in today’s enterprise, Big Data represents the company’s jewels – assets that must be protected. To learn more about these issues, I caught up with Jim Vogt, CEO of Zettaset.
inside Big Data: What are the main security issues with Hadoop distributions?
Jim Vogt: Hadoop, like many open source technologies such as UNIX and TCP/IP, was not created with security in mind. While the open source Hadoop community supports some security features – like Kerberos, the use of firewalls and basic HDFS permissions – these features aren’t mandatory for a Hadoop cluster, making it possible for an organization to run entire clusters without deploying any security. At the same time, the distributed computing nature of Hadoop presents a unique challenge: data is fluid in this type of environment, moving to and from different nodes, and sometimes it is sliced into fragments and shared across multiple servers. This makes it incredibly difficult to secure with traditional security approaches. Finally, popular distributions of Hadoop – like Cloudera, Hortonworks, etc. – have little incentive to build out their security functionality. The business model for most distributions is built around the sale of professional services and support, not software. The open source distribution model also requires that any new feature development obtain the blessing of the open source community, and this means that open source solutions will always lag behind commercial software solutions from companies like Zettaset.
inside Big Data: To what degree is security a barrier to broader enterprise adoption of Hadoop and other big data technologies?
Jim Vogt: Enterprises are obviously moving to adopt Hadoop and Big Data technologies, but right now they’re limiting their deployment of the technology throughout their enterprise, primarily due to complexity, management challenges and security. Any organization that stores or transacts sensitive information in Hadoop is going to be subject to the same compliance mandates and data security regulations that apply to its traditional data stores. This can be a gating factor for organizations looking to make broader use of Hadoop and Big Data technologies.
inside Big Data: Can’t Hadoop security be addressed with traditional perimeter security solutions and by compartmentalizing parts of the system behind firewalls?
Jim Vogt: In short, no. Perimeter security solutions and compartmentalization are not designed to address Hadoop’s unique distributed architecture. However, this does not stop incumbent data security vendors from believing firewalls are the best option for Hadoop and distributed cluster security. Some firewalls attempt to map IP addresses to actual Active Directory (AD) credentials, yet this requires a specific network design. Even with special network configuration, a firewall can only restrict access on an IP/port basis, and knows nothing about the Hadoop Distributed File System (HDFS) or Hadoop itself. In order to control access, data administrators would have to segregate sensitive data on separate servers. This approach is not only inefficient but also fundamentally incompatible with distributed file systems like Hadoop’s, since files are constantly being shifted from server to server. It would require the creation of a second Hadoop cluster to contain sensitive data, and even then would only provide two levels of security for the data.
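To illustrate the IP/port limitation described above, here is a minimal Python sketch. It is not how any particular firewall product works; the subnet, the port number (8020 is assumed here as a NameNode RPC port), the user name and the file paths are all hypothetical. It only shows that a rule keyed on source address and destination port gives the same answer for two requests that target very different data.

```python
from ipaddress import ip_address, ip_network

# Hypothetical perimeter rule: one trusted subnet may reach one cluster port.
# Both values are assumptions made for this illustration.
ALLOWED_RULES = [(ip_network("10.0.0.0/24"), 8020)]


def firewall_allows(src_ip: str, dst_port: int) -> bool:
    """Decide using only what a packet filter can see: source IP and port."""
    addr = ip_address(src_ip)
    return any(addr in net and dst_port == port for net, port in ALLOWED_RULES)


# Two requests from the same workstation. The user and path fields exist at
# the application layer, so the firewall decision never looks at them.
requests = [
    {"src_ip": "10.0.0.5", "port": 8020, "user": "alice", "path": "/data/public/report.csv"},
    {"src_ip": "10.0.0.5", "port": 8020, "user": "alice", "path": "/data/finance/payroll.csv"},
]

decisions = [firewall_allows(r["src_ip"], r["port"]) for r in requests]
for request, allowed in zip(requests, decisions):
    print(request["path"], "->", "allowed" if allowed else "blocked")
```

Both requests receive the same decision because the rule has no view of which file is being read, which is why sensitive data would have to be physically segregated for a perimeter-only approach to distinguish them.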
inside Big Data: Does this mean that Big Data challenges can’t be met in a way that still meets with robust enterprise security requirements?
Jim Vogt: The solution is to bring the security closer to the data, and apply it within the cluster itself. This can be done using fine-grained access control such as role-based access control (RBAC) and running it on every Hadoop node. Using commercial software to automate the installation and management of RBAC simplifies deployment, and eliminates much of the complexity that security professionals currently face with open source Hadoop products.
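As a rough illustration of what a role-based check looks like, here is a minimal Python sketch. It does not represent Zettaset's software or any Hadoop API; the role names, users, paths and the `is_allowed` function are hypothetical. The idea is simply that users are assigned roles, roles carry permissions on data paths, and the same check can be evaluated wherever the data lives.

```python
# Hypothetical roles, each granting (path prefix, action) permissions.
ROLE_PERMISSIONS = {
    "analyst": {("/data/public", "read")},
    "finance": {
        ("/data/public", "read"),
        ("/data/finance", "read"),
        ("/data/finance", "write"),
    },
}

# Hypothetical user-to-role assignments.
USER_ROLES = {"alice": {"analyst"}, "bob": {"finance"}}


def is_allowed(user: str, path: str, action: str) -> bool:
    """Return True if any of the user's roles grants the action on the path."""
    for role in USER_ROLES.get(user, ()):
        for prefix, granted_action in ROLE_PERMISSIONS.get(role, ()):
            # Match the prefix itself or anything beneath it, so that
            # "/data/public" does not accidentally cover "/data/publicity".
            under_prefix = path == prefix or path.startswith(prefix + "/")
            if action == granted_action and under_prefix:
                return True
    return False


print(is_allowed("alice", "/data/public/report.csv", "read"))
print(is_allowed("alice", "/data/finance/payroll.csv", "read"))
print(is_allowed("bob", "/data/finance/payroll.csv", "write"))
```

Because the decision depends on the user's role and the data path rather than on a network address, it stays valid no matter which server the data happens to be on.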
inside Big Data: How is Zettaset taking a different approach to security for Big Data?
Jim Vogt: Unlike the dominant Big Data players in the market – Cloudera, MapR, Hortonworks – Zettaset is an enterprise software company. We sell software that enables enterprises to quickly deploy, secure and scale Hadoop clusters. Our Orchestrator software is distribution-agnostic, which means we can work with the leading Apache Hadoop-based distributions available today. We harden Hadoop to address policy enforcement, regulatory compliance, access control, and risk management within the cluster environment, delivering the security capabilities that IT security professionals expect in any enterprise.