Big data security is all too often an afterthought when deploying solutions like Hadoop, and companies are slowly discovering that security is just as important as any other aspect of the project. In the interview below, I caught up with officials at Dataguise, a leading big data security vendor, to talk about the company’s perspectives on this important issue.
insideBIGDATA: The big data industry is expanding in so many directions, but security doesn’t seem to be mentioned all that often. What opportunities do you see in big data security?
Dataguise: From our perspective, security for big data has not gotten the attention it deserves, for two main reasons:
- It’s an afterthought. We’re seeing more and more cases where enterprises assume that their data is protected by encryption and by the compliance policies put in place for transactional systems and applications (OLTP). The presumption that the data is safe (already inside the castle, surrounded by the moat with the drawbridge up) no longer holds with the introduction of big data. Failed audits, data breaches and data governance concerns are forcing organizations to step back and re-evaluate their big data projects much sooner than anticipated. Instead of adopting big data gradually, organizations that fail audits are fast-tracking big data security. It is a problem that is not immediately discussed, whether from the C-level down or from the IT departments up. The failed audit is the largest “uh-oh” moment for organizations that are rolling out Hadoop to mainstream users. We also see situations where the data in each silo is not deemed sensitive on its own; however, when brought together in Hadoop (whether in an enterprise data hub, data warehouse or big data lake), the combination becomes sensitive because of its association. Another challenge organizations never had to think about prior to the big data movement is the combination of structured and unstructured data. IT departments have gotten better at protecting structured data, but that data is now being combined with unstructured data that has not been de-sensitized, creating vast troves of sensitive information that can be compromised.
- It’s a best-kept secret. Many of our customers are innovators in protecting big data and leveraging it to their competitive advantage. Furthermore, no one wants to be the next Target, so understandably they are not going to draw attention to themselves.
At Dataguise, we see a huge opportunity in big data security. We complement all the Hadoop distributions, including Cloudera, Hortonworks and MapR, filling the gaping hole in data-centric security across enterprise data sources. The silver lining of each security breach (be it Heartbleed, Target, Snowden, RSA, compromised encryption keys, etc.) is that it brings data protection to the forefront of people’s minds, so the issue is finally getting the attention it deserves.
insideBIGDATA: Tell us a little about the Dataguise approach to big data security.
Dataguise: Dataguise takes a data-centric approach to protecting big data, whether at rest (in the data store) or in flight (during transfer from source to target). We enable automated discovery, masking, encryption and auditing of sensitive data in Hadoop and other data sources, using Flume, Sqoop, FTP or other ETL tools. Our solutions are a great fit for organizations that have secured the perimeter, secured the wire (and wireless), secured their applications, and yet realize that it’s still not enough. They can’t just lock everything up. In order to protect against insider threats and meet compliance policies, they need to provide access and protection down to the element or cell level. Big Data that needs to be protected is growing at a rate of 8,000x, yet Big Data headcount is only growing at 1.5x. Automated, scalable, out-of-the-box protection is key.
insideBIGDATA: What industries do you feel would benefit the most from your products?
Dataguise: Regulated industries and organizations that have strict compliance and privacy policies, such as financial services, healthcare, insurance, retail and government.
insideBIGDATA: Can you describe a particularly compelling use case of an organization using your products?
Dataguise: Two of the top three global credit card companies use Dataguise to meet PCI compliance mandates by intelligently masking card numbers and PII data in credit card transactions in Hadoop. This involves an automated cycle of discovering the sensitive data using pre-defined policies, followed by de-identifying the data at the point of ingestion and managing which users have access to the masked vs. unmasked values. One company alone has 90 million credit card holders across 127 countries – the transactions of which generate a significant volume of data that needs to be protected.
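To make the discover, de-identify and control-access cycle described above more concrete, here is a minimal Python sketch. It is an editorial illustration only and does not represent Dataguise’s product or API. The policy (a 13- to 19-digit number that passes the Luhn checksum), the choice to keep the last four digits visible, the role name `fraud_analyst`, and all function names are assumptions made for this example.

```python
import re

# Hypothetical discovery policy: a run of 13-19 digits, optionally separated
# by single spaces or hyphens, is a *candidate* card number.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

# Hypothetical access policy: roles allowed to see unmasked values.
UNMASKED_ROLES = {"fraud_analyst"}


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _mask_match(match: re.Match) -> str:
    """Mask one candidate, but only if it really looks like a card number."""
    raw = match.group(0)
    digits = re.sub(r"\D", "", raw)
    if not luhn_valid(digits):
        return raw  # not a card number under this policy; leave untouched
    # Assumption for this sketch: keep only the last four digits visible.
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_record(text: str) -> str:
    """Discover and mask card numbers in a record, e.g. at ingestion time."""
    return CARD_CANDIDATE.sub(_mask_match, text)


def view_record(text: str, role: str) -> str:
    """Return the unmasked record only for roles entitled to see it."""
    return text if role in UNMASKED_ROLES else mask_record(text)


if __name__ == "__main__":
    record = "card=4111 1111 1111 1111 amount=25.00"
    print(view_record(record, "marketing"))
    print(view_record(record, "fraud_analyst"))
```

A real deployment would store only the protected form and re-identify values for entitled users through controlled decryption or detokenization, rather than keeping clear text and masking on read as this sketch does for brevity.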
insideBIGDATA: Give us a brief history of Dataguise and its future directions.
Dataguise: Dataguise has been in the data security business since 2007. We pioneered data masking for RDBMSs, then extended that expertise to the discovery and protection of sensitive data in files, SharePoint, and now Big Data and Hadoop. Our solutions are certified by Cloudera, Hortonworks and MapR, and we are expanding our partnerships with leading solutions and service providers. We will continue to provide innovative solutions for enterprise data security and Big Data protective intelligence.