The Big Data Era: Managing Challenges of Scale, Speed, Personal Information


Worldwide, 2.5 quintillion bytes of data are created every day, and with the expansion of the Internet of Things (IoT), that pace is increasing. Ninety percent of the data in the world today was generated in the last two years alone. Any forward-thinking, digitally transforming organization is going to be dealing with data. A lot of data. Big data.

While simply collecting lots of data presents comparatively few problems, most businesses run into two significant roadblocks in its use: extracting value, and ensuring that data is handled responsibly, to the standard required by data privacy legislation like GDPR. What most people don’t appreciate is the sheer size and complexity of the data sets that organizations have to store, and the IT effort that goes with them: teams of people work on processes to ensure that others can access the right data in the right way, when they need it, to drive essential business functions, all while ensuring personal information is treated appropriately.

The problem comes when you’ve got multiple teams around the world, all running to different beats, without synchronizing. It’s a bit like different teams of home builders, starting work independently, from different corners of a new house. If they have all got their own methods and bricks, then by the time they meet in the middle, their efforts won’t match up. It’s the same in the world of IT. If one team is successful, then all teams should be able to learn those lessons of best practice. Meanwhile, siloed behavior can become “free form development” where developers write code to suit a specific problem that their department is facing, without reference to similar or diverse problems that other departments may be experiencing.

In addition, there often simply aren’t enough builders to go around to get these data projects turned around quickly, which can be a problem in the face of heightening business demand. In the scramble to get things done at the pace of modern business, at the very least there will be some duplication of effort, but there’s also a high chance of confusion, and of shaky foundations for future data storage and analysis. Creating a unified, standard approach to data processing is critical – as is finding a way to implement it with the lowest possible level of resource, at the fastest possible speeds.

One of the ways businesses can organize data to meet both the needs for standardization and flexibility is in a Data Vault environment. This data warehousing methodology is designed to bring together information from multiple different teams and systems into a centralized repository, providing a bedrock of information that teams can use to make decisions – it includes all of the data, all of the time, ensuring that no information is missed out of the process.

However, while a Data Vault design is a good architect’s drawing, it alone won’t get the whole house built. Developers can still code and build it manually over time, but given its complexity they certainly cannot do this quickly, and they may not be able to do it in a way that can stand up to the scrutiny of data protection regulations like GDPR. Building a Data Vault environment by hand, even using standard templates, can be incredibly laborious and error-prone.

This is where data vault automation comes in, taking care of the 90 percent or so of an organization’s data infrastructure that fits standardized templates and the stringent requirements that the Data Vault 2.0 methodology demands. Data vault automation can lay out the core landscape of a Data Vault, as well as make use of reliable, consistent metadata to ensure information, including personal information, can be monitored both at its source and over time as records are changed.
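To make the idea of template-driven loading a little more concrete, here is a minimal sketch in Python of how column metadata might drive the generation of a hub row and a satellite row for one source record. It is an illustration under stated assumptions, not the behavior of any particular automation product: the names `ColumnMeta`, `is_personal` and `build_vault_rows` are hypothetical, and MD5 is used only because it is a commonly chosen hash function for Data Vault 2.0 hash keys.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ColumnMeta:
    """Hypothetical metadata describing one source column."""
    name: str
    is_business_key: bool = False
    is_personal: bool = False  # assumed flag for personal information


def _digest(parts):
    # Join with a delimiter so ("ab", "c") and ("a", "bc") hash differently.
    return hashlib.md5("||".join(parts).encode("utf-8")).hexdigest()


def hash_key(*business_keys):
    # Normalize business keys so the same key always yields the same hash.
    return _digest([k.strip().upper() for k in business_keys])


def build_vault_rows(record, columns, record_source, load_date=None):
    """Apply one standard template: a hub row plus a satellite row."""
    load_date = load_date or datetime.now(timezone.utc)
    keys = [str(record[c.name]) for c in columns if c.is_business_key]
    attrs = {c.name: record[c.name] for c in columns if not c.is_business_key}
    hk = hash_key(*keys)
    hub = {
        "hub_hash_key": hk,
        "business_key": keys,
        "load_date": load_date,
        "record_source": record_source,
    }
    satellite = {
        "hub_hash_key": hk,
        "load_date": load_date,
        "record_source": record_source,
        # The hash diff changes whenever any descriptive attribute changes,
        # which is how changes to a record can be detected over time.
        "hash_diff": _digest([str(attrs[k]).strip() for k in sorted(attrs)]),
        "attributes": attrs,
        # Carry the personal-information flags along with the data.
        "personal_columns": [
            c.name for c in columns if c.is_personal and not c.is_business_key
        ],
    }
    return hub, satellite
```

Because the hub and satellite structures come from one template rather than hand-written code per table, every team's data lands in the same shape, and each row carries its source, its load time and a marker for which columns hold personal information.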

About the Author

Dan Linstedt is the inventor of Data Vault modeling and a renowned expert in data warehousing and BI implementation. Linstedt has been working in IT for more than 25 years, and within data warehousing/BI since 1990.

