
Is Data Vault Modeling a Good Choice for Your Organization?

In this special guest feature, Moshe Kranc, CTO at Ness Digital Engineering, discusses Data Vault modeling, whose key benefit is a flexible design that adapts to the changing needs of an enterprise. Moshe has extensive experience in leading the adoption of bleeding-edge technologies, having worked for large companies as well as entrepreneurial start-ups. He previously headed the Big Data Centre of Excellence at Barclays’ Israel Development Centre (IDEC). Moshe has worked in the high-tech industry for over 30 years in the United States and Israel. He was part of the Emmy award-winning team that designed the scrambling system for DIRECTV, and he holds six patents in areas related to pay television, computer security and text mining. He is a graduate of Brandeis University and earned graduate degrees from both the University of California at Berkeley and Boston University.

As business environments grow increasingly volatile, enterprise data architectures must be flexible and adaptable to fast-changing market conditions. The common challenge with dimensional and normalized data modeling techniques is that they aren’t designed to respond quickly to change. Data Vault modeling is designed to address this challenge.

Data Vault architecture is an innovative, hybrid approach that combines the best of Third Normal Form (3NF) and dimensional modeling. This data modeling technique enables historical storage of data, integration of data from different operational systems, and tracing of the origin of all data coming into the database. The Data Vault approach is based on the concept of Hubs, Links and Satellites:

  • Hubs represent unique business keys, such as customer or product identifiers, which rarely change.
  • Links represent relationships (transactions, hierarchies or associations) between hubs. A single link table is created for each type of relationship, e.g., a Purchases link records which customers are related to which products, with each link record connecting a specific customer hub key to a specific product hub key.
  • Satellites represent attributes of hubs or links, consisting of data that tends to change over time. A single satellite record is created for each instance of a relationship, e.g., a single satellite record is created for each instance of a customer purchasing a product, defining attributes of that purchase such as date, quantity and price.
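To make the three building blocks concrete, here is a minimal sketch of the customer/product Purchases example using Python's built-in sqlite3 module. The table and column names (hub_customer, link_purchase, sat_purchase and so on), the MD5 hash keys in the style of Data Vault 2.0, and the load_purchase helper are illustrative assumptions, not a prescribed design; keying the link on the customer/product pair is also a simplification.

```python
import hashlib
import sqlite3

def hash_key(*business_keys: str) -> str:
    # Assumed convention: MD5 of the normalized business key(s), DV 2.0-style
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

SCHEMA = """
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash of the business key
    customer_id   TEXT NOT NULL,      -- the business key itself
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_sku   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- One link table for the Purchases relationship type
CREATE TABLE link_purchase (
    purchase_hk   TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    TEXT NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Descriptive attributes of each purchase, keyed by link and load date
CREATE TABLE sat_purchase (
    purchase_hk   TEXT NOT NULL REFERENCES link_purchase(purchase_hk),
    load_date     TEXT NOT NULL,
    purchase_date TEXT,
    quantity      INTEGER,
    price         REAL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (purchase_hk, load_date)
);
"""

def load_purchase(conn, customer_id, product_sku, purchase_date, quantity, price,
                  load_date, record_source="orders_system"):
    """Hypothetical loader: one source purchase fans out to hubs, link and satellite."""
    customer_hk = hash_key(customer_id)
    product_hk = hash_key(product_sku)
    purchase_hk = hash_key(customer_id, product_sku)
    # Hubs and links are insert-if-absent: each key or relationship is stored once
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (customer_hk, customer_id, load_date, record_source))
    conn.execute("INSERT OR IGNORE INTO hub_product VALUES (?, ?, ?, ?)",
                 (product_hk, product_sku, load_date, record_source))
    conn.execute("INSERT OR IGNORE INTO link_purchase VALUES (?, ?, ?, ?, ?)",
                 (purchase_hk, customer_hk, product_hk, load_date, record_source))
    # The satellite row carries the changing attributes of the purchase
    conn.execute("INSERT INTO sat_purchase VALUES (?, ?, ?, ?, ?, ?)",
                 (purchase_hk, load_date, purchase_date, quantity, price, record_source))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load_purchase(conn, "C-001", "SKU-42", "2019-03-01", 2, 9.99, "2019-03-02T00:00:00")
load_purchase(conn, "C-001", "SKU-77", "2019-03-01", 1, 24.50, "2019-03-02T00:00:00")
for table in ("hub_customer", "hub_product", "link_purchase", "sat_purchase"):
    print(table, conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0])
```

Loading two purchases by the same customer leaves one customer hub row and two rows in each of the other tables: business keys are stored once, while relationships and their attributes accumulate as rows. Because each hash key is computed from business keys alone, no table has to look up another table's generated surrogate key during loading.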

The key benefit of Data Vault architecture is its flexible design, which adapts to the changing needs of an enterprise. With traditional data models, it can take BI teams months to add new relationships to the data warehouse, because those relationships are built into the warehouse schema; any change therefore requires extensive governance and testing. The Data Vault approach makes enterprise data warehouses more agile, because relationships are kept out of the hub tables: each relationship instance is just a data row in the Link and Satellite tables, and a new type of relationship is added as a new Link table without altering the existing ones. This enables rapid implementation of evolving data relationships.
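The point that new relationships do not disturb the existing model can be checked with a small sketch, again assuming Python's sqlite3 module and hypothetical table names. Adding a new relationship type (here an invented Wishlist link between customers and products) is a purely additive CREATE TABLE, and the stored definitions of the existing hub and link tables are compared before and after the change to confirm that none of them was modified.

```python
import sqlite3

# Existing (hypothetical) vault tables
BASE_SCHEMA = """
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT NOT NULL,
                           load_date TEXT NOT NULL, record_source TEXT NOT NULL);
CREATE TABLE hub_product (product_hk TEXT PRIMARY KEY, product_sku TEXT NOT NULL,
                          load_date TEXT NOT NULL, record_source TEXT NOT NULL);
CREATE TABLE link_purchase (purchase_hk TEXT PRIMARY KEY, customer_hk TEXT NOT NULL,
                            product_hk TEXT NOT NULL, load_date TEXT NOT NULL,
                            record_source TEXT NOT NULL);
"""

# Hypothetical new relationship type: customers adding products to a wishlist
NEW_LINK = """
CREATE TABLE link_wishlist (wishlist_hk TEXT PRIMARY KEY, customer_hk TEXT NOT NULL,
                            product_hk TEXT NOT NULL, load_date TEXT NOT NULL,
                            record_source TEXT NOT NULL);
"""

def table_definitions(conn):
    """Map each table name to the CREATE statement SQLite has stored for it."""
    rows = conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'")
    return dict(rows.fetchall())

conn = sqlite3.connect(":memory:")
conn.executescript(BASE_SCHEMA)
before = table_definitions(conn)

conn.executescript(NEW_LINK)  # additive change: no ALTER on any existing table
after = table_definitions(conn)

# A new relationship instance is just a row in the new link table (placeholder keys)
conn.execute("INSERT INTO link_wishlist VALUES (?, ?, ?, ?, ?)",
             ("wl-1", "hc-1", "hp-1", "2019-03-02", "web_app"))

unchanged = all(after[name] == sql for name, sql in before.items())
print("existing tables unchanged:", unchanged)
print("new tables:", sorted(set(after) - set(before)))
```

Existing loads and queries against the original tables keep working unchanged, which is what limits the testing effort compared with altering a shared schema.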

So, is Data Vault modeling a good choice for your organization? That depends on your environment and your specific use case. Here are some of the major benefits and limitations that will help you decide if a Data Vault would meet your specific data architecture needs.

Benefits include:

  • Increased usability for business users, because a Data Vault is modeled after the business domain
  • High performance
    • Data Vault supports near-real-time loads as well as batch loads
    • Scales to terabytes or petabytes of information (big data)
    • Decoupled key distribution reduces ETL (Extract, Transform and Load) dependencies, enabling a very high degree of parallelism
  • Historical traceability
  • Supports isolated, flexible and incremental development (organic growth)
    • The dynamic model can be built incrementally and extended easily
    • No rework is required when adding additional information to the core data warehouse model
    • Supports business rule changes with ease
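Historical traceability and incremental loading are commonly implemented with insert-only satellites. The sketch below assumes a hash_diff column (an MD5 digest of the descriptive attributes, a common Data Vault 2.0 convention that the article does not specify) and uses hypothetical names such as sat_customer, load_customer_snapshot and as_of. A new row is written only when the attributes actually change, and earlier rows are never updated, so the state as of any past load date can be reconstructed.

```python
import hashlib
import sqlite3

def hash_diff(attributes: dict) -> str:
    # Assumed convention: digest of all descriptive attributes, used to detect change
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE sat_customer (
    customer_hk TEXT NOT NULL,
    load_date   TEXT NOT NULL,
    hash_diff   TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
)""")

def load_customer_snapshot(conn, customer_hk, load_date, attributes):
    """Insert-only load: add a row only if attributes differ from the latest row."""
    new_diff = hash_diff(attributes)
    latest = conn.execute(
        "SELECT hash_diff FROM sat_customer WHERE customer_hk = ? "
        "ORDER BY load_date DESC LIMIT 1", (customer_hk,)).fetchone()
    if latest is not None and latest[0] == new_diff:
        return False  # no change, so nothing is written and history stays intact
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                 (customer_hk, load_date, new_diff, attributes["name"], attributes["city"]))
    return True

def as_of(conn, customer_hk, point_in_time):
    """Return the (name, city) that was current at a given point in time, or None."""
    return conn.execute(
        "SELECT name, city FROM sat_customer WHERE customer_hk = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1", (customer_hk, point_in_time)).fetchone()

hk = "c001"  # hypothetical hub hash key
load_customer_snapshot(conn, hk, "2019-01-01", {"name": "Ada", "city": "Boston"})
load_customer_snapshot(conn, hk, "2019-02-01", {"name": "Ada", "city": "Boston"})  # unchanged
load_customer_snapshot(conn, hk, "2019-03-01", {"name": "Ada", "city": "Tel Aviv"})

print(conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0])  # 2
print(as_of(conn, hk, "2019-02-15"))  # ('Ada', 'Boston')
print(as_of(conn, hk, "2019-03-15"))  # ('Ada', 'Tel Aviv')
```

The same loader works for a nightly batch or for a stream of small near-real-time loads, since each call only compares against the latest row for one key.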

Limitations include:

  • Data Vault requires many JOINs to derive data marts
    • Bridge tables can help
  • Like 3NF, Data Vault is impractical for direct querying
    • Query from a derived data mart
  • De-normalization means more storage is required
    • Use low-cost storage
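The JOIN overhead and the advice to query a derived data mart can both be seen in a short sketch, assuming Python's sqlite3 module and the same hypothetical hub, link and satellite tables with placeholder keys. Reassembling a single purchase row takes three JOINs plus a subquery that picks the latest satellite row, so the sketch wraps that logic once in a hypothetical mart_purchases view that users query instead of the raw vault tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT NOT NULL);
CREATE TABLE hub_product  (product_hk  TEXT PRIMARY KEY, product_sku TEXT NOT NULL);
CREATE TABLE link_purchase (purchase_hk TEXT PRIMARY KEY,
                            customer_hk TEXT NOT NULL, product_hk TEXT NOT NULL);
CREATE TABLE sat_purchase (purchase_hk TEXT NOT NULL, load_date TEXT NOT NULL,
                           quantity INTEGER, price REAL,
                           PRIMARY KEY (purchase_hk, load_date));

INSERT INTO hub_customer VALUES ('hc1', 'C-001');
INSERT INTO hub_product  VALUES ('hp1', 'SKU-42'), ('hp2', 'SKU-77');
INSERT INTO link_purchase VALUES ('lp1', 'hc1', 'hp1'), ('lp2', 'hc1', 'hp2');
INSERT INTO sat_purchase VALUES
    ('lp1', '2019-03-01', 2, 9.99),
    ('lp1', '2019-03-05', 3, 9.99),   -- later change: only the latest row should surface
    ('lp2', '2019-03-01', 1, 24.50);

-- De-normalized mart view: three JOINs just to reassemble one purchase row
CREATE VIEW mart_purchases AS
SELECT c.customer_id, p.product_sku, s.quantity, s.price
FROM link_purchase AS l
JOIN hub_customer AS c ON c.customer_hk = l.customer_hk
JOIN hub_product  AS p ON p.product_hk  = l.product_hk
JOIN sat_purchase AS s ON s.purchase_hk = l.purchase_hk
WHERE s.load_date = (SELECT MAX(load_date) FROM sat_purchase
                     WHERE purchase_hk = l.purchase_hk);
""")

# Business users query the mart, not the hubs, links and satellites directly
for row in conn.execute("SELECT * FROM mart_purchases ORDER BY product_sku"):
    print(row)
```

A production warehouse would more likely materialize such a mart as a table refreshed on a schedule, and bridge tables, as noted above, can reduce the join work by pre-computing key combinations.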

Data Vault modeling is a robust and mature data architecture that can provide real value to an organization when used for the right use case, but it requires considerable expertise. If you are building a data warehouse, seek help from a trusted partner to evaluate whether Data Vault modeling is appropriate for your use case, and to guide you through optimal design of the data model.

 

Sign up for the free insideBIGDATA newsletter.
