Meeting the Demand: Containerizing Master Data Management in the Cloud

Perhaps no other single entity has done more to spur cloud adoption than Salesforce. By democratizing Customer Relationship Management (CRM) with a cloud architecture, the ubiquitous CRM vendor considerably popularized moving data, applications, and attendant tools like Master Data Management (MDM) offsite.

The migration of MDM solutions to the cloud both reflects the enterprise’s larger move toward this paradigm and modestly drives it, particularly with options involving Platform-as-a-Service (PaaS).

According to Profisee CTO Eric Melcher, PaaS greatly simplifies this pivotal aspect of data management by enabling the “customer to go into their preferred cloud vendor, turn it on, and it lives inside their subscription alongside all their other cloud resources. So it becomes close to everything else they’re working with from a data perspective.”

The propinquity of fully multi-domain MDM and valued sources like CRM is critical for perfecting the basics of cohering data to a common data model, governing data, and implementing fundamentals of data quality, metadata management, and data stewardship necessary to exploit this data for tangible business value.

Moreover, containerizing MDM PaaS delivers a host of other benefits necessary to truly leverage the advantages of the cloud, which are becoming increasingly sought after in contemporary times. As Melcher observed, “Certainly a lot of organizations are looking at like, why do we have an office now? Well, if we don’t have an office, why do we need a datacenter?”

Containerizing MDM via a PaaS solution lets organizations meet these emergent needs and pare operational costs while still supporting use cases for on-premises architecture.

Deconstructing PaaS

Utilizing an entire platform for MDM is an integral aspect of realizing its benefits in hybrid and multi-cloud settings. Whereas many SaaS offerings are simply managed services in which organizations’ data are accessed through the provider’s database, with “MDM that’s a problem,” Melcher revealed. “I’m putting my data in someone else’s database, and I don’t have the ability to query that database directly, perhaps. Or I might not be able to call that API as many times as I want in a given minute, second, or whatever throughput limits they give you.”

By accessing MDM as a competitive PaaS solution, organizations get an MDM application, a SQL Server database, and an unstructured data repository useful for attaching contracts to customer records or pictures to products, for example. Thus, when integrating with other cloud resources and certain on-premises ones, the user experience becomes “a lot more frictionless because it’s my application, my database, living in my subscription alongside my other services,” Melcher acknowledged. There are also governance and security advantages, so that “it’s easier to leverage [an] authentication provider to authenticate users,” Melcher said.

Containerizing MDM as a PaaS offering is essential to realizing the flexibility for which the cloud is renowned. Although this capability is amplified by Docker and by orchestration platforms like Kubernetes, containers themselves “reduce the disruption of the architecture of the platform and provide more portability and flexibility for customers,” Melcher remarked. “What a container really is is kind of a preconfigured application, if you will.” These lightweight packages include everything needed to deploy an app. Without them, MDM as a native PaaS offering increases the propensity for vendor lock-in per cloud provider, and all but eliminates on-premises hybrid cloud options.

The speed and ease of containerizing MDM services lets customers “spin the platform up in a matter of minutes without downloading installation and configuration guides, spinning up a Windows server, or loading up a bunch of pre-requisites,” Melcher mentioned. “All of that sort of tribal type knowledge that customers have had to historically take on when they buy an application goes away.” Containerizing MDM also offers the following boons:

  • Vertical scalability: By provisioning resources with containers, organizations can dynamically scale MDM deployments up or down using Kubernetes. “From a cost perspective this allows you to use what you need at all times, but not use more than you need,” Melcher reflected.
  • High availability: Containers are well suited to failover for business continuity; if a container fails, the orchestration platform can start a replacement elsewhere to maximize uptime.
  • Upgrades: Although upgrades are typically time-consuming efforts that result in downtime, when using PaaS MDM with containers “your upgrade is effectively zero work,” Melcher said. “It’s a matter of turning off the old container and turning on the new container.”
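
The article does not say how the scaling decision is made. As a rough illustration of "use what you need, but not more than you need," the sketch below applies the replica-count rule documented for the Kubernetes Horizontal Pod Autoscaler, which adjusts the number of running containers rather than the size of each one. The function name, the bounds, and the sample metric values are assumptions made for this example, and the real autoscaler also applies a tolerance band that is omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Return how many container replicas to run for the observed load.

    Uses the rule documented for the Kubernetes Horizontal Pod Autoscaler:
        desired = ceil(current_replicas * current_metric / target_metric)
    The min/max bounds and all names here are assumptions for this sketch.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp so the deployment never drops below or exceeds its configured bounds.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Two replicas at 90% average CPU against a 60% target: scale up.
    print(desired_replicas(2, 90, 60))
    # Four replicas at 15% average CPU against a 60% target: scale down.
    print(desired_replicas(4, 15, 60))
```

Scaling down when load drops is what keeps cost tied to actual use; the upper bound keeps a traffic spike from provisioning without limit.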

Web-Based Stewardship, Governance

Competitive MDM offerings are predicated on a layered architecture that reinforces important data governance concepts like data stewardship. The base is a data modeling component enabling organizations to involve various sources despite inherent differences in schema. The next layer contains “logical components like my matching strategies, my data quality rules, my address verification strategies,” Melcher commented. Typically, this logical middle layer is based on business rules for MDM needs. The uppermost layer consists of applications that users can leverage to access the data for different purposes—including data stewardship.
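
To make the three layers concrete, here is a minimal sketch under stated assumptions: the `Customer` model, the source field names, and both rules are hypothetical and do not come from any vendor's product. It shows a base layer that maps differently shaped sources onto one model, a middle layer of business rules, and a top layer that surfaces failing records to a steward.

```python
from dataclasses import dataclass


# Layer 1: a common data model plus per-source field mappings (all hypothetical).
@dataclass
class Customer:
    name: str
    email: str
    country: str


FIELD_MAPS = {
    "crm": {"AccountName": "name", "Email": "email", "BillingCountry": "country"},
    "erp": {"cust_nm": "name", "email_addr": "email", "ctry": "country"},
}


def to_common_model(source, row):
    """Translate a source-specific row into the shared Customer model."""
    mapping = FIELD_MAPS[source]
    return Customer(**{target: row[field] for field, target in mapping.items()})


# Layer 2: business rules expressed as named predicates over the common model.
def email_has_at_sign(customer):
    return "@" in customer.email


def country_is_two_letter_code(customer):
    return len(customer.country) == 2 and customer.country.isalpha()


RULES = {
    "email_has_at_sign": email_has_at_sign,
    "country_is_two_letter_code": country_is_two_letter_code,
}


def failed_rules(customer):
    """Return the names of the rules this record violates."""
    return [name for name, rule in RULES.items() if not rule(customer)]


# Layer 3: an application view, here a work queue for data stewards.
def stewardship_queue(customers):
    queue = []
    for customer in customers:
        failures = failed_rules(customer)
        if failures:
            queue.append((customer.name, failures))
    return queue
```

Because the rules run against the common model rather than against each source's schema, adding a new source in this sketch means adding one mapping, not rewriting the rules or the steward-facing view.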

This type of web-based stewardship is similar to interacting with a browser to access resources online (or with an MDM hub, in this case). Thus, stewards can configure “the web sites, if you will, which we call fast apps, the pages within there, and the concepts on one of those pages, and then grant access to that to the appropriate audience internally,” Melcher explained. These stewardship capabilities ensure each user’s experience is based on roles and responsibilities stipulated by governance policies.

Record Matching

Perhaps the most time-honored MDM use case is to create a golden record of any particular domain—whether for customers, products, supply chain management, or any other domain covered by multi-domain hubs. Containerized PaaS MDM supports continuous record matching, which trumps conventional search matching capabilities in several ways, the most prominent of which involves scale. According to Melcher, the former underpins “matching and maintaining datasets over 100 million records,” which becomes impractical when individually searching for records. There’s also an immediacy to continuous record matching that’s difficult to duplicate. Thus, when leveraging some of the cloud’s real-time capabilities for making data available via cloud warehouses or CRM, if “someone created a new account in Salesforce and that record gets pushed into [containerized MDM], as it arrives it’ll say this looks like another record in Salesforce and we’ll match those two records,” Melcher mentioned.

Such rapid matching is vital to extracting the best information for a golden record or merging records as needed. These capabilities are solidified by various API gateways for pipelining records to “load the data, create an audit record of what changed, and run data quality rules against it,” Melcher said, which is part of the larger matching process. The scale and rapidity of this process are attributable to in-memory techniques. The accuracy hinges on a fuzzy matching stream tokenization algorithm. Continuous record matching is essential for “uniquely identifying who are my customers, whether it’s business or consumers, [which] is one of the pretty common use cases for MDM,” Melcher indicated. “How can you claim you know your customers when you don’t even know that you’ve got five instances of one in one of your applications?” Continuous record matching eliminates this issue.
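
The article names the matching algorithm without describing it, so the following is only a sketch of the general idea and not the vendor's method. It assumes a plain token-set Jaccard similarity as a stand-in for the fuzzy step and an in-memory inverted index so that each arriving record is compared only against records sharing at least one token. The class name, the threshold, and the record IDs are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of tokens the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class ContinuousMatcher:
    """Matches each record as it arrives instead of searching on demand."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold   # assumed cutoff, not a vendor default
        self.tokens_by_id = {}       # record id -> token set
        self.ids_by_token = {}       # token -> ids of records containing it
        self.group_of = {}           # record id -> golden-record group id

    def ingest(self, record_id, name):
        """Index the record and return the group it was assigned to."""
        tokens = tokenize(name)
        # Only records sharing a token can score above zero.
        candidates = set()
        for token in tokens:
            candidates |= self.ids_by_token.get(token, set())
        best_id, best_score = None, 0.0
        for candidate in sorted(candidates):  # sorted for deterministic ties
            score = jaccard(tokens, self.tokens_by_id[candidate])
            if score > best_score:
                best_id, best_score = candidate, score
        if best_id is not None and best_score >= self.threshold:
            group = self.group_of[best_id]   # join the existing group
        else:
            group = record_id                # start a new group
        self.tokens_by_id[record_id] = tokens
        self.group_of[record_id] = group
        for token in tokens:
            self.ids_by_token.setdefault(token, set()).add(record_id)
        return group
```

In this sketch, ingesting "Acme Corporation Inc" as `sf-1` and then "ACME Corporation, Inc." as `sf-2` places both in group `sf-1`, while "Acme Corp" stays separate because it shares only one token out of four. That gap is the kind of case a production-grade fuzzy matcher is meant to close.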

Tomorrow’s MDM

Deploying MDM in the cloud has become as natural, and as expected, as deploying CRM in the cloud. Cloud MDM’s reliance on API gateways enables organizations to accelerate the pipelining of their data into a form that’s governable, dependable, and trustworthy enough for any application the enterprise banks on. Containerizing MDM as a PaaS offering enhances the flexibility and portability of those deployments to include on-premises hybrid clouds, so organizations can maximize their data management regardless of its setting.

Still, properly configuring MDM as a containerized PaaS solution to take advantage of the cloud security afforded by giants like AWS or Azure requires a set of skills that may not be native to most organizations. “It’s understanding people, certificates, networking technologies…it’s just different,” Melcher admitted. “Once you figure it out it’s actually arguably easier, because a lot of the complexity’s abstracted away from you, and Microsoft or Amazon take care [of] a lot of things inherently.”

About the Author

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance and analytics.
