By Richard Mohrmann.
Um, yes, we are still talking about reference data.
While it certainly seems like a topic that should have been resolved long ago, here we are still talking about why reference data seems like such a difficult nut to crack.
There are reasons. At the highest level, organizations usually break data down into two categories: reference data and transactional data. Reference data goes by a variety of names: master data, “static” data (which is anything but), golden copy, single source of truth (SSOT). They all refer to the set of information an organization needs to consult in order to operate on a day-to-day basis. Transactional data, on the other hand, describes the ongoing events of an organization. It is larger in volume, and while errors are more common than they are in reference data, each one has less impact. Because we see transactional data as it was captured during the event, errors help us improve our processes. Reference data is different: many transactions depend on the same reference record, so a single error there tends to have a much wider effect.
If you ask a set of businesses what constitutes reference data, you’ll get as many descriptions as there are businesses. The concept is highly context dependent, and because of this, it is often difficult to find one set of best practices for managing these data. A complete playbook on how to manage your golden copy continues to be hard to find. The sections below describe some key areas of development to target as growing organizations pursue clean, timely, and reliable reference data.
Because the problem is hard, and because it’s hard to demonstrate best practices in any given domain (or worse yet, across multiple domains), it’s often difficult to get the business support needed to effectively manage your single source of truth. Enterprise businesses appear resigned to operating with management processes that are “good enough.” Management’s perception, backed by their experience, is that it’s too expensive to fix procedural problems, so instead they just pay the fine. While the golden copy concept is a worthy goal, in reality, it is rarely achieved in large enterprises.
The key to getting support from an organization’s leadership is good communication; and in this case, good communication means providing reliable information on the state of your master data and its impact on daily operations. As your Master Data Management processes evolve, it is important to develop and capture metrics that demonstrate quality (or lack thereof) and the effect it is having on the business. Although you’re unlikely to achieve a perfect score, tracking metrics will at least show you relative changes. Those deltas allow senior management to see how their investment in Data Management is paying off.
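To make the idea of tracking relative changes concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two metrics (completeness and validity), the record layout, and the names `QualitySnapshot`, `measure`, and `delta` are hypothetical, and your own metrics will depend on your business context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualitySnapshot:
    """Share of records passing each check, from 0.0 to 1.0 (hypothetical metrics)."""
    completeness: float
    validity: float


def measure(records, required_fields, is_valid):
    """Compute simple quality ratios for a list of dict records."""
    total = len(records)
    if total == 0:
        # An empty set has nothing wrong with it; treat it as a perfect score.
        return QualitySnapshot(completeness=1.0, validity=1.0)
    complete = sum(
        1 for rec in records
        if all(rec.get(field) not in (None, "") for field in required_fields)
    )
    valid = sum(1 for rec in records if is_valid(rec))
    return QualitySnapshot(complete / total, valid / total)


def delta(previous, current):
    """Relative change between two snapshots: the number to report upward."""
    return {
        "completeness": current.completeness - previous.completeness,
        "validity": current.validity - previous.validity,
    }


def two_letter_country(rec):
    """Illustrative validity rule: country is two uppercase letters."""
    country = rec.get("country") or ""
    return len(country) == 2 and country.isupper()


last_month = [
    {"id": "C1", "country": "US"},
    {"id": "C2", "country": ""},
    {"id": "C3", "country": "usa"},
    {"id": "C4", "country": "DE"},
]
this_month = [
    {"id": "C1", "country": "US"},
    {"id": "C2", "country": "FR"},   # missing value filled in
    {"id": "C3", "country": "usa"},  # still invalid
    {"id": "C4", "country": "DE"},
]

before = measure(last_month, ["id", "country"], two_letter_country)
after = measure(this_month, ["id", "country"], two_letter_country)
change = delta(before, after)
print(change)  # {'completeness': 0.25, 'validity': 0.25}
```

Neither snapshot is perfect, but the change between them is the kind of number that shows leadership whether the investment is moving things in the right direction.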
Even with limited business sponsorship, organizations can make progress on their Master Data Quality effort by establishing clear roles and responsibilities for data expertise. Data Stewardship can have a steep learning curve, and too often the responsibility falls on the wrong people. Data stewards need to understand both the business and the technology well enough to define objectives, establish a Data Management plan, drive data model designs, and identify business success criteria. Without direction, those responsibilities too often fall on junior operators or technologists who don’t have the business background to adequately identify and communicate requirements, so make sure your stewards are properly trained.
Successfully managing your Master Reference Data Quality goes beyond managing the quality of your data sources. Sometimes the feeds are wrong. Changes to your master repository need to be staged, checked, reconciled, normalized, and sometimes backed out before becoming a permanent change to your golden copy.
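The stage, check, and back-out cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the `StagedRepository` class, its method names, and the in-memory dictionaries are all hypothetical, and keeping a full snapshot before every commit is only reasonable for small data sets.

```python
import copy


class StagedRepository:
    """Hypothetical golden copy with a staging area and one-step backout."""

    def __init__(self, records=None):
        self._golden = dict(records or {})
        self._staged = {}
        self._history = []  # snapshots of the golden copy taken before each commit

    def stage(self, key, record):
        """Hold a proposed change outside the golden copy."""
        self._staged[key] = record

    def commit(self, checks):
        """Apply staged changes only if every record passes every check.

        Returns the keys that failed; an empty list means the commit went through.
        """
        failures = [
            key for key, rec in self._staged.items()
            if not all(check(rec) for check in checks)
        ]
        if failures:
            return failures  # nothing applied; staged changes stay for correction
        self._history.append(copy.deepcopy(self._golden))
        self._golden.update(self._staged)
        self._staged = {}
        return []

    def back_out(self):
        """Restore the golden copy as it was before the most recent commit."""
        if self._history:
            self._golden = self._history.pop()

    def get(self, key):
        return self._golden.get(key)


def has_name(rec):
    """Illustrative check: the record carries a non-empty name."""
    return bool(rec.get("name"))


repo = StagedRepository({"USD": {"name": "US Dollar"}})

repo.stage("EUR", {"name": ""})             # a bad feed value
rejected = repo.commit([has_name])          # ['EUR'], golden copy untouched

repo.stage("EUR", {"name": "Euro"})         # corrected value replaces the staged one
accepted = repo.commit([has_name])          # [], change is now in the golden copy
committed = repo.get("EUR")

repo.back_out()                             # undo the last commit
after_backout = repo.get("EUR")             # None again
```

The point of the design is that a wrong feed never touches the golden copy directly, and that a change which passed its checks but later proves wrong can still be reversed.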
It often comes down to finding the right balance between timeliness, quality, and resources. If you have infinite time or resources, it’s easy to make sure your data are correct. If “good enough” is an adequate level of correctness, you require fewer resources and/or less time to make corrections. While some business lines may balk at the idea of making any decisions based on data that could be considered questionable, others find themselves in situations where a decision must be made by a hard deadline using the best information available at the time. There is no “one size fits all” here.
It all comes back to having a decent understanding of the business requirements (including timeliness), which will allow you to budget for the resources necessary to achieve a given quality level in the timeframe necessary to operate the business. It is your data steward’s role to articulate the service levels necessary to operate the business given the data and business requirements. We can borrow ideas from the software development space here: keep changes small and make sure the process of applying updates is well defined, repeatable, traceable, and reversible.
Data Quality Processes
Garbage in, garbage out is a cliché. In reality, correctness can depend on context. It is rarely possible to ensure that “bad data” doesn’t enter your master data soon after acquisition. Post-processing Data Quality reviews and correction procedures are a must, so plan for them up front. Make sure that you have a feedback process so that new business rules can be incorporated into your Data Quality logic. Confirm also that you have the ability to safely and consistently back out changes when necessary.
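One way to leave room for that feedback process is to keep quality rules in a registry that can grow without touching the review code. The Python sketch below assumes records are plain dictionaries. The `RULES` registry, the `rule` decorator, and both example rules are hypothetical names chosen for illustration.

```python
RULES = {}  # rule name -> check function; hypothetical in-memory registry


def rule(name):
    """Decorator that registers a data quality rule under a name."""
    def register(func):
        RULES[name] = func
        return func
    return register


@rule("has_identifier")
def has_identifier(record):
    return bool(record.get("id"))


def review(records):
    """Return {record index: [names of failed rules]} for records with problems."""
    findings = {}
    for index, record in enumerate(records):
        failed = [name for name, check in RULES.items() if not check(record)]
        if failed:
            findings[index] = failed
    return findings


records = [
    {"id": "A1", "currency": "usd"},
    {"id": "", "currency": "EUR"},
]

first_pass = review(records)
print(first_pass)  # {1: ['has_identifier']}


# Feedback from the business arrives: currency codes must be uppercase.
# Adding the rule requires no change to review().
@rule("currency_is_uppercase")
def currency_is_uppercase(record):
    return record.get("currency", "").isupper()


second_pass = review(records)
print(second_pass)  # {0: ['currency_is_uppercase'], 1: ['has_identifier']}
```

Because the same records are reviewed again after the new rule lands, problems that were already in the master data surface alongside new ones, which is what a post-processing review is for.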
A Single Single Source of Truth
We talk about normalization, reconciliation, deduping, and other techniques as ways of eliminating redundancies in your critical reference data sources. To effectively manage our reference data, we borrow another principle from software development: DRY, or Don’t Repeat Yourself. It’s a great acronym, especially when you consider the WET alternative of Wasting Everyone’s Time while you’re Writing Everything Twice. While DRY may not be practical or even achievable in complex systems, it’s certainly a worthwhile principle to guide your efforts as you continue to seek your own golden solution.
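As a small illustration of deduping in that spirit, the Python sketch below collapses records whose names differ only in case or spacing. It is a deliberately simple assumption about what counts as a duplicate. Real matching usually needs domain-specific rules, and the function names and sample records here are hypothetical.

```python
def normalize_name(name):
    """Collapse case and whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def dedupe(records, key="name"):
    """Keep the first record seen for each normalized key; report the rest."""
    unique = {}
    duplicates = []
    for record in records:
        normalized = normalize_name(record[key])
        if normalized in unique:
            duplicates.append(record)  # set aside for review rather than silently dropped
        else:
            unique[normalized] = record
    return list(unique.values()), duplicates


counterparties = [
    {"name": "Acme Corp", "source": "feed_a"},
    {"name": "  acme   corp ", "source": "feed_b"},
    {"name": "Globex", "source": "feed_a"},
]

kept, repeated = dedupe(counterparties)
print(len(kept), len(repeated))  # 2 1
```

Returning the duplicates instead of discarding them keeps the process traceable: someone can still check that the two records really were the same entity.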