As you build metadata expertise, document definitions in your metadata repository (or whatever tool you use), and gain credibility in the enterprise, you are likely to find that your effort includes gathering, stewarding, documenting, and providing lists of valid values (sometimes called “enumeration lists”). After all, standardizing name suffixes, gender, marital status, relationship type, and so on ensures that new systems (and existing systems that can handle the modification) will use the same values. This leads to standardization on reports and in the data warehouse as well.
I think most would agree that items like gender code should be tracked by Data Governance (or whatever you call it), and would even consider the Metadata Repository the “system of record” for such value lists. These lists are widely used, have a short set of values that don’t (or shouldn’t) change often, and have no clear owner or system of record.
But how do you draw the line between “true” lists of valid values and the vast number of data elements that happen to have a finite list of values but should be neither governed by Data Governance nor documented in the Metadata Repository? Items that potentially fall into this category are things like GL Account codes, office location codes, sales rep identifiers, and even Employee Ids. And trust me, you do need to draw that line, as analysts and project team members will start asking Data Governance for this information once they understand what we do. After all, it’s easier than trying to dig this stuff out for themselves! The key differentiator seems to be that data elements that are created as part of a common business process with a clear business function owner should NOT be part of the Data Governance deliverables. All of these examples fall into that category. For example, HR creates (and terminates) Employee Ids as part of the hiring and termination process. They are rightfully in control of that process, and the value set changes daily (or even continuously). No one with any sense would suggest that this constantly fluctuating value set belongs in the Metadata repository, or that the Metadata repository should be the system of record. There is a clearly defined system of record: the HR system, which uses these values to do the processing required for employees, such as establishing their managerial structure, setting their service date, getting them paid, and tracking their taxes and withholding, disciplinary actions, changes of status, location, and so on. The same can be said of the other examples noted.
Note that this doesn’t imply that maintaining, adding, or removing values is limited to the system of record or is simple to administer. Adding a Sales Rep Id, for example, involves not only adding it to the Sales system (probably the system of record) but also adding it to the HR/Compensation system, establishing the location (which can change from day to day or even hour to hour), and so on. The key, as I said, is that this is a common business process with a business function owner, and the system of record is highly likely to be the main system in which the value set is adjusted initially (with propagation as necessary) and which cannot function properly without the most up-to-date list of these values.
A key point here is that, many times, true “valid values” (such as gender code) don’t have a well-defined system of record. You might make a case that Gender Code is “people data” and thus owned by the HR business function (and so the system of record should be the HR system). But what about all the people the Enterprise deals with who are not employees or contractors, such as customers, suppliers, and external agents? “Solving” this by putting the data element into a generalized function like Customer Master (with a domain data steward) establishes ownership but does nothing to resolve the question of the system of record (SOR) for these data elements. Most of the time, these values are used so generally across the enterprise that it is prudent to have an agreed-upon list documented in an easy-to-find place. The list of potential values is so small that it is reasonable and convenient to record and maintain the list in the Metadata repository, though it must be implemented identically (good luck with that) in every system that contains the data element.
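To make the rule of thumb concrete, here is a minimal sketch in Python. It is only an illustration of the reasoning above, not a tool we actually run: the `ValueList` fields, the function name, and especially the two thresholds are assumptions I made up for the example, and you would tune them to your own enterprise.

```python
from dataclasses import dataclass


@dataclass
class ValueList:
    """A candidate list of values someone wants Data Governance to supply."""
    name: str
    value_count: int                  # approximate number of distinct values
    changes_per_year: int             # rough rate of additions and removals
    owned_by_business_process: bool   # created by a process with a clear business function owner
    has_system_of_record: bool        # a system that cannot function without the current list


# Hypothetical thresholds, chosen only for illustration.
MAX_VALUES = 50
MAX_CHANGES_PER_YEAR = 5


def belongs_in_metadata_repository(candidate: ValueList) -> bool:
    """Apply the rule of thumb: govern small, stable, ownerless lists only."""
    if candidate.owned_by_business_process:
        # The owning business function maintains the values, not Data Governance.
        return False
    if candidate.has_system_of_record:
        # The values already have an authoritative home.
        return False
    return (candidate.value_count <= MAX_VALUES
            and candidate.changes_per_year <= MAX_CHANGES_PER_YEAR)


# Example candidates; the counts are invented for the sketch.
gender_code = ValueList("Gender Code", 3, 0, False, False)
employee_id = ValueList("Employee Id", 20000, 3000, True, True)

print(belongs_in_metadata_repository(gender_code))
print(belongs_in_metadata_repository(employee_id))
```

Note that the ownership test comes first on purpose: a list can be short and stable and still not belong in the repository if a business function already owns the process that creates its values.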
To tell the truth, this whole discussion came up because of Product, and whether there should be a list of product codes supplied by Data Governance and kept for reference in the Metadata Repository. I have to admit, my initial inclination was to specify the codes and keep them in the repository, though not as the system of record. After all, we only have about 25 products, and we don’t add new ones very often, since it takes a major effort to do that. And to reiterate, the system(s) of record have to be the product systems themselves, because that is where you need to fully define the product in order for the system to work correctly (and enable you to sell the product). The fact that a bunch of other systems have to get major updates as well is more a failing of the integration and system design than anything else.
I have since changed my mind about even keeping product in the Metadata repository. To see the apparent insanity of recording the list of products in the Metadata Repository, generalize to the example of a major retailer, such as Longs Drugs, where I worked for 7 years. Longs (now CVS) has a well-defined product hierarchy: each level is clearly specified, and ALL products must fit into the hierarchy and have values at every level all the way to the top. The hierarchy is a pyramid (as pretty much all hierarchies are), with just a few values at levels 2, 3, and 4. However, by the time you get down to the next-to-last level (SKU), the values have ballooned to well over 100,000, and at the bottom level (UPC), the values number in the millions. This is because every single item that Longs sold had a separate product identifier (UPC) that differentiated by package size (8 oz. vs. 12 oz. of Gelatin), flavor (grape or cherry), brand (Jello or Royal), and even type of packaging (single boxes versus six-packs/bundles). And more.
Given all this, it is clear that the Metadata repository is NOT the system of record for any part of the product hierarchy (that much has not changed from my initial view). But does it make sense to record the products in the repository? I would now say that it does not. Basically, the list will be out of date almost immediately, because the business processes to keep it updated don’t exist, and really don’t need to.
However, just to be clear, I do think that the definition (the levels, what they are called, what they mean) of the product hierarchy ITSELF (and probably any other hierarchy) SHOULD be documented in the Metadata repository. If there is no agreement on the hierarchy, then reports based on the disparate hierarchies will not match. In addition, it is a very bad idea to have different versions/definitions of what is meant by “product” (the bottom level of the hierarchy). If one group defines product as (for example) an auto insurance policy, while another breaks it down to a specific set of coverages in an auto policy, then not only will reports not work properly, but the very systems that enable working with those products will not work the same way, and the people using those systems will need to know the idiosyncrasies of something as basic as what the product is. While little can be done when working with legacy systems, new systems should be designed with a common, governed list of products and a common hierarchy.
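Here is a rough sketch, again in Python, of what “document the hierarchy, not the values” could look like. Everything in it is hypothetical: the level names above SKU and UPC, the definitions, and the `level_mismatches` helper are invented for illustration and are not the actual Longs hierarchy or any real repository's format. The point is that the governed record has a handful of entries (one per level) and never lists a single product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HierarchyLevel:
    position: int     # 1 is the top of the pyramid
    name: str         # what the level is called
    definition: str   # what the level means


# Hypothetical governed definition: levels only, no product values.
GOVERNED_PRODUCT_HIERARCHY = [
    HierarchyLevel(1, "Company", "All products sold (illustrative level)"),
    HierarchyLevel(2, "Department", "Broad merchandise grouping (illustrative level)"),
    HierarchyLevel(3, "Category", "Grouping within a department (illustrative level)"),
    HierarchyLevel(4, "SKU", "Stock-keeping unit"),
    HierarchyLevel(5, "UPC", "Individual sellable item; the definition of product"),
]


def level_mismatches(governed, candidate):
    """Return the positions where a system's level names differ from the governed ones."""
    governed_names = {level.position: level.name for level in governed}
    candidate_names = {level.position: level.name for level in candidate}
    positions = sorted(set(governed_names) | set(candidate_names))
    return [p for p in positions if governed_names.get(p) != candidate_names.get(p)]


# A made-up reporting system that renamed level 3 and dropped the bottom level.
reporting_system = [
    HierarchyLevel(1, "Company", ""),
    HierarchyLevel(2, "Department", ""),
    HierarchyLevel(3, "Class", ""),
    HierarchyLevel(4, "SKU", ""),
]

print(level_mismatches(GOVERNED_PRODUCT_HIERARCHY, reporting_system))
```

A check like this only compares level names. Agreeing on what each level means, and above all on what “product” means at the bottom level, is still a conversation between people.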
So, that’s it! As always, comments are much welcomed, especially from those of you who have fought this fight before.