by Sunil Soares
A business glossary is a repository that brings together common definitions of key terms across business and IT. The most ubiquitous business glossary tool is Microsoft Excel. Indeed, many organizations start their data governance journey by documenting key business terms in spreadsheets that they then upload to Microsoft SharePoint. However, this starts to get unwieldy when the size of the glossary extends into the hundreds of business terms. At this point, organizations start to look at business glossary software tools.
There are a number of business glossary tools that are part of broader data governance suites from different vendors. These offerings include Adaptive Business Glossary Manager, Collibra Business Semantics Glossary, IBM InfoSphere Business Glossary, Informatica Business Glossary, SAS Business Data Network and the Business Glossary feature of Sybase PowerDesigner. I will not provide an evaluation of each vendor’s offering. Rather, I will propose a set of criteria for selecting a business glossary tool.
Business glossaries are meant to be used by business users. It goes without saying that business users will not use the glossary if it is hard to create, edit and delete business terms. There are several ease-of-use features, but one is particularly interesting. Consider that you have two Dodd-Frank-related terms in banking:
As you will note, the second term is embedded in the definition of the first term. As a result, the two terms need to be linked. This can be done manually, but can become a real challenge when dealing with thousands of business terms in the glossary. Any tool that can auto-link these terms will offer tremendous advantages in terms of productivity of the data stewards.
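To make the auto-linking idea concrete, here is a minimal sketch in Python. It is not how any particular vendor implements the feature. The glossary entries and the `auto_link` function are hypothetical, chosen only for illustration, and the matching is a simple whole-word, case-insensitive search of each definition for the names of the other terms.

```python
import re

def auto_link(glossary):
    """Return {term: sorted list of other glossary terms found in its definition}."""
    links = {}
    for term, definition in glossary.items():
        found = []
        for other in glossary:
            if other == term:
                continue
            # Whole-word, case-insensitive match, so "minor" does not match "minority".
            pattern = r"\b" + re.escape(other) + r"\b"
            if re.search(pattern, definition, flags=re.IGNORECASE):
                found.append(other)
        links[term] = sorted(found)
    return links

# Hypothetical glossary entries, for illustration only.
glossary = {
    "minor": "A person below the age of majority who must have a guardian.",
    "guardian": "An adult who is legally responsible for a minor.",
    "customer number": "The unique identifier assigned to a customer.",
}

links = auto_link(glossary)
for term, linked_terms in links.items():
    print(term, "->", linked_terms)
```

A real tool would also need to handle plurals, synonyms and overlapping term names, which this sketch ignores.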
Cost is another important factor to consider, especially for entry-level data governance programs. Obviously, the marginal cost of Microsoft Excel is zero for many organizations, which is why it is the tool of choice for many business glossary implementations. In addition, several data governance tool suite vendors will offer their business glossaries at a low price to provide an end-to-end solution.
I don’t see a lot of software vendors supporting business glossary tools in the cloud. However, I believe that business glossaries are tailor-made for the cloud because they contain a limited amount of sensitive data such as Personally Identifiable Information (PII) and Protected Health Information (PHI).
Business glossaries should be part of a broader metadata initiative that supports data lineage and impact analysis. Metadata architects should be able to easily link business terms to the associated technical metadata. So a term called “customer number” should be linked to CUST_NUM. This means that the business glossary is supported by a metadata repository that can ingest metadata from a heterogeneous environment consisting of data modeling tools, ETL jobs, business intelligence reports and other technical artifacts.
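As a rough illustration of what linking a term to technical metadata enables, the sketch below walks from a business term to its physical columns and then follows lineage downstream to find everything a change might affect. All asset names, the `term_to_assets` and `downstream` mappings and the `impact_of` function are assumptions made up for this example. A real metadata repository would populate them by scanning data models, ETL jobs and reports.

```python
# Hypothetical links from business terms to technical assets (names are made up).
term_to_assets = {
    "customer number": ["CRM.CUSTOMER.CUST_NUM"],
    "net sales": ["DW.FACT_SALES.NET_SALES_AMT"],
}

# Hypothetical lineage: each asset maps to the downstream artifacts that read it.
downstream = {
    "CRM.CUSTOMER.CUST_NUM": ["etl_load_dim_customer"],
    "etl_load_dim_customer": ["DW.DIM_CUSTOMER.CUST_NUM"],
    "DW.DIM_CUSTOMER.CUST_NUM": ["customer_churn_report"],
}

def impact_of(term):
    """Return every asset or artifact reachable from a business term."""
    seen = set()
    stack = list(term_to_assets.get(term, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cycles in the lineage
        seen.add(node)
        stack.extend(downstream.get(node, []))
    return sorted(seen)

impacted = impact_of("customer number")
for asset in impacted:
    print(asset)
```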
Business glossaries are rapidly emerging as the repository for data governance policies and rules in addition to business terms. The business glossary tool should therefore let you store data policies and rules and link them to the associated business terms. For example, you should be able to create a term called “minor” and link it to a rule that states that “a minor must have a guardian.”
It is one thing to create a business glossary, and quite another thing to ensure that it is actually leveraged by business users. Many business glossary tools offer a desktop widget that links the business glossary to the report or application. For example, you can highlight a term in Cognos or MicroStrategy, and then press Shift+F5 to pull up the definition from the business glossary. This is a great selling feature with business users because they have definitions at their fingertips.
A business glossary should allow administrators to assign business terms, categories of terms, data policies and data rules to stewards. This enables a federated approach to data governance by allowing the business to own and manage key business terms. The business glossary tool should provide support for a data stewardship dashboard. The dashboard shows the number of business terms assigned to each data steward, as well as how many have been approved or are still pending approval.
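The counts behind such a dashboard are simple to compute. The following sketch assumes a hypothetical list of (term, steward, status) assignments. The steward names, the status labels and the `dashboard` function are illustrative and not taken from any specific tool.

```python
from collections import Counter

# Hypothetical assignments: (business term, data steward, approval status).
assignments = [
    ("net sales", "finance steward", "approved"),
    ("customer number", "sales steward", "pending approval"),
    ("employee identifier", "hr steward", "approved"),
    ("industry classification", "finance steward", "pending approval"),
]

def dashboard(assignments):
    """Return {steward: Counter of terms per approval status}."""
    summary = {}
    for _term, steward, status in assignments:
        summary.setdefault(steward, Counter())[status] += 1
    return summary

summary = dashboard(assignments)
for steward, counts in sorted(summary.items()):
    total = sum(counts.values())
    print(f"{steward}: {total} assigned, "
          f"{counts['approved']} approved, {counts['pending approval']} pending")
```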
Business glossary tools should support simple or complex workflows to allow multiple parties to participate in the creation of a term. For example, the workflow for the term “net sales” in retail might involve multiple approvals by marketing, merchandising and finance. These workflows should be fully configurable. The data stewards will instantly become more productive because the tool shields them from routine tasks and from juggling multiple email trails.
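As a minimal sketch of the “net sales” example, the class below tracks which parties still need to sign off before a term counts as approved. The `TermWorkflow` class and its status labels are hypothetical. Commercial tools typically add ordering, escalation and notifications on top of this basic idea.

```python
class TermWorkflow:
    """Tracks approvals for one business term (illustrative only)."""

    def __init__(self, term, approvers):
        self.term = term
        self.required = list(approvers)  # configurable list of approving parties
        self.approved = set()

    def approve(self, party):
        if party not in self.required:
            raise ValueError(f"{party} is not an approver for {self.term!r}")
        self.approved.add(party)

    def pending(self):
        """Return the parties that have not yet approved, in configured order."""
        return [p for p in self.required if p not in self.approved]

    @property
    def status(self):
        return "approved" if not self.pending() else "pending approval"

workflow = TermWorkflow("net sales", ["marketing", "merchandising", "finance"])
workflow.approve("marketing")
workflow.approve("finance")
status_before = workflow.status      # merchandising has not signed off yet
waiting_on = workflow.pending()
workflow.approve("merchandising")
status_after = workflow.status
print(status_before, waiting_on, status_after)
```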
Business glossary tools should allow the creation of custom attributes. For example, a data governance team might want to create a custom attribute called “Sensitive (Y/N)” to flag certain terms such as Social Security Number in the U.S.
Business glossary tools should also provide a mechanism to link a business term to the associated reference data. For example, you might want to link a business term called “industry classification” with the list of allowable values for North American Industry Classification System (NAICS) codes from the U.S. Census Bureau.
A business glossary tool should support the linkage of business terms and rules to the actual data rules in the data quality tool. For example, the business glossary may include the definition for the term “employee identifier.” It may also include a data rule that “employee identifier is a six digit alphanumeric code that is unique across employees.” Both the business term and the data rule are assigned to a human resources data steward in the business glossary tool. The data quality tool will then implement a set of data rules to identify exceptions to the rules in the business glossary. The metadata architect should be able to link the business term and data rule in the business glossary with the technical implementation of the data rule in the data quality tool. This provides end-to-end governance over critical terms and policies from the business to the technical implementation.
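To show what the technical side of the “employee identifier” rule might look like, here is a small sketch of an exception check. It assumes the rule means exactly six ASCII letters or digits, which is my reading of “six digit alphanumeric code.” The `find_exceptions` function and the sample identifiers are made up for illustration and do not represent any data quality product.

```python
import re
from collections import Counter

# Assumption: "six digit alphanumeric" means exactly six ASCII letters or digits.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{6}")

def find_exceptions(employee_ids):
    """Return (identifier, reason) pairs for values that break the rule."""
    counts = Counter(employee_ids)
    exceptions = []
    for emp_id in employee_ids:
        if not ID_PATTERN.fullmatch(emp_id):
            exceptions.append((emp_id, "not six alphanumeric characters"))
        elif counts[emp_id] > 1:
            exceptions.append((emp_id, "duplicate"))
    return exceptions

# Hypothetical sample data.
sample_ids = ["A1B2C3", "12345", "ZZ9999", "ZZ9999", "AB-123"]
exceptions = find_exceptions(sample_ids)
for emp_id, reason in exceptions:
    print(emp_id, "->", reason)
```

The link the metadata architect maintains is then between the glossary entry for the rule and a check like this one in the data quality tool.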
Many MDM hubs have hundreds of data rules that become opaque over time. For example, an MDM hub may implement a data rule that “source address should contain at least one address line and either a postal code or a city.” The data steward should also document this data rule in the business glossary. The metadata architect should then link the data rules in the glossary and the MDM hub to provide end-to-end governance from the business to the technical implementation.
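The address rule quoted above can be written down in a few lines, which is part of why such rules are easy to bury and forget inside a hub. The sketch below assumes a hypothetical record layout with `address_lines`, `postal_code` and `city` fields. An actual MDM hub will have its own data model and rule syntax.

```python
def address_is_valid(record):
    """Rule: at least one address line and either a postal code or a city."""
    def has(value):
        # Treat None, empty strings and whitespace-only strings as missing.
        return bool(value and value.strip())

    has_line = any(has(line) for line in record.get("address_lines", []))
    return has_line and (has(record.get("postal_code")) or has(record.get("city")))

# Hypothetical source records.
complete = {"address_lines": ["1 Main Street"], "postal_code": "", "city": "Springfield"}
no_locality = {"address_lines": ["1 Main Street"], "postal_code": " ", "city": None}
no_line = {"address_lines": ["", " "], "postal_code": "12345", "city": "Springfield"}

print(address_is_valid(complete), address_is_valid(no_locality), address_is_valid(no_line))
```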
Many organizations are positioning the business glossary as the centerpiece of their data governance programs. Every data governance program requires people and process. However, a robust tooling infrastructure will make business and IT stewards more productive. Hopefully, these thoughts will help. Comments appreciated!