The best way to add Data Governance to Project Methodology is to build the deliverables right into the methodology. Many companies, including AAA (where I work), have a project methodology tool that details the tasks, milestones, and artifacts necessary to perform various types of projects. By specifying these items and tying them to the Work Breakdown Structure (WBS), a project manager can see what is needed to do data governance and at what stage of the project the tasks must be performed.
Here is what I added to the project methodology:
- Data Governance Project evaluation form: This form is filled out by the project manager and reviewed with the Chief Data Steward. Its purpose is to evaluate how much and what kinds of data the project will be using. For example, it may be that the project is purely infrastructure, and doesn’t have any data governance component. At the other end of the spectrum, you might be building or adding to a data warehouse, working with massive amounts of data, including some that may not exist elsewhere in the enterprise. This form enables the Data Governance function to evaluate the project and determine what level of involvement will be needed to meet enterprise requirements for metadata and data quality.
- Definition and Derivation collection spreadsheet: This is filled out by the Metadata Analyst assigned to the project. It is used to collect definitions and derivations during requirements and analysis. These definitions and derivations are then reviewed with the data stewards who own the data to ensure that the information is correct and complete. Once they are fairly stabilized, we add them to the Business Glossary to be shared across the enterprise.
- Business Glossary: This goes by many names at different companies; what I mean here is the compilation of robust metadata (definitions, derivations, ownership, etc.) for business data elements. The key is that project personnel should USE the glossary and leverage all the work that has gone before so that they don't end up redefining existing data elements or creating a derivation that doesn't jibe with the enterprise version.
- Data Quality Rule spreadsheet: This is filled out by the Data Quality Analyst assigned to the project. It is used to collect data quality rules for problematic data elements, where it is suspected that the quality of the data is too low for the intended use. The data quality rules are then reviewed with the data stewards who own the data to ensure that the information is correct and complete. These data quality rules are also tested during data profiling, and revised as necessary. By the way, it is not a given that the project is responsible for fixing the quality of the data, which can be a major undertaking. However, in the cases (and they do happen) where poor data quality actually blocks the implementation of the project, this needs to be recognized early on and dealt with!
- Data Profiling Results spreadsheet: For data elements with problematic data quality, the data is profiled to measure the extent of the problem. Although most data profiling tools can output a variety of reports, the results can be summarized if need be for use on the project. This is especially important if a data quality issue is of such severity that it must be fixed before the project can go to production. The data may be profiled by the project Data Quality Analyst, or by a data profiling specialist if your enterprise has such people.
- Data Quality Solution Evaluation: This form is filled out as the project goes to design and a technical solution is being proposed. Many technical solutions will compromise data quality, often without anyone realizing the impact. For example, overloading a field (using it for more than one purpose or changing its use) may seem like a cheap and easy way to avoid adding a field. But this often destroys the quality of the data stored in that field, or requires “special knowledge” to use the field properly. This form is the responsibility of the technical lead, who works in conjunction with the Data Quality Analyst and reviews the results with the Chief Data Steward. It is important to understand that although putting data quality at risk does not automatically make the solution unworkable, that risk needs to be understood and given importance in deciding what course to follow.
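To make the link between the Data Quality Rule spreadsheet and the Data Profiling Results spreadsheet a little more concrete, here is a minimal Python sketch of what "testing the rules during profiling" could look like. Everything in it is an assumption made for illustration: the `QualityRule` structure, the `profile` function, the `policy_id` and `birth_date` elements, and the sample records are all hypothetical, and a real data profiling tool will do far more than this.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class QualityRule:
    """One row of a (hypothetical) Data Quality Rule spreadsheet."""
    element: str                  # data element the rule applies to
    description: str              # plain-language statement of the rule
    check: Callable[[Any], bool]  # returns True when a value passes


def is_populated(value: Any) -> bool:
    """Pass when the value is neither missing nor blank."""
    return value is not None and str(value).strip() != ""


def profile(records: Sequence[Mapping[str, Any]],
            rules: Sequence[QualityRule]) -> dict[str, dict[str, Any]]:
    """Apply every rule to every record and summarize the failures."""
    summary = {}
    for rule in rules:
        failed = sum(1 for r in records if not rule.check(r.get(rule.element)))
        checked = len(records)
        summary[rule.description] = {
            "element": rule.element,
            "checked": checked,
            "failed": failed,
            # Guard against an empty data set.
            "failure_rate": failed / checked if checked else 0.0,
        }
    return summary


# Hypothetical sample data and rules, for illustration only.
SAMPLE_RECORDS = [
    {"policy_id": "P-1001", "birth_date": "1975-04-12"},
    {"policy_id": "P-1002", "birth_date": ""},
    {"policy_id": "", "birth_date": "1988-11-30"},
    {"policy_id": "P-1004", "birth_date": None},
]

RULES = [
    QualityRule("policy_id", "policy_id is populated", is_populated),
    QualityRule("birth_date", "birth_date is populated", is_populated),
]

report = profile(SAMPLE_RECORDS, RULES)
for description, stats in report.items():
    print(f"{description}: {stats['failed']} of {stats['checked']} failed "
          f"({stats['failure_rate']:.0%})")
```

The failure rate per rule is the kind of summary figure that can go into the profiling results and be reviewed with the data stewards when deciding whether an issue is severe enough to hold up the project.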
A variety of roles have been mentioned: Metadata Analyst, Data Quality Analyst, and Chief Data Steward. The Chief Data Steward is not a project-specific role; this is the person who runs the Data Stewardship Committee. However, the Metadata Analyst and Data Quality Analyst are roles assigned to individuals on the project. Sometimes these people are analysts already working on the project. More often, they are people who are added to the project and are billed to the project. A very large project may need a full-time person, but most normal projects can be staffed by one person split among several projects. The Metadata Analyst and Data Quality Analyst have specific tasks to perform and information to collect. They play the role of the data stewards, who can’t participate on each project. Instead, these “project data stewards” do a lot of the initial information gathering and then review the results with the data stewards, reducing the demand on the data stewards’ time.
Executing on data governance on a project takes some doing. You need the funding for the project data steward, and you need the project manager to understand what is needed (and add that to the project estimate). We built an entire presentation to educate the project managers on what Data Governance is, what we do, the value we add to the project, and what tasks and milestones they'll have to manage.
You’ll learn as you go along, and will need to make adjustments to the roles, tasks, and artifacts. One easy way to do that is to have the project methodology tool simply link to the information (such as the task descriptions and definitions), artifacts (such as the spreadsheets and forms), and roles. Then, as you need to make any changes to these items, you can make them on your own web site and the project methodology tool does not have to be changed. That is, the project methodology team is responsible for the context and the data governance team is responsible for the content. This seems to work very well.
By the way, if you're interested in this topic and would like more details (including sample artifacts), I'm giving a one-hour presentation on this topic at the Data Governance & Information Quality Conference in San Diego at the end of June. Hope to see you there!