Typically, Data Governance programs start with Data Quality, because that is where end users or stakeholders begin to interact with data, especially from the reporting and analytics perspective. “They get a report that doesn’t match another report and they can’t marry it to other data,” said Mary Anne Hopper, Data Management Consultant at SAS Institute.
Speaking at the DATAVERSITY® Enterprise Data World Conference, Hopper said that Data Quality alone is not enough — Data Stewardship and Data Governance are essential partners, and success depends on the integration of all three areas.
Hopper defined Data Quality as the conformance of data to the business definitions and business rules. Because “quality” means different things to different people depending on how they interact with data, it’s essential to create a formal definition of quality data.
An ETL developer might consider on-time data loading an indicator of quality. Someone looking at a report might consider quality data to be a set of numbers from different sources that match up. A sales department might define quality as having standard business terminology for all the fields in a customer record.
No matter how many divergent definitions there are, she said, “We really have to think about defining it overall: about the conformance of data to the business definitions and the business rules.”
Finding a Definition of Data Quality
Hopper suggested several questions that can help determine how individuals across the organization define quality data:
- Is my data complete? If not, what is missing?
- Are there duplicate records?
- Do reports arrive on time?
- What data is incorrect?
- Does data provide conflicting information?
- Does my data align with enterprise standards?
Involving all aspects of the business in this process is vital, she said, because “data without business data has no defined quality.”
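Several of these questions can be turned into simple, repeatable checks. The Python sketch below is purely illustrative and is not from Hopper’s talk: the customer records, the field names (`customer_id`, `name`, `email`, `region`), and the helper functions are hypothetical. It covers only completeness, duplicates, and conflicting information; timeliness and alignment with enterprise standards would need rules the business defines.

```python
from collections import defaultdict

# Hypothetical customer records; field names and values are illustrative assumptions.
RECORDS = [
    {"customer_id": "C001", "name": "Acme Corp", "email": "ops@acme.example", "region": "EMEA"},
    {"customer_id": "C002", "name": "Globex", "email": None, "region": "AMER"},
    {"customer_id": "C001", "name": "Acme Corp", "email": "ops@acme.example", "region": "APAC"},
]

REQUIRED_FIELDS = ("customer_id", "name", "email", "region")


def missing_fields(records, required=REQUIRED_FIELDS):
    """Is my data complete? Map each record's index to the required fields it lacks."""
    gaps = {}
    for i, rec in enumerate(records):
        absent = [f for f in required if rec.get(f) in (None, "")]
        if absent:
            gaps[i] = absent
    return gaps


def duplicate_keys(records, key="customer_id"):
    """Are there duplicate records? Return the key values that appear more than once."""
    counts = defaultdict(int)
    for rec in records:
        counts[rec.get(key)] += 1
    return sorted((k for k, n in counts.items() if n > 1), key=str)


def conflicting_values(records, key="customer_id", field="region"):
    """Does data provide conflicting information? Keys whose records disagree on `field`."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec.get(key)].add(rec.get(field))
    return {k: sorted(v, key=str) for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    print(missing_fields(RECORDS))      # {1: ['email']}
    print(duplicate_keys(RECORDS))      # ['C001']
    print(conflicting_values(RECORDS))  # {'C001': ['APAC', 'EMEA']}
```

Which fields count as “required,” and which disagreement counts as a conflict, are exactly the business definitions the checks depend on; the code only applies them.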
Data Quality Misconceptions
Hopper addressed five common misconceptions about Data Quality:
- MYTH: Data Quality is a one-time exercise — FACT: There is always new data
New data coming from — and therefore controlled by — outside sources requires a commitment to ongoing measurement and monitoring.
- MYTH: Data Quality is an IT initiative — FACT: Data needs business stewardship
Data Quality must have business-defined terms and business-defined user acceptance criteria, but it should be a partnership between business and IT, she said. IT is responsible for understanding how the back end works and for putting solutions in place that align with what the business says it needs.
- MYTH: Data Quality is a project issue — FACT: Data Quality is an enterprise issue
Once data is created, even in the context of a project, it will be used by others. Data tends to be created and then recreated numerous times as it makes its way through source systems, through applications, through reporting and analytic platforms, all the way to Excel and Microsoft Access databases, she said.
- MYTH: Data Quality means data clean-up — FACT: Data Quality means not needing to fix data
Proactively setting and applying rules for acceptable data provides clean data that users can put to work as needed.
- MYTH: Data Quality is the responsibility of the source system — FACT: Ownership must be defined
Ownership of data formerly resided with the source system, but now both “quality” itself and responsibility for where it is applied need to be defined.
Business Definition of Quality
Quality cannot be measured or improved if definitions and rules aren’t clear, if valid values aren’t clear, if the context is missing, or if there’s no shared understanding of what quality data is. Hopper showed a record with multiple unlabeled fields, illustrating the value of context in understanding data, as well as the importance of consistent terminology shared by business users. “Field 1” just isn’t enough, she said.
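One way to picture the difference between “Field 1” and a usable field is to attach a business name, a definition, and a valid-value rule to each field, then check records against those rules when they arrive rather than cleaning them up later. The Python sketch below assumes a hypothetical glossary (`GLOSSARY`, `FieldDefinition`, and the example terms and rules are invented for illustration, not taken from the talk):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FieldDefinition:
    """Business metadata for one field: a shared name, a definition, and a validity rule."""
    business_name: str
    definition: str
    is_valid: Callable[[object], bool]


# Hypothetical glossary; the terms and rules stand in for whatever the business defines.
GLOSSARY = {
    "field_1": FieldDefinition(
        business_name="Customer Status",
        definition="Lifecycle stage of the customer account.",
        is_valid=lambda v: v in {"ACTIVE", "SUSPENDED", "CLOSED"},
    ),
    "field_2": FieldDefinition(
        business_name="Annual Contract Value (USD)",
        definition="Total contracted revenue per year, in US dollars.",
        is_valid=lambda v: isinstance(v, (int, float)) and v >= 0,
    ),
}


def validate(record: dict) -> list[str]:
    """Check a record against the glossary before it is accepted.

    Returns violations labeled with business names, so the record can be
    rejected or routed to a steward instead of being silently "fixed".
    """
    problems = []
    for field, meta in GLOSSARY.items():
        if field not in record:
            problems.append(f"{meta.business_name}: missing")
        elif not meta.is_valid(record[field]):
            problems.append(f"{meta.business_name}: invalid value {record[field]!r}")
    return problems


if __name__ == "__main__":
    print(validate({"field_1": "ACTIVE", "field_2": 12000}))  # []
    print(validate({"field_1": "active", "field_2": -5}))
```

The second call reports both fields by their business names (“Customer Status” and “Annual Contract Value (USD)”), which is the point: the error message carries the shared terminology rather than an unlabeled column.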
A steward works to protect a valuable resource and ensures the health and sustainability of that resource, she said. A data steward does the same thing when they develop and protect information resources for an organization, ensuring the integrity of the data so that it can be used properly by others in the organization. A steward provides ongoing monitoring and maintenance of data assets, and in most organizations, they focus on quality, because in the end that quality translates into usability, she said.
Types of Data Stewards
Many organizations are unclear about whether data stewards live in the business or in IT, but it’s important to understand that there are different types of data stewards. Some are more business-focused, working with business terms, definitions, and rules, and so they become the go-to person to help business users with a quality issue.
A technical data steward is likely to sit more on the IT side, associated with data movement or data provisioning. They understand how data works in the background, so they can ensure, for example, that a new business rule written to enforce Data Quality is feasible and won’t negatively impact other processes.
Key Responsibilities of Data Stewards
Data stewards are typically responsible for defining and monitoring data policies, ensuring Data Quality, defining and measuring compliance with business rules, and managing business terms, as well as facilitating and driving communication between data stakeholders.
Data stewards spend about 75% of their allotted stewardship time defining standards, assessing Data Quality, and then correcting Data Quality issues. In many organizations, Data Stewardship is a full-time role.
Data Stewardship is an Inside Job
Data stewards often have a set of personal characteristics — a persona, she said. And typically, they’re people who aren’t just hired off the street (unless the Data Stewardship need is very specific to a particular industry). They are familiar with core business processes, collaborate easily and work well within the existing culture, and they understand enough “tech-speak” to be conversant with technical folks.
Typically, good steward candidates are people who grow with an organization over time, can articulate the value of Data Management practices, whether or not they are in IT, and have the influence to improve the Data Management culture.
Models of Data Stewardship
Hopper presented four different models of Data Stewardship and suggested uses for each.
- The Data Subject Area model is ideal for master data domains because it is flexible and business-driven, and its clear ownership boundaries promote consistency across systems, she said. This model may conflict with existing reporting structures and requires good communication to succeed, but it’s a model she often recommends.
- The Business Function model relies on stewards with established business context and expertise, which can be good for getting a program off the ground very quickly. Stewards can operate from within their own departments, but this can limit accountability across other departments, and policies from multiple areas may conflict. This model requires a strong commitment to coordination among departments to be successful.
- The Application/System model works well for companies with CRMs, ERPs, and other complex applications, where stewards know system data, and ownership is clearly defined. Because of its application focus, this model provides a limited view of the larger business-wide picture.
- The Project model is best for the quick launch of high-impact products. Projects usually have funding and clearly defined ownership. A project can provide a good starting point for an ongoing program, but projects are usually designed to have a finite lifetime.
She stressed that each model has strengths and weaknesses, and that knowledge of the organizational structure will lead to the right choice.
“Governance is the organizational framework for making decisions around data,” she said. Organizations at the very beginning of Data Governance implementation, as well as those looking to get an existing program back on track, will benefit from identifying program objectives, determining their guiding principles, creating decision-making bodies, and defining decision rights.
- Program Objectives: Why? Written objectives delineate the reason for Data Governance, and all activities should be tied to those objectives. Strive for a list of eight to ten, rather than 100, she said.
- Guiding Principles: How? Guiding principles serve as guardrails, letting people know what they can and can’t do. “An example might be that customer data attributes will be shown through a common process or interface.”
- Decision-making Bodies: Who? Develop an organizational framework that defines groups for data owners, program managers, data managers, and stewards as well as identifying committee and/or council members charged with oversight.
- Decision Rights: What? Decision rights provide a way to tell people what they’re expected to do as you begin to onboard them.
Quality data requires oversight from Data Governance and participation from stewardship. Data stewards actively monitor data and recommend policy on behalf of the organization — they are the conduit between Data Governance and Data Management outcomes. Yet Data Stewardship without Data Governance has no charter or mandate, so formal identification of data stewards is critical.
For data to be managed successfully as an asset, Data Quality, Data Stewardship, and Data Governance must be aligned, she said. “If you try to tackle just one of these without thinking about how they impact the others, your program is going to be a lot more difficult. Quality doesn’t happen in a vacuum.”