The Data Management Body of Knowledge (DMBoK) defines Data Quality (DQ) as “the planning, implementation, and control of activities that apply quality management techniques to data, in order to assure it is fit for consumption and meets the needs of data consumers.”
Because expectations about Data Quality are not always stated or even known, ongoing discussion is needed. Data Quality depends on context and on the data consumer’s requirements.
Data Quality Dimensions Include:
- Accuracy
- Completeness
- Consistency
- Integrity
- Reasonability
- Timeliness
- Uniqueness/Deduplication
- Validity
- Accessibility
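Some of the dimensions above can be expressed as simple ratios over a dataset. The following is a minimal sketch, not a standard implementation: the sample records, the field names, the email pattern, and the three helper functions are all hypothetical, and the formulas are one common way to score completeness, uniqueness, and validity rather than definitions taken from the DMBoK.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},
    {"id": 2, "email": "bob@example", "country": "usa"},
    {"id": 4, "email": "cy@example.org", "country": "DE"},
]

# Deliberately loose pattern (assumption): something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Share of rows where the field is present and non-empty.

    Assumes rows is non-empty.
    """
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of distinct values among all values of the field.

    Assumes rows is non-empty and every row has the field.
    """
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, field, pattern):
    """Share of non-missing values that match the expected pattern.

    Assumes at least one non-missing value exists.
    """
    present = [r[field] for r in rows if r.get(field)]
    valid = sum(1 for v in present if pattern.match(v))
    return valid / len(present)


print("email completeness:", completeness(records, "email"))
print("id uniqueness:", uniqueness(records, "id"))
print("email validity:", validity(records, "email", EMAIL_PATTERN))
```

Which thresholds count as acceptable depends on the data consumer’s requirements, so scores like these are a starting point for discussion rather than a verdict.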
Other Data Quality Definitions Include:
- “Fit for a purpose. Meets the requirements of its authors, users and administrators.” (Dr. Peter Aiken, adapted from Martin Eppler)
- “Reliance on accuracy, consistency and completeness of data to be useful across the enterprise.” (Michelle Knight)
- Tools and processes used for parsing and standardization, generalized “cleansing,” matching, profiling, monitoring, and enrichment (Gartner)
- Strong-Wang framework (Wang and Strong, MIT; DAMA DMBoK):
  - Intrinsic DQ:
    - Accuracy
    - Objectivity
    - Believability
    - Reputation
  - Contextual DQ:
    - Value-added
    - Relevancy
    - Completeness
    - Appropriate amount of data
  - Representational DQ:
    - Interpretability
    - Ease of understanding
    - Representational consistency
    - Concise representation
  - Accessibility DQ:
    - Accessibility
    - Access security
A Few Uses of Data Quality Include:
- Increasing the value of organizational data and the opportunities to use it
- Reducing risk and cost associated with poor-quality data
- Improving organizational efficiency and productivity
- Protecting and enhancing the organization’s reputation
- Data profiling
- Data standardization
- Data monitoring
- Data cleansing
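The last four items in the list above (profiling, standardization, monitoring, and cleansing) can be illustrated on a single column of values. This is a rough sketch under stated assumptions: the sample city names, the function names, the title-casing rule, and the 10% missing-rate threshold are all invented for illustration and do not describe any particular tool.

```python
from collections import Counter

# Hypothetical raw column with inconsistent casing, stray spaces, and gaps.
raw = ["  new york", "New York", "NEW YORK ", "", "Boston", None]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or not value.strip()


def profile(values):
    """Data profiling: summarize volume, missing entries, distinct values."""
    counts = Counter(v for v in values if not is_missing(v))
    missing = sum(1 for v in values if is_missing(v))
    return {"total": len(values), "missing": missing, "distinct": len(counts)}


def standardize(value):
    """Data standardization: collapse whitespace and apply one casing rule."""
    return " ".join(value.split()).title()


def cleanse(values):
    """Data cleansing: drop missing entries, standardize, then deduplicate."""
    seen, cleaned = set(), []
    for v in values:
        if is_missing(v):
            continue
        s = standardize(v)
        if s not in seen:
            seen.add(s)
            cleaned.append(s)
    return cleaned


def missing_rate_ok(values, max_missing_rate=0.1):
    """Data monitoring: check a batch against an assumed missing-rate limit."""
    stats = profile(values)
    if stats["total"] == 0:
        return True  # nothing to check in an empty batch
    return stats["missing"] / stats["total"] <= max_missing_rate


print(profile(raw))
print(cleanse(raw))
print(missing_rate_ok(raw), missing_rate_ok(cleanse(raw)))
```

Profiling the raw column first and the cleansed column afterward shows the effect of each step, and the same check can be rerun on every new batch as a basic form of monitoring.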