Rapidly Delivering Data Quality Metrics with a Repeatable Process: Is Your Data Fit?


by Cecily Dennis and Dan Meers

Data quality metrics are instrumental in gauging the overall health of an enterprise. Poor data quality affects your resources, morale, productivity, compliance, customer retention and bottom line. Asking the question, “Is Your Data Fit?” is too broad and too reactive on its own. Evaluating data quality and moving to a proactive capability requires a thoughtful set of data quality metrics, and improving data fitness requires short, repeatable processes that use those metrics to measure progress.

Ultimately, an enterprise strives to implement an integrated platform with consistent, timely and accurate data…quality data.  How do you know if you have quality data?  How do you measure data fitness?  Below is a list of important measurement areas:

  • Availability
  • Accuracy
  • Integrity
  • Completeness
  • Frequency

We recently conducted a succession of iterative data sprints for a Fortune 500 financial services firm, where we had to answer these questions, provide data fitness measures and implement a process for proactive governance. The results helped close regulatory, compliance and management reporting gaps found in an enterprise risk and compliance management program. The client had to address risks in areas such as customer identity, information protection, regulatory compliance and customer identity verification. “Fitness levels” varied by use case.

To keep iterations short, we used a data sprint process. A data sprint is an agile, business-oriented project execution model that requires disciplined deliverables and fixed milestones.  Data sprints allow for additional, parallel sprint “legs” to be created and executed as findings occur during the assessment.  Subsequent data remediation projects are more streamlined.  The different risk management use cases required different levels of data fitness.  Each need was catalogued and compared. The results should include rough order of magnitude cost estimates to help prioritize next steps.

For each use case, a set of steps was followed. The steps formed a standard “Data Sprint Playbook.” A playbook-based model motivates and supports repeatable and consistent outputs. The following steps help you assess whether your data is fit for use:

  1. Determine which data is critical through discovery and interviews and establish the critical data element list. In some cases, particularly in financial services, sanctions and regulations impose additional requirements. If data that meets regulatory requirements is not readily available when audited, regulators can impose costly fines and consent orders.
  2. Validate sprint scope coverage by reviewing business processes and regulator needs. In this example, the critical data elements need to cover a minimum of Customer, Account and Activity areas.
  3. Document the definition of each critical data element. Use the general definition to link each element to its specific definitions in the systems of record and authority.
  4. Profile the critical data elements from the systems of record and other “resting” points. The profiling reports can be consolidated and heat mapped. Rules based profiling should take into account key business questions and business processes to identify more relevant data quality metrics. For example, data metrics that allow compliance officers to attest to the data used in compliance operations are important for risk management operations. Business questions that would motivate risk management data quality metrics might include:
    • Does the data contain all of the customers including those who opened an account in the past hour or the past five minutes?
    • Does the data contain all transaction types?
    • Do the account balances correlate with the transaction stream?
  5. Define data controls and data control owners needed for ongoing monitoring and management. In the examples above, a data control might identify the set of transaction types in a set of data records and match that against a list of all known transaction types. Transaction types, such as cash transactions, that do not appear in the list within an expected frequency or dollar range would cause an alert.
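The transaction-type control described in step 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the control used on the engagement: the record layout (a dictionary with a `type` key), the reference list `KNOWN_TRANSACTION_TYPES` and the `min_counts` thresholds are all hypothetical. It checks expected frequency only; a dollar-range check would follow the same pattern.

```python
from collections import Counter

# Hypothetical reference list; a real control would load this from a governed source.
KNOWN_TRANSACTION_TYPES = {"cash", "wire", "ach", "check"}


def check_transaction_types(records, known_types=KNOWN_TRANSACTION_TYPES, min_counts=None):
    """Return a list of alert messages for one batch of transaction records.

    records:     iterable of dicts, each with a "type" key (assumed layout)
    known_types: set of all transaction types expected to exist
    min_counts:  optional dict of type -> minimum expected count in the batch;
                 types not listed default to a minimum of 1
    """
    min_counts = min_counts or {}
    seen = Counter(r["type"] for r in records)
    alerts = []

    # Types present in the data but absent from the reference list.
    for t in sorted(set(seen) - set(known_types)):
        alerts.append(f"unknown transaction type: {t}")

    # Known types that are missing or appear less often than expected.
    for t in sorted(known_types):
        expected = min_counts.get(t, 1)
        if seen[t] < expected:
            alerts.append(f"{t}: saw {seen[t]}, expected at least {expected}")

    return alerts
```

In use, the control would run against each delivered batch, and any returned alerts would be routed to the data control owner named for that control.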

Surprisingly, a sprint can proceed even with minimal data access—subsequent sprints can refine the results. Occasionally, data will not be available and you may need to create an expedited data request process to obtain access.

The data sprint playbook outputs provide clearer views of “data fitness levels,” gaps in the existing data and the support for estimating the effort to close gaps.  The standard, repeatable processes in the playbook used for the Fortune 500 client made it easier to support multiple use cases. The data sprint was conducted rapidly and delivered value. The sprint scope was defined by the number of critical data elements and was the primary driver of duration and effort. We conducted multiple, parallel rounds of analysis within a single sprint.

The data sprint delivered metrics on the fitness of the data for multiple use cases.

Use Case     Availability  Accuracy  Integrity  Completeness  Frequency
Regulations  100%          100%      100%       100%          24 hrs
Policy       100%          NA        NA         NA            48 hrs
Reporting    100%          95%*      95%*       95%*          48 hrs

In the example metrics table, gaps are indicated by asterisks. The actual gap findings were detailed and specific to the domain.  The detail is not presented here because in the risk management area, metrics important to resolving compliance issues can reveal weaknesses that money launderers and fraudsters could exploit at a company.

The data quality metrics require specific calculations in order to produce quantitative answers. Each metric will have a different calculation approach and a different threshold. Measures often originate from an aggregation of records and rely on counts or other statistics.

  • Availability includes the delivery or receipt of the data by the service level agreement stipulated time of day.
  • Accuracy refers to the content of the data matching or reflecting the actual transaction (e.g. wire transfer) or status change (e.g. customer change of address) as they occur in the course of business events.  Absent source documents (e.g. change of address orders from customers) accuracy must be measured using either sampling and interviews or independent confirmation techniques.
  • Integrity refers to the data content matching the rules and structural requirements imposed in the data model or other formal management form. Examples include numeric-only values in a numeric field, no nulls in key or other required fields and unique entries for key fields.
  • Completeness refers to the data values having a full entry in each record. It also refers to record-level entries having all required fields populated. The use of default or repeating values is easily identified in the profiling results. Even though such values meet the basic requirement of completeness, you may wish to measure their presence differently. Completeness may also refer to the total number of records provided, for example, whether a set of records represents all active customers or all new customers in the last 30 days.
  • Frequency refers to the timing of process cycles or other production activities that deliver or make the data available for use. This is a time-based unit of measure, typically expressed in hours or days.
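Because most of these measures come down to counts over a set of records, three of them can be illustrated with short Python functions. This is a rough sketch, not the calculations used for the client: the function names, the dictionary record layout and the treatment of empty strings as missing values are assumptions made for illustration. Accuracy is left out because, as noted above, it depends on source documents, sampling or independent confirmation rather than on the records alone.

```python
from collections import Counter


def completeness(records, required_fields):
    """Share of records in which every required field is populated."""
    if not records:
        return 0.0
    # Assumption: None and the empty string both count as "not populated".
    full = sum(
        all(r.get(f) not in (None, "") for f in required_fields)
        for r in records
    )
    return full / len(records)


def integrity(records, key_field, numeric_fields):
    """Share of records with a non-null, unique key and digit-only numeric fields."""
    if not records:
        return 0.0
    key_counts = Counter(r.get(key_field) for r in records)

    def passes(r):
        key = r.get(key_field)
        if key is None or key_counts[key] > 1:
            return False
        # Simplification: "numeric" here means unsigned whole numbers only.
        return all(str(r.get(f, "")).isdigit() for f in numeric_fields)

    return sum(passes(r) for r in records) / len(records)


def availability(deliveries):
    """Share of deliveries received at or before the SLA deadline.

    deliveries: list of (delivered_at, sla_deadline) datetime pairs.
    """
    if not deliveries:
        return 0.0
    on_time = sum(delivered <= deadline for delivered, deadline in deliveries)
    return on_time / len(deliveries)
```

Each function returns a fraction between 0 and 1, which can be compared with the threshold set for a given use case. Default or repeating values would pass this completeness check, so a separate count of them may be worth reporting alongside it.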

Once the metrics were created and reviewed, estimates were created to prioritize closing gaps. A select set of metrics was then added to a data quality dashboard.

Establishing data fitness is a critical first step in deciding whether your data requirements will be met. It also provides immediate insight into potentially critical problems in your operations and technology environments. Gaps are often traced to broken or missing operational process and system controls that need to be fixed with validation lookups, edits or process changes. A playbook approach improves overall corporate fitness and employs standardized outputs that can be rapidly used by others.
