By Wayne Yaddow.
In Part 1 and Part 2 of this series, we described how data warehousing (DW) and business intelligence (BI) projects are a high priority for many organizations. Project sponsors seek to empower more and better data-driven decisions and actions throughout their enterprise; they intend to expand their user base for BI, analytics, and data discovery so that users will reach informed decisions.
This series focuses on helping organizations sidestep Data Quality mistakes commonly encountered on DW projects. The tips offered here will help DW teams deliver satisfactory results as they plan and implement new functions and capabilities.
Following are more of the mistakes experienced by a variety of practitioners.
Depending on BI Report Tests to Uncover Defects in the Data Warehouse
DW projects are rife with challenges — from the quality of data in the warehouse to the derived values in BI reports. If not addressed promptly, poor Data Quality (especially within the data warehouse) can bring entire projects to a halt.
Data warehousing often fails to meet expectations because of poor Data Quality. Studies have shown that Data Quality most often declines while moving data from sources into the various areas of the data warehouse and data vaults. A thorough assessment of Data Quality should be performed when source loading begins and at each step where data is cleaned, transformed, aggregated, and integrated into the data warehouse.
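The per-step assessment described above can be sketched as a small set of automated checks run after each load or transform stage. This is a minimal illustration, not a prescribed tool; the table rows, key, and column names are assumptions made for the example.

```python
# Minimal data quality checks to run after each load/transform step.
# Rows are modeled as dictionaries; in practice they would come from a
# staging query. All names here are illustrative.

def check_load_quality(rows, key_column, required_columns):
    """Return basic quality findings for one load step."""
    issues = {"row_count": len(rows), "null_violations": 0, "duplicate_keys": 0}
    seen_keys = set()
    for row in rows:
        # Count nulls in columns the target schema requires.
        for col in required_columns:
            if row.get(col) is None:
                issues["null_violations"] += 1
        # Count repeated business keys (a common load defect).
        key = row.get(key_column)
        if key in seen_keys:
            issues["duplicate_keys"] += 1
        seen_keys.add(key)
    return issues

staged = [
    {"customer_id": 1, "name": "Acme", "region": "EU"},
    {"customer_id": 1, "name": "Acme", "region": "EU"},   # duplicate key
    {"customer_id": 2, "name": None,   "region": "US"},   # null in required column
]
report = check_load_quality(staged, "customer_id", ["name", "region"])
```

Running the same checks at every stage (staging, cleansing, integration) localizes where quality declines, instead of discovering the symptom in a BI report.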
Waiting until BI or analytic reports are available to conduct a majority of project tests for Data Quality (data transformations, duplicate data, missing data, null values, application of business rules, etc.) can result in lengthy and complex troubleshooting for defects that could have been identified and fixed earlier at less cost.
Following are five DW data issues — as identified in multiple studies — with the greatest likelihood of occurring and having a direct effect on project success. Note that few of these risks are associated with defects in BI reports.
1. Source Data Quality errors — from internal and external sources
2. Source-to-target data mapping and schema errors
3. Data errors resulting from multiple ETL processes
4. Query and ETL process performance issues
5. Data cleansing and transformation errors
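Issues 2 and 3 above are typically caught with source-to-target reconciliation tests run during ETL verification rather than at report time. The following sketch compares row counts and a column total between a source extract and the warehouse target; the sample data and column name are illustrative assumptions.

```python
# Source-to-target reconciliation: compare row counts and a numeric
# column total between a source extract and the loaded target.
# Sample data and the "amount" column are illustrative.

def reconcile(source_rows, target_rows, numeric_column):
    """Return a list of human-readable mismatch findings (empty if clean)."""
    findings = []
    if len(source_rows) != len(target_rows):
        findings.append(
            f"row count mismatch: {len(source_rows)} vs {len(target_rows)}")
    src_total = sum(r[numeric_column] for r in source_rows)
    tgt_total = sum(r[numeric_column] for r in target_rows)
    if src_total != tgt_total:
        findings.append(
            f"{numeric_column} total mismatch: {src_total} vs {tgt_total}")
    return findings

source = [{"amount": 100}, {"amount": 250}]
target = [{"amount": 100}, {"amount": 250}, {"amount": 250}]  # ETL duplicated a row
problems = reconcile(source, target, "amount")
```

In practice the same counts and totals can be computed with SQL on both sides; the point is that the comparison happens during ETL testing, long before a report consumes the data.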
BI and analytic report testing should be all about assuring that data is accessed correctly from the data warehouse, that report layouts are as defined, that usability requirements are met, and that report drill-downs and aggregations work correctly. BI report testing should not be the point at which you learn that data is missing or duplicated in the data marts, or that other defects exist that should have been discovered during ETL verification.
Failure to Adequately Execute DW Performance, Load, and Stress Tests
Most project managers would not delay functional and Data Quality testing during the development of their DW project. However, too many DW projects test performance, load, stress, security, and recovery only in the later stages of the development life cycle — sometimes without allowing sufficient time to correct any important issues that are found.
Unacceptable performance can be an important cause of DW project cancellation. DW performance should be explored for both query response times and ETL times (e.g., hourly, daily). Query response times may be as long as a few minutes when millions or more rows are accessed for calculations. With few exceptions, this is unacceptable. Such performance will negatively impact the availability of the DW to the users.
DW performance is vitally dependent on database performance. Old-school development methodologies reserved performance testing for late in the development life cycle; it simply wasn’t possible to test performance earlier because there was no representative working system until very late in the game.
Some best practices for agile database performance testing include:
- Developing a Performance Test Plan: Tests should be identified for both individual transactions and common use cases. Test loads should be configured to test throughput under a variety of loads ranging from idle to breaking-point stress tests. Concurrent tests should vary target data to identify caching issues at both the application framework and database level.
- Providing Adequate Time and Budget: Allow time and funds for deploying a suitable performance testing environment. The performance test platform should be of production scale or a quantifiable fraction thereof.
- Planning for Realistically Scaled Data: This can be a production snapshot (for existing systems) or may need to be realistically scaled synthetic data for new or rapidly growing systems.
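The "variety of loads" called for in the test plan above can be exercised with a small concurrent harness. This is a sketch only: `run_query` is a hypothetical stand-in that in a real test would execute a warehouse query through your database driver, and the concurrency levels are arbitrary examples.

```python
# Minimal concurrent load-test harness: run a query function under an
# increasing number of simultaneous clients and record response times.

import time
from concurrent.futures import ThreadPoolExecutor

def run_query(query_id):
    time.sleep(0.01)  # placeholder for a real database round trip
    return query_id

def load_test(concurrency, iterations):
    timings = []  # list.append is thread-safe in CPython
    def timed_call(i):
        start = time.perf_counter()
        run_query(i)
        timings.append(time.perf_counter() - start)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(timed_call, range(iterations)))
    return {
        "concurrency": concurrency,
        "max_seconds": max(timings),
        "avg_seconds": sum(timings) / len(timings),
    }

# Vary load from light to heavy, as the test plan suggests.
for level in (1, 5, 20):
    stats = load_test(concurrency=level, iterations=40)
```

Varying the target data across concurrent calls (e.g., randomized query parameters) helps expose the caching effects mentioned above, which a repeated identical query would hide.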
Failing to Provide Adequate Test Results Metrics to the Project Team
One of a test team's key deliverables is information that communicates the progress and status of test work. Test metrics are vital for understanding the current testing effort and identifying improvements needed in both development and testing.
Today, much DW testing is conducted by business analysts, data analysts, and developers. As a result, actionable testing metrics on numerous DW projects may not be available. Members of many QA teams have reduced participation in DW testing due in part to a lack of data-related technical skills — skills necessary for testing DW projects.
There is always a need for quality-assurance-related metrics. Your project team can only improve and control what it measures, so tests should be planned, developed, executed, and completed with metrics that quantify progress and success.
QA metrics must be designed to be trusted and used to make actionable project decisions. The metrics developed during DW testing offer two major benefits to project and business managers: visibility into 1) the product's maturity and readiness for release to production and 2) the quality of the software under development. Testing reports enable effective management of the software development process by allowing a clear measurement of the quality and completeness of the DW project.
Metrics for each of the DW primary quality categories — database management quality, data model/schema quality, data warehouse Data Quality — should ideally include: test case pass/fail metrics, productivity metrics (i.e., which planned tests were run or not run), defect rate density by DW component, defects discovered by priority/severity, and requirements not covered by tests.
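Several of the metrics listed above can be derived directly from raw test results. The sketch below computes pass rate, unexecuted tests, and defect counts by component and severity; the result records, component names, and field names are illustrative assumptions, not from any particular tool.

```python
# Compute basic DW test metrics from raw test-result records.
# All records and field names here are illustrative.

from collections import Counter

results = [
    {"component": "staging ETL", "status": "pass",    "severity": None},
    {"component": "staging ETL", "status": "fail",    "severity": "high"},
    {"component": "data mart",   "status": "pass",    "severity": None},
    {"component": "data mart",   "status": "not run", "severity": None},
    {"component": "BI reports",  "status": "fail",    "severity": "low"},
]

def summarize(results):
    """Roll raw results up into the metrics stakeholders need."""
    by_status = Counter(r["status"] for r in results)
    defects_by_component = Counter(
        r["component"] for r in results if r["status"] == "fail")
    defects_by_severity = Counter(
        r["severity"] for r in results if r["status"] == "fail")
    executed = by_status["pass"] + by_status["fail"]
    return {
        "pass_rate": by_status["pass"] / executed if executed else 0.0,
        "not_run": by_status["not run"],
        "defects_by_component": dict(defects_by_component),
        "defects_by_severity": dict(defects_by_severity),
    }

summary = summarize(results)
```

Tracked over time, the same roll-up shows whether defect density is falling as release approaches, which feeds the go/no-go decisions discussed below.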
The QA team should deliver testing metrics to stakeholders that compellingly demonstrate testing progress. QA metrics also help project stakeholders make go/no-go project decisions.