Click to learn more about author Wayne Yaddow.
Data warehouse (DW) testers with data integration QA skills are in demand.
Data warehouse disciplines and architectures are well established and often discussed in the press, books, and conferences. They have become a standard necessity for most modern organizations. Each business often uses one or more data warehouse systems to make company decisions every day. In a word, the data warehouse is a standard component of business infrastructures.
New business models, constant technological progress, and ever-changing legal regulations require that companies replace their business applications. As a side effect, there are demands for integrating data from existing source applications to target applications – often supported by a data warehouse. Doing so calls for a stringent data integration process model combined with well-defined quality assurance measures.
Every year, companies lose millions as a result of inaccurate and missing data in their operational databases. This severe issue corrupts data warehouses, causing them and reliant applications to fail.
As a discipline, data warehouse quality assurance covers much more than technology. It includes roles and organizational structures, monitoring, measuring, reporting, and remediating data warehouse issues.
Data warehouse testers’ technical skill requirements are unique and demanding, but the rewards can be many. For those who enjoy testing, hold a strong interest in Data Quality, have a knack for coding queries, data warehouse, and data integration, DW testing may be a significant career step.
To get started or further enhance the know-how needed as a data warehouse tester, several learning stages may be necessary. They are listed below with suggestions for gaining the required knowledge and experience.
Understanding Data Warehousing Concepts
QA analysts should understand the terminology used in data warehousing, the basic flow of DW data, and routine testing tasks on an ETL QA project. Data models and data mapping documents are standard means for expressing DW requirements and architecture. Therefore, the ability to comprehend DW requirements, then create test scenarios, is a must. Ralph Kimball and Joe Caserta’s book “The Data Warehouse ETL Toolkit” (Wiley Publishing) and Andy Oppel’s “Databases: A Beginner’s Guide” (McGraw Hill Publishing) are good starting points. These books offer an overview as well as some detail about the related architectures and terminology. Larry English’s “Improving Data Warehouse and Business Information Quality” (Wiley Publishing) offers several chapters on assessing and assuring Data Quality.
Figure 1 displays a basic representative data warehouse implementation – from the identification of source data (lower left) to report and portal reporting (upper left). In between, several typical phases of the end-to-end data warehouse development process are depicted, such as source extract to staging, dimension data load to the operational data store (ODS), fact data to the data warehouse, and report and portal functions extracting data for display and reporting. Figure 1 illustrates that all ETL programs and data movements should be verified throughout the end-to-end QA process.
Grasping the Challenges of DW Testing
Testers should understand the challenges that data warehouse testing frequently presents.
- Data models and data mapping documents represent many aspects of project requirements; familiarity with these artifacts is essential to data integration testing success.
- There’s a large variety of Data Management systems, including Oracle, Microsoft, and IBM. More than a few enterprise data warehouses employ multiple database products from these and other vendors.
- Databases are often so large as to necessitate tests based on data sampling. Choosing the best methods of data selection is a necessary skill.
- Few test tools exist that cover the many needs for data integration testing, particularly tools to aid in testing complex data transformations. Therefore, testers should be creative with manual testing while learning and implementing as many available tools as possible. With high expectations, but qualified staff in short supply, QA managers should seek powerful yet easy-to-use tools that can be quickly deployed without overhauling the budget.
- An extensive understanding of SQL queries, data profiling methods, Excel, and DB editors is essential. Fortunately, e-learning, books, and classes are available for these needs.
- The ability to assess readiness for data warehouse testing during and after test planning is complete.
There are many more challenges and solutions for those challenges. You can learn about them in vendor classes or online from those offering data warehouse tester training.
Planning for DW Tests and Test Cases
Initiating and implementing a data warehouse test-planning process may be a new experience for DW test candidates. You can ease this effort through reviews of existing data warehouse “test planning templates,” “master test plans,” “test strategies/approaches,” or data integration “end-to-end” test plans. Through observations of DW planning templates and associated checklists, people seeking to become data warehouse testers will learn from these to create effective strategies of their own. In these documents, testing topics and test scenarios – unique to data warehouse QA – will be discovered.
- Common ETL goals and objectives as checklists for test coverage
- Examples of defects frequently found during ETL testing; writing test cases to find them
- Recommended ETL test scenarios and test cases
- Estimating DW test resources and schedules
- Source to target data profiling – what to look for
- Formal QA entry and exit criteria for data build deployment
- Examples of ETL test scenarios
- Functional testing (e.g., security, performance, ETL error logs, regression testing guidelines)
- Assessing test plans and test readiness
Using data warehousing as an example, Figure 2 illustrates the primary checkpoints (testing points) in an end-to-end data integration testing process. The figure demonstrates how data should be verified as it is extracted from sources, transformed for loads into target databases, and aggregated for loads into data marts. Only after data owners and other stakeholders confirm that a data integration was successful can the whole process be considered complete and ready for production.
Planning and Managing Test Data
Those new to data warehouse testing should learn how best to address the complications often encountered when planning, then selecting, test data for the data warehouse QA effort. The variety of choices for data selection, and challenges associated with each, must be learned so that testers can make wise choices.
Characteristics of a Test Data Management Strategy
- Generate an adequate quantity of required test data from all data sources
- Create synthetic data for any missing data (e.g., that needed for negative testing)
- Carefully schedule test data extraction and load process
- Reduce security risk in test data for sensitive/personal content
- Manage test environments for test data preparation and cleanup – gain DBA and configuration management support
- Provide access control to test data sets that multiple QA users consume
Critical Challenges to Test Data Planning
- Availability of essential test data to verify and validate all business uses cases and rules
- Identity of all source tables, the constraints, and dependencies
- Understanding the range of possible values for various fields (including all boundary values)
- Ability to create volumes of test data from heterogeneous data sources that are needed in the test environments
- The number of resources (e.g., people, tools) to develop traceability for business requirements à to test cases à to test data
Tester Skills for DW Testing
Staffing and training the QA team are among the essential steps in the test planning process. Testers’ skills can be developed so that a job description is easily prepared and candidates effectively interviewed. Critical DW QA needs, skills, and experiences should be outlined so that QA team members can be chosen and prepared for all facets of testing.
The following is a data warehouse testing job description that represents what organizations are often seeking.
- Solid understanding of data warehousing concepts, architectures, and processes
- Multi-years of professional experience testing large databases
- Talent to collaborate with product, project, data engineering, and analytics team to provide input on QA resourcing needs and timelines
- Experiences with relational and dimensional database structures
- Experience with ETL & BI test planning and hands-on testing
- Understanding of ETL mapping documents for developing test cases
- Formal training and experiences with SQL queries: Oracle / SQL Server / DB2
- Exposure to testing across all layers of a data warehouse and/or data integration platforms
- Experiences with SQL query development and query execution scripts based on ETL mapping documents
- Experience in testing data transformations from source to target mappings
- Aptitude for researching and troubleshooting root causes of ETL issues
- Ability to use a variety of DB editors, Excel, and MS Access for analysis of test query results
- Solid in ETL data validation processes (e.g., Informatica/DataStage/SSIS)
- Experiences with end-to-end data validation for ETL and BI systems
- Ability to conduct various testing including DW unit, functional, integration, regression, performance, and post-production testing
- Strong in BI report validation in Cognos/Business Objects/Microstrategy/SSRS BI environments
- Ability to work with subject matter experts to resolve gaps/questions in requirements
- Aspirations to assist developers in recreating test failures leading to problem resolutions
- Solid understanding of business intelligence (BI) dashboard tools such as OBIEE
- Experience partnering with data stewards, data architects, and software engineers
- Usage and testing skills with reporting and analytics tools (e.g., OBIEE, Tableau)
- Experiences with scripting and automation of data testing
Data warehousing projects are known to be risky. They can fail for many reasons: poor Data Architecture, inconsistently defined data, inability to integrate data from varied data sources, missing and inaccurate data values, insufficient data transformation testing, inconsistent use of data fields, unacceptable query performance, and more.
DW project risks will be decreased when staffed with well-trained and motivated testers who serve their projects from early on and during the entire data warehouse development.