Avoid These Mistakes on Your Data Warehouse and BI Projects

Click to learn more about author Wayne Yaddow.

The is the first part of a three part series. The links to Part 2 and Part 3 are at the bottom. Data warehousing (DW) and business intelligence (BI) projects are a high priority for many organizations who seek to empower more and better data-driven decisions and actions throughout their enterprises. These groups want to expand their user base for data discovery, BI, and analytics so that their business users make informed decisions. At the same time, users are demanding high-quality and often complex BI reports.

Data warehouse projects are highly intricate and fundamentally risky. Among their many tasks, project managers who lead the data warehouse team must identify all Data Quality risks. A primary goal of this process is to document essential information relating to project risk.

This article focuses on how to avoid the following four common mistakes that other data warehouse and BI projects have experienced, in order to plan and implement new functions and capabilities with success.

Failing to Introduce Quality Assurance Early in the Project

During the initial phases of data warehouse/BI projects, the focus is often on BI requirements and data-related needs to build the operational data store, the enterprise data warehouse, and application reporting infrastructures. Somehow, the importance of end-to-end DW project testing and Data Quality is often overlooked.

There is always an appreciation of Data Quality. Still, as data warehouse requirements and design progress, the overwhelming focus on Data Modeling, data capture, and ETL design may cause the team to lose focus on Data Quality. Eventually, issues such as these will arise:

“Target data is not reconcilable with sources.”
“Duplicate data abounds.”
“Aggregations and report drill-downs are not correct.”

Ultimately, the success of a data warehouse is highly dependent on the ability to plan, design, and execute a set of tests that expose early and ongoing issues: issues with data inconsistencies, Data Quality, data security, the ETL process, performance, business-flow accuracy, and the end-user experience.

Many data warehouse teams debate when to start testing as they’re developing new software. For most DW projects, software testing should begin as soon as the design and requirements are baselined. An early start to QA provides several advantages that improve the overall efficacy of software testing. QA participation at the beginning of a project makes testers more effective throughout by enabling them to learn about the product and business rules they will be testing. They will likely design better test plans and test cases.

During the design and requirements phase, testers can work with developers to determine what aspects of a design are testable and what areas will have a higher risk. This knowledge will help prevent testing errors and better equip testers to design test cases and identify defects.

Implementing a successful data warehouse project is challenging. It requires a balance of many factors, such as strong business involvement, thorough data analysis, a scalable system, Data Architecture, a comprehensive program, Data Governance, high-quality data, use of established standards and processes, excellent communications, and project management.

Failing to Adequately Profile and Validate Source Data Before Loading to the Data Warehouse

Analysts’ studies have consistently shown that over 75% of data warehouse and data integration project teams have either overrun their schedules or their budgets or otherwise experienced project failure. Why the high failure rate?

Inadequate source Data Quality is the root cause of failure across a wide variety of data warehousing projects. Profiling and validating all source data upfront can generate significant benefits.

The traditional approach to data warehouse projects follows these basic steps:

Analyze the business, user, and the project’s technical requirements.
Analyze the available internal and external data sources.
Identify and analyze a set of data sources from legacy systems, operational systems, and external sources to determine their relevance to the requirements of the target database.

It may be folly to assume knowing your source data before beginning to design your target data warehouse. The main weakness in the traditional data integration approach is that it assumes that the data required for an application is fully available from the data sources. Major corporations have spent millions of dollars on data integration projects only to learn later that the source data did not support the target model.

Data profiling should be conducted for each data source: implement table analysis, row and column analysis, primary and foreign key assessments, and cross-table analysis. Source data profiling should also be considered to discover minimum, maximum, mean, mode, percentiles, and duplicate values – even profiling metadata such as data types, data lengths, null values, and string patterns.

Giving Inadequate Attention to Test Automation

With the advent of DevOps for data warehousing, organizations release new applications faster than ever – sometimes on demand or multiple times a day. However, numerous businesses are still using manual ETL test processes for highly visible or customer-facing applications. That translates into a risk to customer loyalty, the brand, confidential data, and worse. Even with new automation tools entering the marketplace, ETL and data profiling testing today continues to be primarily accomplished with manual testing.

Automating ETL tests allows frequent smoke and regression testing without much user intervention. Automated testing of trusted code after each new database build can save measurable time and costs.

A decision to implement automated tools for data warehouse testing depends on a budget that supports additional spending to meet advanced testing requirements. If implementing vendor-supplied test automation is deemed cost-prohibitive, it is essential to consider test tools built and maintained in-house because they are likely to be of a more significant advantage than no test automation at all.

When developing scenarios for test automation, evaluate your complete set of test scenarios to determine the best candidates for automation based on risk and value (ROI): Which types of defects would cause you to stop an integration or deployment? Which types of tests exercise critical, core functionality? Which tests cover areas of the application that have been historically known to fail? Which tests provide information not already covered by other tests in the pipeline?

In the end, test automation saves time and money, and more importantly, business users will appreciate the quality of BI deliverables and accept the data from the data platform solution as a “single version of the truth.”

Implementing Flawed Project Change Management Controls

Change is constant on any data warehouse project. Regardless of industry, there eventually comes a need for new requirements and other changes. A drive for continuous improvement and new data warehouse requirements commonly initiates a variation in the project scope or deliverables.

Change management is an essential component in the success of data warehouse initiatives, but how often is comprehensive change management minimized? According to Forrester’s Q1 2014 Global BI Maturity Survey, fully half of those surveyed believe that their processes for managing change based on new data warehouse data and functionality are not well established and do not function smoothly.

Testers need to reference critical documents involved in all changes – documentation such as business requirements, design and technical specifications, data mapping documents, ETL job flows, and more. The ability to identify and link these documents to the overall change management process and to prepare test plans are critical for effective quality assurance.

Change for BI initiatives often comes from multiple sources, including business owners or other stakeholders’ requests and impacts from source system changes. The QA team should participate in how changes are logged, managed, prioritized, and updated, and then should assure that all changes are verified. If your organization already has a change-tracking tool, it is a good idea to make use of it.

Data warehouse and BI/analytics projects tend to budget for technology and implementation, but change management and post-go-live adoption activities are often underfunded – or even overlooked entirely. Good change management facilitates communication from the outset, motivating users to move from resistance to acceptance and even excitement – increasing buy-in and greatly enhancing successful adoption of the new functionality

Patrick Meehan, research director in Gartner Group’s CIO Research Group, states that most failing data warehouse/BI (and DM/A) projects are delayed or go over budget due to poor communication of new and changing requirements between the management and IT teams.

When your organization is rolling out a data warehouse project, chances of success are greatly enhanced when change management is an integral part of the initiative.

Final Thoughts

Mistakes described here focus on helping organizations sidestep QA problems that many other data warehouse projects have experienced. The tips offered will help ensure satisfaction as data warehouse teams plan and implement new functions and capabilities. These time-tested recommendations may save significant money, time, staff resources, and improve results from the data warehouse application under development.

Avoid These Mistakes on Your Data Warehouse and BI Projects Part 2

Avoid These Mistakes on Your Data Warehouse and BI Projects: Part 3

Data Topics