Data Quality Challenges

The Data Quality and Data Management market is going through a paradigm shift in which the focus has turned to the business user. Historically, business users have been at the mercy of over-burdened IT departments with limited resources, though IT is not to blame. Even for the simplest query, the answer used to be “It’ll be six weeks before we can give you that.” In the new world of self-service data, however, the business wants it now. Tools are emerging that give users a different relationship with their company’s data, though within that new landscape, Data Quality challenges become even more complex.

“Thirty years ago, systems were all in separate silos,” said Kevin McCarthy, in a DATAVERSITY® interview:

“Nine-track tapes full of customer savings and checking account information would get loaded onto a mainframe and then processed using a series of tools to standardize the information and identify the components of the names and addresses.”

Then individual relationships and household relationships would be built among those records, he said. It’s now possible to standardize personal data, build relationships, and do matching without complete or identical records: “Being able to sort out fuzzy matches, transpositions, double characters, and things like that.”
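
The kind of matching McCarthy describes can be illustrated with an edit-distance measure that counts a swapped pair of adjacent letters, or a doubled letter, as a single edit. The following is a minimal sketch, not the method used by any particular vendor; the function names `osa_distance` and `is_fuzzy_match` and the `max_edits` threshold are assumptions made for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings.

    Allowed edits: insert, delete, substitute one character, or
    transpose two adjacent characters. Each edit costs 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # Adjacent transposition, e.g. "it" typed as "ti".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_fuzzy_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Treat two values as a match if they differ by at most max_edits.

    max_edits is an illustrative threshold, not a recommended setting.
    """
    return osa_distance(a.strip().lower(), b.strip().lower()) <= max_edits
```

Under this sketch, "Smith" and "Smtih" (a transposition) or "Miller" and "Miler" (a dropped double character) are each one edit apart, so they would match with the default threshold, while "Smith" and "Jones" would not.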

Although methods have changed significantly in that time, companies McCarthy calls “enterprise players” — large Fortune 2000 companies such as IBM and SAP — still lean heavily on IT. These larger organizations focus on the “one-stop shop” concept so that they can offer complex and very technical features. At the other end of the spectrum, he said, the customer data platforms (CDPs) that are emerging are more targeted to a specific market or service need.

In addition, there are MDM players that offer subtle differences from the CDP players, along with various companies offering different campaign management and marketing options. “There are still a lot of tools out there, and everyone’s trying to carve out their niche.”

McCarthy used the idea of a safe deposit box to illustrate how the ownership and control of data is shifting. “You put things into a safe deposit box and the bank is responsible for keeping it safe, and they provide the box, but the bank doesn’t really care what’s in the box.” Historically, IT has been the bank: they have the hardware and the systems to hold the data, but business users care about the contents and want to ensure they have access to it when they need it. “And that’s the switch. Now the business is looking for that level of control.”

Self-Service Data Shortens Time to Value

A sandbox environment that gives access to the data lets business users experiment, run queries, and interrogate that data without waiting for IT. Using a drag-and-drop interface, business users can set up rules, filters, and processes on their own, he said, “which means that you don’t have to be a SQL programmer to be able to run SQL-like steps.”

Potential for Data Enrichment

Experian has a wealth of data assets and also provides tools that help with Data Quality Management challenges. McCarthy talked about the potential value in enriching quality customer information. He considers names, addresses, emails, and telephone numbers “one of the toughest data sets to deal with.” For example, from a purely data perspective, a customer named Peg Smith at Avenue of the Americas, New York, and a Margaret Smith at 6th Avenue, New York, appear to have nothing to do with each other. But “Peg” is a nickname for “Margaret,” and Avenue of the Americas resolves to 6th Avenue once the address goes through postal certification, so it is actually the same street, and Peg and Margaret are probably the same person.
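
The Peg and Margaret example can be sketched as a normalization step applied before records are compared. This is only an illustration under stated assumptions: the `NICKNAMES` and `STREET_ALIASES` tables and the `normalize_record` function are hypothetical, and a real system would rely on curated nickname lists and certified postal reference data rather than hand-written dictionaries.

```python
# Hypothetical lookup tables for illustration only.
NICKNAMES = {"peg": "margaret", "peggy": "margaret", "bill": "william"}
STREET_ALIASES = {"avenue of the americas": "6th avenue"}


def normalize_record(name: str, street: str, city: str) -> tuple:
    """Return a comparable (name, street, city) key for one record."""
    # Lowercase and collapse whitespace, then expand a known nickname
    # in the first-name position.
    parts = name.lower().split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    full_name = " ".join(parts)

    # Map a known street alias to its standard form.
    street_key = " ".join(street.lower().split())
    street_key = STREET_ALIASES.get(street_key, street_key)

    city_key = " ".join(city.lower().split())
    return (full_name, street_key, city_key)


peg = normalize_record("Peg Smith", "Avenue of the Americas", "New York")
margaret = normalize_record("Margaret Smith", "6th Avenue", "New York")
print(peg == margaret)  # True: both normalize to the same key
```

Equal keys only suggest that the two records are probably the same person; a different Margaret Smith on the same street would produce the same key, which is why the conclusion stays probabilistic.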

The Single Customer View in Context

McCarthy talked about how the term “single customer view” has evolved over time to become a more contextual view of each customer’s information.

“From a marketing standpoint, what I consider a ‘customer’ may be a little looser because I want to cover everybody, so I might bring them together because I don’t want to have to send out multiple catalogs.”

In accounting, by contrast, the purpose is to ensure that a bill is sent to one specific person at one address: “So I have to be more stringent in how I match those records to find that customer. That ‘single customer view’ is single to the eye of the beholder.” That view varies not just across industries, but from department to department within a single company.
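
The difference between the looser marketing view and the stricter accounting view can be sketched by grouping the same records under two different match keys. The records, field names, and the `household_key`, `person_key`, and `group_by` helpers below are all hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical, already-standardized customer records.
records = [
    {"id": 1, "first": "margaret", "last": "smith", "address": "6th avenue, new york"},
    {"id": 2, "first": "john", "last": "smith", "address": "6th avenue, new york"},
    {"id": 3, "first": "margaret", "last": "smith", "address": "6th avenue, new york"},
]


def household_key(record):
    """Looser marketing view: one 'customer' per surname and address."""
    return (record["last"], record["address"])


def person_key(record):
    """Stricter accounting view: one 'customer' per individual and address."""
    return (record["first"], record["last"], record["address"])


def group_by(records, key_fn):
    """Group record ids by match key, preserving first-seen order."""
    groups = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record["id"])
    return list(groups.values())


print(group_by(records, household_key))  # [[1, 2, 3]]   one catalog
print(group_by(records, person_key))     # [[1, 3], [2]] two bills
```

Neither grouping is the "correct" single customer view; each reflects what the department needs the word "customer" to mean.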

“And we still have the legacy of IT. When they have gone that route already and defined a single customer view, they expect it to cover the entire corporation.” In reality, what McCarthy’s seeing is that departments have different needs and may want to be able to contextualize data differently. “It’s about providing tools that they can use to sandbox and trying different relationship methods and matching technologies to define ‘customer’ based on what their need is — and that’s what we’re doing at Experian.”

The Challenge of Customer Information

Although customer information has been a key focus of businesses for more than 30 years, the complexities of managing that information have increased dramatically. Along with data entered by trained data entry staff, data from customer service reps taking calls is now part of the mix. Added to that is the data that customers enter themselves through web forms, which varies widely in the way it is entered.

It’s less of an issue in the IoT world, where a refrigerator, for example, reports on its temperature. “My refrigerator doesn’t have a bad day and accidentally send the temperature in Celsius instead of Fahrenheit.” Someone entering their name on a web form, however, might spell it “McCarthy,” or “MaCarthy,” or “MacCarthy,” depending on how much attention they are paying at that moment, and if an existing record differs from that spelling in some way, a duplicate record can be created.
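
One way to reduce such duplicates is to compare a newly entered name against existing records before creating a new one. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the `find_possible_duplicate` function and the 0.85 threshold are assumptions for illustration and would need tuning against real data.

```python
from difflib import SequenceMatcher


def find_possible_duplicate(new_name, existing_names, threshold=0.85):
    """Return the most similar existing name, or None if none is close.

    Similarity is SequenceMatcher's ratio in [0, 1]; the threshold
    is an illustrative value, not a recommended setting.
    """
    candidate = new_name.strip().lower()
    best_name, best_score = None, 0.0
    for name in existing_names:
        score = SequenceMatcher(None, candidate, name.strip().lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


existing = ["McCarthy", "Smith"]
print(find_possible_duplicate("MaCarthy", existing))   # McCarthy
print(find_possible_duplicate("MacCarthy", existing))  # McCarthy
print(find_possible_duplicate("Jones", existing))      # None
```

A flagged candidate would normally be shown to a person or passed to further matching rules rather than merged automatically, since similar spellings can still belong to different people.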

No matter what controls are in place to standardize data, the fundamental problem of identifying a unique customer persists. “Charlie’s record is in here six different times because he’s been entered by three different methods, but he’s really just the same person.” Data Quality challenges are inherent in the process and will be ongoing as long as there are still people entering information, said McCarthy: “As much as you try to prevent people from putting bad data into the system, you can’t stop everybody, and when you’ve got millions of people entering information, they’re going to mess it up.”

A Data Company

Experian is known as a global data and analytics leader. While the company is well known for credit, it also has software specifically for Data Quality Management. Its focus now is much more on the business user and on making Data Quality Management more accessible to the business. New developments include a focus on ease of use and time to value for the business user, as well as on Experian’s extensive data assets.
