Click to learn more about author Kevin W. McCarthy.
I’m a Data Quality guy, and I do it because I love it! That sounds unbelievable, but it’s true. I didn’t always love it, but over three decades I’ve learned the thrill of investigating a data issue down to its root cause like Sherlock Holmes. There is something incredibly exciting about setting up a complex data flow that runs successfully in production for the first time, and something comedic about creating a profanity table to screen out choice words entered by aggravated customer service reps. From the mundane to the monumental, Data Quality initiatives can have wide-ranging, positive impacts for companies and consumers.
But I sometimes fall into the trap of thinking, “Everyone knows the benefits of Data Quality” or “Everyone knows the fundamentals of Data Quality. It’s DQ 101!” In reality, only a few of us live in the Data Quality world (comparatively speaking). It is good to step back and think about the basics from time to time. And what better way to think about Data Quality than with a good analogy!
Over the years, I have learned a lot about Data Quality from building Ikea furniture. There are two important parts of building Ikea furniture: understand the big picture and inventory your assets before you get started. I typically start by using the big picture on the box as a guide and then drive my wife bananas by meticulously counting out every screw, bolt, etc., before I even get started. Those bookends are key not only to a successful build of a computer desk but also to a successful Data Quality strategy.
There are a few key steps we need to take:
- Understand the big picture. With any Data Quality initiative, knowing where you want to end up is a great place to start. Are you just looking to clean what you have, build a single customer view, or both? Will multiple systems be merged into one? Answering these questions will help you understand your data and tie it to larger business initiatives. No one should be cleaning data just for the sake of it; the work should relate to customer experience, operational efficiency, etc. Remember, you need to create your own big picture on the box.
- Inventory your assets. The next step in any Data Quality initiative is to understand what kind of challenges you are up against. You wouldn’t start building without understanding whether you need a hammer or a screwdriver, right? Start with a thorough profiling of your data. Data may be coming from one or many sources, so it is important to perform analysis on each source. This includes looking at specific values (Does this true/false field have “T”s and “F”s or “1”s and “0”s or a combination of unrelated values?), shapes (Are the dates MM/DD/YYYY or DDMMYYYY or DDDYYYY or something else?), uniqueness (Are there duplicate values in that supposedly unique customer ID field?), etc.
- Transform, standardize and correct. Now that you understand your data elements, use that information to standardize the data into common formats. You can also transform those errant fields using data scans and table recodes. Consistency is king for making the most of this data in the future! You also want to verify and correct as much information as possible, especially customer and product data. This step is where you start your assembly, keeping that bigger picture in mind.
- Build those relationships. At the heart of any true Data Quality initiative is the process of building relationships among your data records (let’s focus on customer data here). Matching is more than just deduplication; it also allows companies to define a view into their customer relationships. This might be a single customer view, or it might be a household view, a business view, etc. The right view depends on the department within the business, so expect some variation. All of the other data standardization, verification, and correction efforts allow you to get the best possible results from your matching process. At this point, you are putting the final screws into your new desk and making sure it can function.
- Enrich your view. Now that your existing data is clean, there are huge benefits to adding new data to your portfolio. Again, with customer data, there is a slew of demographic fields that can provide a wealth of information for marketing segmentation and personalization. Think of this as accessorizing your new desk with a clock or lamp. You need to be able to see what you are doing!
- Initial cleanup and ongoing maintenance. Great job, you cleaned your existing data! Bad news, tomorrow it is going to be dirty again. Always think of your Data Quality initiative as a two-phase process: clean up what you have, and then prevent bad data from creeping back into your system. This typically combines an initial round of batch processing with ongoing real-time checks that standardize and correct data upon entry. Once you build your furniture, you don’t want dust to build up.
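To make the "inventory your assets" and "standardize" steps a little more concrete, here is a minimal Python sketch of the kind of profiling described above: counting values, reducing values to shapes, and flagging duplicates, followed by a simple table recode for a true/false field. Everything in it is an assumption for illustration: the function names (`shape`, `profile`, `standardize_flag`), the recode table, and the sample data are made up, and a real profiling tool does far more than this.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Reduce a value to its pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(values):
    """Summarize one column: value counts, shape counts, and duplicated values."""
    counts = Counter(values)
    return {
        "values": counts,
        "shapes": Counter(shape(v) for v in values),
        "duplicates": sorted(v for v, n in counts.items() if n > 1),
    }

# Hypothetical recode table for a true/false field (an assumption, not a standard).
BOOLEAN_RECODE = {"T": True, "1": True, "Y": True,
                  "F": False, "0": False, "N": False}

def standardize_flag(raw: str):
    """Map a raw flag to True/False, or None when the value is unrecognized."""
    return BOOLEAN_RECODE.get(raw.strip().upper())

# Made-up sample data standing in for one source system.
customer_ids = ["C001", "C002", "C002", "C003"]
signup_dates = ["01/31/2020", "31012020", "02/15/2021"]
active_flags = ["T", "1", "f", "maybe"]

id_profile = profile(customer_ids)      # exposes the repeated "unique" ID
date_profile = profile(signup_dates)    # exposes the mixed date shapes
clean_flags = [standardize_flag(f) for f in active_flags]

print(id_profile["duplicates"])
print(dict(date_profile["shapes"]))
print(clean_flags)
```

Unrecognized values come back as `None` rather than being guessed at, so they can be routed to someone for review. That is a design choice in this sketch, not a rule.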
I wish I could say this is everything you need to know about Data Quality, but that would require more of a book than an article! However, this is a good framework to think about as you get your Data Quality initiative underway. And like admiring a well-built computer desk, your Data Quality initiative will provide value for years to come!