by Sue Geuens
As everyone realizes, bad data equates to bad intelligence, which equates to bad decision-making and thus equates to bad things happening in your business!
For example: your CEO releases your annual financials to the press, BUT the figures are incorrect, leaving both your organisation AND your CEO with egg on their faces. A huge scramble ensues. The CEO is desperate for the right figures so he can recover his reputation. The CFO is running around like a headless chicken, panicked and frustrated because the figures HE released to the CEO were those given to him by his to-date competent staff. The BI team is working endless overtime hours to find the errors. And the marketplace (remember the ringing bell in the NY Stock Exchange) is responding drastically DOWNWARDS to the released figures.
Definitely toxic data! The problem is that the data needed to be validated and verified much earlier in the process, AND it would have helped if the data had been of appropriate quality in the first place. Downstream systems are fed by upstream systems, and data quality problems are simply passed along as the data flows from place to place and system to system!
There is no magic that will ensure that, if you are counting your customers incorrectly in one system, the count will automatically be correct, or even the same, in a second system.
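To make the point concrete, here is a minimal sketch of a reconciliation check between two systems. The system names ("CRM" and "billing"), the customer identifiers and the function `reconcile_customers` are all hypothetical, invented purely for illustration; a real check would read from your own systems.

```python
from collections import Counter


def reconcile_customers(source_ids, target_ids):
    """Compare the customer identifiers held by two systems.

    Hypothetical helper: both arguments are plain lists of customer IDs
    as extracted from each system.
    """
    source = set(source_ids)
    target = set(target_ids)
    # IDs that appear more than once in the source extract
    duplicates = sorted(cid for cid, n in Counter(source_ids).items() if n > 1)
    return {
        "source_count": len(source),
        "target_count": len(target),
        "counts_match": len(source) == len(target),
        "missing_in_target": sorted(source - target),
        "unexpected_in_target": sorted(target - source),
        "duplicates_in_source": duplicates,
    }


# Made-up sample extracts from the two hypothetical systems
crm_ids = ["C001", "C002", "C003", "C003"]
billing_ids = ["C001", "C003", "C004"]

report = reconcile_customers(crm_ids, billing_ids)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this made-up sample the two distinct counts happen to agree, yet one customer is missing from billing, one appears only in billing, and one was captured twice in the CRM. Matching totals alone would have hidden all three problems.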
You have heard the saying about “one bad apple in a barrel poisoning the whole barrel”? It is exactly the same with data: one piece of bad or inaccurate data poisons other data. What compounds the problem is that everyone who is expected to use the information based on this data distrusts it, because they have bitten the poison apple before. They therefore go and perform the same extracts themselves and obtain a different result, which they trust because it was their extract. What they have forgotten is that it’s possible they have used the same data they distrusted before, but because they extracted it, they trust it AND are willing to make decisions based on the information.
So how do we make sure that our data is not toxic and is not likely to poison our other data and thus the information on which we base our decisions?
Firstly, it’s important to understand where your data originates. Has it been captured by your own workforce? What measures were put in place to ensure that the very best job has been done and that the data being captured lives up to expectations? Is this data flowing from system to system to system? What are the requirements for the data your business needs and uses daily? Do you enhance your data from other sources (external or internal)? Have you recently done a data audit to confirm the currency of the data?
If you are not sure of the answer to any of these questions, then the origin of your data is most likely suspect or still needs to be determined. However, if you have been able to answer these questions, then you have a good basic knowledge of your data. This means that you should be able to identify where your data is toxic and possibly the root cause of the toxicity. This would further allow you to identify at least some of the steps required to medicate your data and get it back to where it should be.
Often you will find that the causes of toxic data are quite similar and can stem from more than one origin. For example, data capturing is always a good place to start the search. Even in an automated process (such as OCR), data is never captured 100% accurately. Human capture makes this percentage much lower, even with all the technical help we have today. Once captured, the data flows through to other systems, and however “clever” the system may be, it is not going to pick up errors unless it has been configured to do so. And I have yet to hear of a system, application or tool that can and does correct every single instance of data errors. What this means is that data which started its journey with an error will continue along the process with the same error. Ultimately we will use this incorrect data to formulate information that will thus also be inaccurate or incorrect.
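As a rough illustration of what “configured to pick up errors” could look like at the point of capture, here is a small sketch. The field names, the three rules and the sample records are assumptions made up for this example, not a recommended rule set; the email pattern in particular is a deliberately loose check.

```python
import re

# Deliberately loose, illustrative pattern: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_customer(record):
    """Return a list of problems found in one captured record (hypothetical rules)."""
    errors = []
    if not str(record.get("customer_id", "")).strip():
        errors.append("customer_id is missing")
    email = str(record.get("email", ""))
    if not EMAIL_PATTERN.match(email):
        errors.append(f"email looks malformed: {email!r}")
    balance = record.get("balance")
    if isinstance(balance, bool) or not isinstance(balance, (int, float)):
        errors.append(f"balance is not a number: {balance!r}")
    return errors


# Made-up captured records; the second mimics OCR confusing the letter O with zero
captured = [
    {"customer_id": "C001", "email": "ann@example.com", "balance": 120.50},
    {"customer_id": "", "email": "bob@example", "balance": "12O.5O"},
]

accepted, quarantined = [], []
for record in captured:
    problems = validate_customer(record)
    if problems:
        # Hold the record back instead of passing the error downstream
        quarantined.append((record, problems))
    else:
        accepted.append(record)

print(f"accepted: {len(accepted)}, quarantined: {len(quarantined)}")
for record, problems in quarantined:
    print(record, "->", problems)
```

The design choice worth noting is that a failing record is held back with its reasons attached rather than silently corrected or passed on, so the error stops at the first system instead of travelling with the data.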
So our CEO will report the wrong numbers in his annual report, and this will be reported in the media, proving that data really can be toxic. Isn’t it time to take a good, long, hard look at your data?