<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DATAVERSITY &#187; David Plotkin</title>
	<atom:link href="http://www.dataversity.net/category/discussion/blogs/david-plotkin/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataversity.net</link>
	<description></description>
	<lastBuildDate>Tue, 18 Jun 2013 17:07:23 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>The Case of the Wandering Prescription</title>
		<link>http://www.dataversity.net/the-case-of-the-wandering-prescription/</link>
		<comments>http://www.dataversity.net/the-case-of-the-wandering-prescription/#comments</comments>
		<pubDate>Mon, 03 Jun 2013 07:10:11 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=20131</guid>
		<description><![CDATA[by David Plotkin Most of my medications come by mail from my online pharmacy, so I wasn&#8217;t surprised when I got an email from them recently. The email stated that they were unable to fill a prescription for me. I didn&#8217;t think too much about it until I realized that I didn&#8217;t recognize the medication. Since I didn&#8217;t need the medication, I wasn’t overly concerned that they couldn&#8217;t fill it. A few days later, however, I got an envelope containing a letter explaining once again that they couldn&#8217;t fill the prescription &#8212; and the original prescription. Here is where things got interesting. The prescription was for David Plotkin &#8212; but not me. I didn&#8217;t recognize the writing doctor, whose office was about 800 miles from where I lived. There was a phone number (as well as other protected information) on the prescription, so I tried the phone number. It rang in the office of a Dr. David Plotkin (still NOT me), so I left a message for both him and the writing doctor, explaining that I had the prescription and it was NOT getting filled. I never heard back from either one. The next call was to the online pharmacy. [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin" target="_blank">David Plotkin</a></p>
<p>Most of my medications come by mail from my online pharmacy, so I wasn&#8217;t surprised when I got an email from them recently. The email stated that they were unable to fill a prescription for me. I didn&#8217;t think too much about it until I realized that I didn&#8217;t recognize the medication. Since I didn&#8217;t need the medication, I wasn’t overly concerned that they couldn&#8217;t fill it. A few days later, however, I got an envelope containing a letter explaining once again that they couldn&#8217;t fill the prescription &#8212; and the original prescription.</p>
<p>Here is where things got interesting. The prescription was for David Plotkin &#8212; but not me. I didn&#8217;t recognize the writing doctor, whose office was about 800 miles from where I lived. There was a phone number (as well as other protected information) on the prescription, so I tried the phone number. It rang in the office of a Dr. David Plotkin (still NOT me), so I left a message for both him and the writing doctor, explaining that I had the prescription and it was NOT getting filled. I never heard back from either one.</p>
<p>The next call was to the online pharmacy. I explained the mix-up to the agent. She noted that the phone number on my profile matched the phone number on the prescription, and asked if that was right. I told her no, and wondered (with perhaps an edge in my voice) how on earth my profile could have been updated to show incorrect information. She explained that when a new prescription came in with changed information, they automatically updated my profile. At that juncture I pointed out that they had now corrupted my profile, as well as released personal protected information on another patient to someone (me) who had no right to have it.</p>
<p>So, data and process people, here is what puzzled the heck out of me. Prescriptions are not sent in by themselves; they have to come in with a form that includes the member&#8217;s name, doctor&#8217;s information, demographic information about the patient, and most importantly, the member&#8217;s ID. So, even though I shared a name with the good doctor, something had to go seriously wrong at the pharmacy to have this mix-up occur. Not only did the member IDs not match, but neither did much of anything else, including the Group number, phone number (but they FIXED that, sheesh), address, and writing doctor. But apparently, based on name alone, they added someone else&#8217;s prescription to mine. Thank goodness they didn&#8217;t fill the prescription, or I would have paid for someone else&#8217;s medication because I keep a credit card on file to pay for my own.</p>
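<p>For the data and process people, here is a minimal sketch of the safeguard that apparently failed: match an incoming prescription form to a profile on member ID, never on name, and report conflicting details for review instead of overwriting the profile. The class names, field names, and sample values are hypothetical; I have no knowledge of how the pharmacy&#8217;s system is actually built.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Profile:
    # Hypothetical fields, for illustration only.
    member_id: str
    name: str
    phone: str


@dataclass(frozen=True)
class PrescriptionForm:
    # Hypothetical fields, for illustration only.
    member_id: str
    name: str
    phone: str


def match_profile(form: PrescriptionForm, profiles: List[Profile]) -> Optional[Profile]:
    """Return the profile with the same member ID, or None. A shared name is never a match."""
    for profile in profiles:
        if profile.member_id == form.member_id:
            return profile
    return None


def conflicting_fields(form: PrescriptionForm, profile: Profile) -> List[str]:
    """List the fields that differ, so a person can review them before any update."""
    return [f for f in ("name", "phone") if getattr(form, f) != getattr(profile, f)]


profiles = [Profile("M-001", "David Plotkin", "555-0100")]

# Same name, different member ID: no match, so nothing gets attached or updated.
other_patient = PrescriptionForm("M-999", "David Plotkin", "555-0199")
print(match_profile(other_patient, profiles))

# Same member ID, new phone number: flag the difference rather than overwrite it.
own_form = PrescriptionForm("M-001", "David Plotkin", "555-0142")
print(conflicting_fields(own_form, match_profile(own_form, profiles)))
```

<p>The design choice is that a name is treated as descriptive data, not as a key, and that a changed phone number is a question to ask the member rather than a fact to record.</p>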
<p>At any rate, I spent the next 15 minutes on the phone with a supervisor untangling the mess they had made, and when we were done, everything was put back the way it should have been. The supervisor also verified that my phone number should NOT have been updated, at which point I opined that they had a &#8220;training opportunity&#8221; with their staff on that point as well. They then sent me a mailer to return the errant prescription to them, addressed, appropriately enough, to the &#8220;privacy violation lead&#8221; at the pharmacy company.</p>
<p>Now, to be fair, I have been a customer of this company for many years, they just went through a huge merger, and this is the first time an error has been made on my prescriptions. But it still should not have happened, and I imagine a whole series of safeguards were somehow ignored or circumvented. Keep in mind too, that the potential existed for the receiving patient, upon seeing his or her name on the label, to have started taking a medication that was never intended for that person. I just hope that the pharmacy investigates to find out where this took a wrong turn &#8212; and improves their process to ensure that this doesn&#8217;t happen again.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/the-case-of-the-wandering-prescription/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>If You Send Me the Wrong Data, I&#8217;m Hosed</title>
		<link>http://www.dataversity.net/if-you-send-me-the-wrong-data-im-hosed/</link>
		<comments>http://www.dataversity.net/if-you-send-me-the-wrong-data-im-hosed/#comments</comments>
		<pubDate>Mon, 18 Mar 2013 07:10:06 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=18605</guid>
		<description><![CDATA[by David Plotkin A few weeks ago, I had the amazing bad luck to become the victim of identity theft. This was, fortunately, a fairly minor issue; two different credit cards were applied for (and approved) in my name. The first came to my attention when I got a voice mail from Walmart because there was suspicious activity on a card &#8212; merchandise had been bought and sent to a delivery address in Idaho (more on that later). When I called back, I spoke to the order desk and verified that I had not made the purchase, and they agreed to cancel it. They then suggested that I contact my credit card company and cancel the card. So I asked them which card it was. Now here is where it gets interesting &#8212; they wouldn&#8217;t tell me! As I wondered aloud how on earth I could cancel the card if I didn&#8217;t know which one it was, they insisted they couldn&#8217;t help me. So I tried something. I asked the lady if I guessed the type of card, would she tell me if I was right? She agreed to do that, and I guessed that it was a Walmart Discover [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin/" target="_blank">David Plotkin</a></p>
<p>A few weeks ago, I had the amazing bad luck to become the victim of identity theft. This was, fortunately, a fairly minor issue; two different credit cards were applied for (and approved) in my name. The first came to my attention when I got a voice mail from Walmart because there was suspicious activity on a card &#8212; merchandise had been bought and sent to a delivery address in Idaho (more on that later). When I called back, I spoke to the order desk and verified that I had not made the purchase, and they agreed to cancel it. They then suggested that I contact my credit card company and cancel the card. So I asked them which card it was. Now here is where it gets interesting &#8212; they wouldn&#8217;t tell me! As I wondered aloud how on earth I could cancel the card if I didn&#8217;t know which one it was, they insisted they couldn&#8217;t help me. So I tried something. I asked the lady if I guessed the type of card, would she tell me if I was right? She agreed to do that, and I guessed that it was a Walmart Discover card. And how did I know that? I had recently gotten a notice from Discover that a card application in my wife&#8217;s name (she died in 2011) had been rejected. When the Walmart person confirmed that I had guessed right, I then managed to get to customer service at the credit card company and cancel the card. Of course, they wanted the account number, which I didn&#8217;t have, but I got past that by insisting that they could find it by my Social Security Number. Having worked for a bank, I knew they would have that, and they did. But really? I had to guess? Talk about a lack of data &#8212; as well as a failure of business rules!</p>
<p>The second card was much easier &#8212; I got a letter in the mail congratulating me on my new Citibank card. Since I hadn&#8217;t applied for one, I called the number to find out what was going on. They verified that a card had been opened in my name, with my address that matched the last four digits that were printed on the letter. Somewhat more distressing, though, was the fact that the card had already been used to purchase thousands of dollars in airline tickets from Expedia. I, of course, had to wonder how that was possible, since I did not yet have the cards in my possession, nor had I activated them (you know &#8212; the sticky on the card with a number you have to call from your home phone). They admitted that they provided the new account number for immediate use to the applicant! Talk about another failure of business rules.</p>
<p>I contacted one of the Credit Reporting agencies (TransUnion, in this case) and placed a fraud alert on my credit. This is simple to do via phone or online, and is then disseminated to the other 2 agencies. If anyone queries your credit (as they will if you apply for a credit card), the agency will check with you that it is a valid credit application before responding to the query. A fraud alert expires after 90 days, but you can renew it. I figured that I might as well purchase an inexpensive plan which provides me a notification of any change in my credit history, as well as any inquiries. However, when I got my first report, I was in for a nasty shock. The account numbers listed did not show the last 4 digits. The problem is that if you look up your accounts online, all you SEE is the last 4 digits. You can, of course, get at the whole number, but it is far more trouble. So I called TransUnion to complain about how close to worthless not having the last 4 digits was, and they told me that the information was passed from the bank without that information, and there was nothing they could do about it. Sheesh. Is this the best we can do? Seems like it ought to be EASY to protect your credit, not intentionally made difficult. I decided to cancel the account because all I ever got was snailmail that told me generally that something had happened with a particular account. Since that information was worthless, I wasn&#8217;t going to pay for it. However, I didn&#8217;t have to cancel the account, as yet another credit card was compromised &#8212; the one I used to pay for that account. Since they could no longer charge the card, they canceled it for me.</p>
<p>The last part of this story occurred just recently, when I got a call from a detective in Boise, Idaho. They had executed a search warrant on the residence of a fraud suspect, and found some of my mail. By the contents we were able to date the mail as having been stolen in January of 2012, shortly before we realized we had a problem and put in locking mailboxes. But the stolen mail compromised more accounts because it included those stupid checks you can write against your credit card! Fortunately, none had ever been used. They hadn&#8217;t caught the suspect yet, but they were close, having run her out of three places. And, they have a case of mail fraud now as well &#8212; a federal offense. Hope she spends a nice long stretch in the slammer!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/if-you-send-me-the-wrong-data-im-hosed/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>When Business Rules Fail</title>
		<link>http://www.dataversity.net/when-business-rules-fail/</link>
		<comments>http://www.dataversity.net/when-business-rules-fail/#comments</comments>
		<pubDate>Wed, 14 Nov 2012 08:03:32 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=15911</guid>
		<description><![CDATA[by David Plotkin Business Rules are wonderful things, and I remember what an epiphany they were for me when Ron Ross and Barbara Von Halle began talking and writing about them in the early 90’s. Back then, I was a data modeler &#8212; specifically, a logical data modeler (physical would come later) &#8212; as well as a business analyst. My job was to gather requirements and create a logical data model that captured those requirements as a model. The problem, of course, was that I would hear a lot of requirements that I couldn’t accurately represent in the data model. For example, I might be told that a mandatory relationship must exist between LOAN and COLLATERAL only if the type of loan (such as a mortgage) required collateral. Further, if the type of loan did not require collateral, then a relationship must not exist between LOAN and COLLATERAL. So, I did what most of you probably did when faced with the same situation: I created an optional relationship between LOAN and COLLATERAL. Which, of course, didn’t get the requirement stated properly at all. The fact that business rules filled in this gap and provided an additional vehicle to capture requirements [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/contributors/david-plotkin/" target="_blank">David Plotkin</a></p>
<p>Business Rules are wonderful things, and I remember what an epiphany they were for me when Ron Ross and Barbara Von Halle began talking and writing about them in the early 90’s.</p>
<p>Back then, I was a data modeler &#8212; specifically, a logical data modeler (physical would come later) &#8212; as well as a business analyst. My job was to gather requirements and create a logical data model that captured those requirements as a model. The problem, of course, was that I would hear a lot of requirements that I couldn’t accurately represent in the data model. For example, I might be told that a mandatory relationship must exist between LOAN and COLLATERAL only if the type of loan (such as a mortgage) required collateral. Further, if the type of loan did not require collateral, then a relationship must not exist between LOAN and COLLATERAL. So, I did what most of you probably did when faced with the same situation: I created an optional relationship between LOAN and COLLATERAL. Which, of course, didn’t get the requirement stated properly at all. The fact that business rules filled in this gap and provided an additional vehicle to capture requirements is what made them “speak” to me so clearly.</p>
<p>The key factor in getting business rules correct is that they need to account for all the “real world” situations that will be encountered. When they do not, the rules fail because a situation has been encountered in which the system either doesn’t “know” how to behave, or behaves in a way that was not intended. For example, say the rule discussed above was implemented by using a list of loan types for which collateral was necessary. If a new type of collateralized loan came along (e.g., a second mortgage line of credit), the system would NOT require collateral. And that would be WRONG.</p>
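<p>One way to avoid the hard-coded list problem is to make &#8220;requires collateral&#8221; a property of the loan type itself, so that nobody can define a new loan type without answering the question. What follows is only a sketch under that assumption; the class and function names are hypothetical and not taken from any real system.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LoanType:
    name: str
    # No default value: a new loan type cannot be created without stating this.
    requires_collateral: bool


@dataclass
class Loan:
    loan_type: LoanType
    collateral: Optional[str] = None


def collateral_rule_violation(loan: Loan) -> Optional[str]:
    """Return a message if the LOAN / COLLATERAL rule is broken, otherwise None."""
    needs = loan.loan_type.requires_collateral
    has = loan.collateral is not None
    if needs and not has:
        return "collateral is required for " + loan.loan_type.name
    if has and not needs:
        return "collateral is not allowed for " + loan.loan_type.name
    return None


mortgage = LoanType("mortgage", requires_collateral=True)
second_mortgage_loc = LoanType("second mortgage line of credit", requires_collateral=True)
signature_loan = LoanType("signature loan", requires_collateral=False)

print(collateral_rule_violation(Loan(second_mortgage_loc)))
print(collateral_rule_violation(Loan(signature_loan, collateral="car title")))
print(collateral_rule_violation(Loan(mortgage, collateral="123 Main St")))
```

<p>Both halves of the requirement are checked here: the relationship must exist when the loan type calls for it, and must not exist when it does not. An optional relationship in the data model captures neither.</p>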
<p>The results of failed business rules can be hilarious, frustrating, and even dangerous. An example of frustrating (probably the most common type of failure) happened to me recently. I got to O’Hare airport in Chicago very early, as we had finished our work at the office. I signed up to try and catch an earlier flight home to San Francisco by getting on the standby list. The way this list works is that the airline tries to get you on an earlier flight, and if there is no room, the standby list rolls over to the next flight going to your destination. If I was unable to get on an earlier flight, I would go home on my confirmed flight, due to leave some 5 hours later. Meanwhile, my baggage would go on the first standby flight I was listed on (whether I went or not), and if I got on a later flight, it would be waiting for me at my destination airport.</p>
<p>It all seems pretty logical, right? What the airline seems to have forgotten when it made these rules is that flights are often delayed. In my case, the first standby flight was delayed due to mechanical troubles. I wouldn’t have gotten on it anyway, as there were two available seats and I was number 23 on the list. Once it became apparent that the first flight was not going to leave anytime soon, I walked down to where the next flight was due to leave about a half hour later. It was a fairly long walk, as it was 10 gates away. At any rate, when I got there, I found that the standby list had NOT rolled over to that flight. Apparently, since the earlier flight had not left, the event to trigger the list to roll over had not occurred! The gate agent on the next flight did her best to “pull” the list so she could assign me a seat, but the system would not let her. Back I went to the gate where the first flight was delayed. Once there (slightly out of breath, I’ll admit) I explained to the gate agent what was going on, so she released the list manually. Back I went to the second flight’s gate, where the gate agent had a boarding pass for me. It was actually good that I went to all that trouble, as the second flight had lots of empty seats and all the standby passengers got on.</p>
<p>I think you can see what happened here. The trigger to roll over the list should have been that either the flight had left, or that it was delayed past the time that another flight would leave. The fact that the airline had not accounted for delayed flights caused the rule to fail. What has me scratching my head to this day is that this cannot have been the first time this situation had occurred &#8212; in fact, it probably wasn’t the first time that day, given the on-time record I experienced during my flights back and forth. And yet everyone seemed surprised that list rollover hadn’t worked right. Perhaps a “business rule guy” had never analyzed it before.</p>
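<p>Stated as code, the corrected trigger is tiny. This is a sketch of the rule as I have described it, not of the airline&#8217;s actual system; the function name, parameters, dates, and times are all made up for illustration.</p>

```python
from datetime import datetime


def should_roll_over(departed: bool,
                     estimated_departure: datetime,
                     next_flight_departure: datetime) -> bool:
    """Roll the standby list to the next flight when the current flight has left,
    or when its delay pushes it past the next flight's departure time."""
    return departed or estimated_departure > next_flight_departure


# The rule that apparently existed: roll over only once the flight has departed.
def should_roll_over_original(departed: bool) -> bool:
    return departed


# Hypothetical times: the first flight is delayed until 15:30,
# while the next flight to the same destination leaves at 14:30.
delayed_until = datetime(2012, 11, 1, 15, 30)
next_departure = datetime(2012, 11, 1, 14, 30)

print(should_roll_over_original(departed=False))
print(should_roll_over(False, delayed_until, next_departure))
```

<p>With the original rule the list stays stuck on the delayed flight; with the second condition added, it moves on without a gate agent having to release it by hand.</p>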
<p>Oh, and if you’re wondering what happened to my luggage&#8230;yep, it stayed on the delayed flight, arriving in San Francisco late that night, some 5 hours after I had arrived and gone home. It was delivered to my home the next day. I sort of understand that: trying to locate a piece of baggage among the thousands they must handle and then get it to the “right” plane would be a logistical nightmare that would undoubtedly lead to many more lost bags. Just keep this in mind the next time you want to take an earlier flight by going standby!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/when-business-rules-fail/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Data Quality Ain&#8217;t Lost</title>
		<link>http://www.dataversity.net/data-quality-aint-lost/</link>
		<comments>http://www.dataversity.net/data-quality-aint-lost/#comments</comments>
		<pubDate>Wed, 27 Jun 2012 07:10:57 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Information Quality]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=12202</guid>
		<description><![CDATA[by David Plotkin The famous American frontiersman Daniel Boone was once asked if he was lost. &#8220;No&#8221;, he replied, &#8220;lost means you don&#8217;t know where you are. I know where I am. It&#8217;s how to get to where I am going that has me a mite perplexed.&#8221; I thought about this definition of &#8220;lost&#8221; recently when a professional acquaintance complained that his company&#8217;s data used to be of good quality, but somehow that high-quality data was lost. Of course, in the sense that Daniel Boone used the word, it is rare that the quality of data is ever really &#8220;lost&#8221;. If you think about the business processes and how the data is used, you can usually figure out what happened to the quality. There aren&#8217;t really that many options for why good data goes bad: 1. It might be that nothing changed at all. This is true a surprising amount of the time. The data quality was fine for the purposes it was used for before, but it was insufficient for a new purpose. In the introduction to Danette McGilvray&#8217;s seminal book on Data Quality, there is a story about a pharmacy data system where all sorts of symbols were [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin" target="_blank">David Plotkin</a></p>
<p>The famous American frontiersman Daniel Boone was once asked if he was lost. &#8220;No&#8221;, he replied, &#8220;lost means you don&#8217;t know where you are. I know where I am. It&#8217;s how to get to where I am going that has me a mite perplexed.&#8221; I thought about this definition of &#8220;lost&#8221; recently when a professional acquaintance complained that his company&#8217;s data used to be of good quality, but somehow that high-quality data was lost.</p>
<p>Of course, in the sense that Daniel Boone used the word, it is rare that the quality of data is ever really &#8220;lost&#8221;. If you think about the business processes and how the data is used, you can usually figure out what happened to the quality. There aren&#8217;t really that many options for why good data goes bad:</p>
<p>1. It might be that nothing changed at all. This is true a surprising amount of the time. The data quality was fine for the purposes it was used for before, but it was insufficient for a new purpose. In the introduction to Danette McGilvray&#8217;s seminal book on Data Quality, there is a story about a pharmacy data system where all sorts of symbols were appended to the patient&#8217;s last name to indicate information such as the fact the patient had another insurance, or it was workman&#8217;s compensation coverage, or a whole host of other information that the aging pharmacy system had no fields for. This was not a problem because the data in the last name field wasn&#8217;t used for anything else. But then&#8230;the business process changed and the data was put to use for a new purpose. Specifically, the pharmacy began sending out refill reminders, using the patient&#8217;s name to generate the mailing labels. After the first batch went out, there was a flood of angry calls from people wondering why their name showed up with all sorts of symbols after the name.  And why do I know this story so well? Well, guess who had to write the program to strip all those characters off the names before the labels were generated?</p>
<p>2. The data might have started getting put in wrong. There are a whole host of reasons why this can happen. If the business suddenly starts putting a premium on speed of data input, speed is exactly what you are going to get. The people incented to be fast will figure out every shortcut, use defaults where possible, skip every field they can, and so on. As the old adage goes &#8212; be careful what you measure. Another problem can be training &#8212; if you suddenly bring on a new crew of people to do the work (for example, opening a new contact center with new employees) or start using a new application, people may not know HOW to enter quality data. This is especially true in companies where the QA on new applications is done hastily or with no attention paid to the quality of the screen layout and labeling, enforcement of business rules (did you even collect them?), and metadata rules, such as lists of valid values. Most people don&#8217;t climb out of bed in the morning saying &#8220;today I&#8217;m going to put in crappy data&#8221;, but you have to make it easy to do the right thing. This involves good application design and adequate training. And incenting for quality!</p>
<p>3. The problem may not actually be a &#8220;data quality&#8221; problem, but instead a &#8220;metadata quality&#8221; problem. The classic example is a term that &#8220;everyone knows the meaning of&#8221;. A friend told me a story about a group of BI developers spending more than a week trying to figure out why numbers reported by two different groups simply didn&#8217;t balance &#8212; in fact, weren&#8217;t even close. My friend even got both groups to define the term (which happened to be &#8220;transaction&#8221;) and both used the same definition. The problem? One group counted transactions that completed and for which a payment was successfully received. The other group counted every attempt to complete, including multiple declines by the credit card company. This failure of the &#8220;derivation rule&#8221; caused the wide variation. Ouch!</p>
<p>4. And yes, our friends in IT can occasionally bollix up the works by using the wrong source or scrambling an ETL (extract, transform, and load) job. But if changes are carefully designed, adequately tested (including by the business) and documented, this doesn&#8217;t happen too often. Again, this sort of thing should be caught during the testing phase if you are actually looking at the data. A bad mapping will likely show up as nonsensical results, and incorrect implementation of business rules should show up as an error either during the application run or in the results. The key thing to remember is that IT is NOT responsible for the quality of the data. My friend Laura Cullen ran the Enterprise Data Warehouse for a big bank where I worked. One day one of my business peers asked her why the Warehouse didn&#8217;t deliver quality data. Laura replied that she would love to deliver quality, but she needed three things to do it. The first was that the business had to tell her what was meant by &#8220;quality&#8221; &#8212; in other words, a set of data quality rules. The second was a set of instructions on what she was to do with the data that didn&#8217;t meet those rules. For example, should she stop the load? Don&#8217;t write the bad records? Write out error messages? You get the idea. Finally, she needed funding to build the code for the rule engine that would enforce the data quality rules and detect where they were being violated. A very wise woman, my friend Laura.</p>
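<p>Laura&#8217;s first two requirements fit together naturally: each data quality rule is paired with an instruction for what to do when a record breaks it. Here is a minimal sketch of that idea. The rules, field names, and sample records are invented for illustration and do not describe the bank&#8217;s actual warehouse.</p>

```python
from enum import Enum


class Disposition(Enum):
    STOP_LOAD = "stop the load"
    SKIP_RECORD = "do not write the bad record"
    LOG_ERROR = "write the record and an error message"


# Each entry: (rule name, check that returns True when the record is good, disposition).
# These rules are hypothetical; in practice the business has to supply them.
RULES = [
    ("account_id present", lambda r: bool(r.get("account_id")), Disposition.STOP_LOAD),
    ("balance is a number", lambda r: isinstance(r.get("balance"), (int, float)), Disposition.SKIP_RECORD),
    ("state is two letters", lambda r: len(str(r.get("state", ""))) == 2, Disposition.LOG_ERROR),
]


def load(records):
    """Apply every rule to every record and act on the agreed disposition."""
    loaded, errors = [], []
    for record in records:
        failed = [(name, disposition) for name, check, disposition in RULES if not check(record)]
        errors.extend(name for name, _ in failed)
        dispositions = {disposition for _, disposition in failed}
        if Disposition.STOP_LOAD in dispositions:
            break  # stop the whole load
        if Disposition.SKIP_RECORD in dispositions:
            continue  # leave this record out, keep going
        loaded.append(record)  # clean, or bad enough only to be logged
    return loaded, errors


records = [
    {"account_id": "A1", "balance": 10.0, "state": "CA"},
    {"account_id": "A2", "balance": "n/a", "state": "CA"},
    {"account_id": "A3", "balance": 5, "state": "Calif"},
    {"account_id": "", "balance": 1, "state": "NY"},
    {"account_id": "A5", "balance": 1, "state": "NY"},
]
loaded, errors = load(records)
print([r["account_id"] for r in loaded])
print(errors)
```

<p>The loop is the easy part. The rule list and the dispositions are business decisions, and the third requirement (funding) is what it takes to turn a sketch like this into something that runs against a real warehouse.</p>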
<p>And so, data quality is seldom lost. Of course, getting the quality to where you need it to go might seem a mite perplexing. But that&#8217;s another story.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/data-quality-aint-lost/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>When Slow Data is Bad Data</title>
		<link>http://www.dataversity.net/when-slow-data-is-bad-data/</link>
		<comments>http://www.dataversity.net/when-slow-data-is-bad-data/#comments</comments>
		<pubDate>Wed, 30 May 2012 10:06:51 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=11661</guid>
		<description><![CDATA[by David Plotkin In my new job, I travel quite a bit, often on short notice. This has the unfortunate side-effect of frequently leaving me sitting in a center seat, squashed between two people who should eat less and exercise more. So, when the opportunity arose during check-in to purchase an aisle seat in “Economy Plus”, I jumped at the chance to have a more comfortable 6 hour trip. Everything went fine at first. I was presented with a seat map which showed my existing seat and the empty ones. I picked the seat I wanted, provided my credit card information, and pressed “Enter”. Within a heartbeat, my email showed that I had received a receipt for the purchase. Apparently United’s accounting systems were really fast, though oddly, the transaction showed as “Continental” on the AMEX statement. Oh well, they did merge recently, so maybe Continental’s accounting system was what they kept, and someone just forgot to change the name. I just hope it doesn’t confuse MY company’s accounting system. Things started to go south when I returned to the seat map, and discovered that I was still assigned to that center seat. I refreshed the screen, logged off and [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin/">David Plotkin</a></p>
<p>In my new job, I travel quite a bit, often on short notice. This has the unfortunate side-effect of frequently leaving me sitting in a center seat, squashed between two people who should eat less and exercise more. So, when the opportunity arose during check-in to purchase an aisle seat in “Economy Plus”, I jumped at the chance to have a more comfortable 6 hour trip.</p>
<p>Everything went fine at first. I was presented with a seat map which showed my existing seat and the empty ones. I picked the seat I wanted, provided my credit card information, and pressed “Enter”. Within a heartbeat, my email showed that I had received a receipt for the purchase. Apparently United’s accounting systems were really fast, though oddly, the transaction showed as “Continental” on the AMEX statement. Oh well, they did merge recently, so maybe Continental’s accounting system was what they kept, and someone just forgot to change the name. I just hope it doesn’t confuse MY company’s accounting system.</p>
<p>Things started to go south when I returned to the seat map, and discovered that I was still assigned to that center seat. I refreshed the screen, logged off and logged back in again, but it would not show the correct seat assignment. Since the transaction receipt did not actually call out the seat number change, I began to wonder whether I’d actually gotten the seat I’d asked for. Next step: call the United Reservations desk.</p>
<p>Of course, they were experiencing “higher than normal call volume”, so I provided my confirmation “number” (which was all letters that sounded like one another), my last name, verified a few details by saying “yes”, and got dropped into the queue to wait. Not a big deal, I have a speaker phone and just went about my business while that theme song droned on in the background. Just to keep things interesting, the music would stop periodically, giving me the false hope that the call was finally being answered. Nope.</p>
<p>When someone did pick up, it turned out that all that information I had provided to the automated voice system had to be provided again; apparently it, too, is a silo. I explained my problem, and the reservation agent told me that I was still in that center seat according to his seat map. At this point I started to get a bit upset, so he did some more checking and found that on the reservation (which I couldn’t see online) it did indeed show that I was in the aisle seat I had worked so hard for. He assured me that when I checked in, the correct seat would be on the boarding pass. However, having lost faith in the process, I insisted he stay on the line while I checked in, and sure enough, the seat was correct and all was well with the world.</p>
<p>Those of you who remember my blog “<a href="http://www.dataversity.net/the-saga-of-the-safeway-sandwich/">The Saga of the Safeway Sandwich</a>” know that I like to dig into the costs of poor data quality. As I was waiting for the check-in to occur, I asked the agent whether he got a lot of calls from irate customers because the seat map didn’t update. He said that he did, and that some people shout at him about it (I did not, but I thought about it).</p>
<p>There are a whole lot of things going on here, things that are costing the airline money because the seat map doesn’t update when you change seats. Of course, they need more people to staff the phones, not even including the extra time it takes to repeat information which has already been provided. It also can’t be easy to be a reservations agent, and having people shout at you for something that wasn’t your fault doesn’t make it any easier. I will forgo comments on whether it matters that the customer was inconvenienced. It does to some businesses, not to others – and I’ll let it go at that. I’m sure you have your own opinions.</p>
<p>And you know, if the seat map doesn’t update the next time, I’m going to call again, because I’m not a very trusting individual.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/when-slow-data-is-bad-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>It has to be quick, too.</title>
		<link>http://www.dataversity.net/it-has-to-be-quick-too/</link>
		<comments>http://www.dataversity.net/it-has-to-be-quick-too/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 07:01:21 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Information Quality]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=10584</guid>
		<description><![CDATA[by David Plotkin I recently joined a new company, and as with all such endeavors, there was a really long list of things I had to do to get everything set up in a variety of systems. You know the stuff – HR used to do it for us, but then someone figured out that they could dispense with HR and make us new-hires do it for ourselves. Don’t get me started. At any rate, the VERY first thing was to get a corporate credit card, as I’d be traveling and company policy demands that we charge all business expenses to the card. So I dutifully went down the list of things I needed to do to apply for the card, and by Friday morning (Thursday was my first day) I had submitted everything with a “rush”. Long about Wednesday, I still hadn’t gotten the confirmation email from AMEX telling me that they were processing the application, and therein hangs a tale. “Timeliness” is one of those Data Quality dimensions that I haven’t paid a whole lot of attention to. In fact, those of you who have worked with me may have heard me say “I’d rather have it right [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin" target="_blank">David Plotkin</a></p>
<p>I recently joined a new company, and as with all such endeavors, there was a really long list of things I had to do to get everything set up in a variety of systems. You know the stuff – HR used to do it for us, but then someone figured out that they could dispense with HR and make us new-hires do it for ourselves. Don’t get me started.</p>
<p>At any rate, the VERY first thing was to get a corporate credit card, as I’d be traveling and company policy demands that we charge all business expenses to the card. So I dutifully went down the list of things I needed to do to apply for the card, and by Friday morning (Thursday was my first day) I had submitted everything with a “rush”. Long about Wednesday, I still hadn’t gotten the confirmation email from AMEX telling me that they were processing the application, and therein hangs a tale.</p>
<p>“Timeliness” is one of those Data Quality dimensions that I haven’t paid a whole lot of attention to. In fact, those of you who have worked with me may have heard me say “I’d rather have it right than fast”. Of course, at SOME point you need the data, and if you haven’t got it, then the data quality has failed in that dimension. Most of us probably think of these failures in terms of daily jobs that take more than 24 hours to run, or failed updates to the data warehouse, or some such. But Timeliness depends on the business requirement, and in this case, the requirement is the ability to issue a corporate credit card within a very narrow time frame.</p>
<p>So, on Wednesday, I got busy and hunted down the “Program Administrator” who was supposed to authorize the issuing of the card to find out what was going on. As it turned out, the reason it had not been approved was pretty simple: when she looked up my employee id in the list that she gets from HR <em>once a week</em>, it wasn’t there. I can see all of you out there nodding your heads, just like I did. At any rate, not noticing that it was a rush, she simply put it aside until she got the next list from HR, which I presume would have had my id on it. When I pointed out that it was a rush, she executed a “manual process”, found my id, and approved issuing the card, which arrived just in the nick of time.</p>
<p>So clearly the data failed the Timeliness dimension. Notice that it wasn’t that HR didn’t have my info, <em>they did</em>. It was just that it wasn’t on her report. And, as with so many data quality problems (most of them, in fact), the failure was indicative of a broken process. In this case, her process didn’t account for the business scenario where an employee starts mid-week and needs a credit card right away. What has me scratching my head is that this has to happen on a fairly frequent basis, and yet she has not adjusted the process (or asked someone else to) to adapt. Clearly she has a “manual process” to look someone up (probably in the corporate directory), but it undoubtedly takes more time to do it. Still, as someone who almost ended up paying all the expenses myself (it’s much harder to get reimbursed that way than if you use the corporate credit card), I think we can do better. Of course, I gently suggested that.</p>
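<p>For those who like to see it in code, here is a minimal sketch of the timeliness gap described above. The dates, the two-day requirement, and the function names are all made up for illustration; the only thing taken from the story is that the report is a weekly snapshot.</p>

```python
from datetime import date, timedelta


def appears_on_report(hire_date: date, report_date: date) -> bool:
    """A snapshot report lists only people hired on or before the day it was run."""
    return report_date >= hire_date


def meets_timeliness(refresh_interval: timedelta, required_within: timedelta) -> bool:
    """In the worst case a new record waits one full refresh interval to appear."""
    return required_within >= refresh_interval


# Illustrative dates only: the report ran the day before the new hire started.
last_report = date(2012, 4, 4)
hire_date = date(2012, 4, 5)
next_report = last_report + timedelta(days=7)

missing_now = not appears_on_report(hire_date, last_report)
present_later = appears_on_report(hire_date, next_report)

# Assumed business requirement: the card must be approvable within two days.
weekly_ok = meets_timeliness(timedelta(days=7), timedelta(days=2))
daily_ok = meets_timeliness(timedelta(days=1), timedelta(days=2))
```

<p>Under these assumptions the weekly list fails the requirement and a daily one passes, which is the whole point: the data was right, it just wasn’t there in time.</p>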
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/it-has-to-be-quick-too/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Designing in Data Quality with the User Interface</title>
		<link>http://www.dataversity.net/designing-in-data-quality-with-the-user-interface/</link>
		<comments>http://www.dataversity.net/designing-in-data-quality-with-the-user-interface/#comments</comments>
		<pubDate>Wed, 07 Mar 2012 08:01:01 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Information Quality]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=9640</guid>
		<description><![CDATA[by David Plotkin Larry English recommends “designing in” data quality into new systems, and I agree that this is very important. One of the ways you can accomplish “designing in” of data quality is to work closely with the User Interface team. This is even more important when multiple new systems (or major system upgrades) are being designed at the same time, as the opportunity for achieving consistency in the user interfaces then exists. In the real-world experience discussed here, we were designing two major new systems simultaneously. The User Interface team (not surprisingly) is charged with designing the user interface for the systems. Of course, they wanted to ensure that the user interfaces were consistent across the two systems, including field lengths, labels, types (e.g., text field, radio button, drop-down list, etc) and color schemes. As you can probably imagine, having this sort of consistency from screen to screen and from system to system helps make the entry of data more accurate. For example, both systems collected customer information and displayed that information in blocks on various screens. The same color scheme was used everywhere that customer information was referenced so that the data entry personnel could easily find [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin">David Plotkin</a></p>
<p>Larry English recommends “designing in” data quality into new systems, and I agree that this is very important. One of the ways you can accomplish “designing in” of data quality is to work closely with the User Interface team. This is even more important when multiple new systems (or major system upgrades) are being designed at the same time, as the opportunity for achieving consistency in the user interfaces then exists. In the real-world experience discussed here, we were designing two major new systems simultaneously.</p>
<p>The User Interface team (not surprisingly) is charged with designing the user interface for the systems. Of course, they wanted to ensure that the user interfaces were consistent across the two systems, including field lengths, labels, types (e.g., text field, radio button, drop-down list, etc.) and color schemes. As you can probably imagine, having this sort of consistency from screen to screen and from system to system helps make the entry of data more accurate. For example, both systems collected customer information and displayed that information in blocks on various screens. The same color scheme was used everywhere that customer information was referenced so that the data entry personnel could easily find that information on the screen. Field lengths were also standardized – street address was a character field of a given length everywhere it appeared, including customer address, shipping address, garaging address, and so on. Finally, information was captured using consistent field types whenever possible. For example, the preferred communication method was always a set of check boxes so the customer could specify multiple methods if they so wished.</p>
<p>We (the Data Governance team) first met up with the folks on the User Interface (UI) team in meetings to review the screen designs. We soon realized that we had much the same goals in mind, and that we could help each other. The UI team was struggling with the following issues:</p>
<ul>
<li>What field labels should be used on the screens and how could they be kept consistent?</li>
<li>What is the best type of field to use to gather the needed information, especially when the information was considered mandatory? And how do we keep track of decisions about what field types to use?</li>
<li>For fields with limited value sets, what should those value sets be?</li>
<li>What field definition (to explain the contents of the field) should be available and who will supply that definition?</li>
</ul>
<p>The Data Governance organization and our business (metadata) glossary could help in many ways.</p>
<p><strong>Field labels, we got field labels</strong></p>
<p>For field labels, we started with the standardized business names in the glossary, and then went through a process to create a screen field name. We normally didn’t use the data element business name as-is because it was often too long. To shorten it, we used entries from the standard abbreviation list. From there, the UI team used the screen context to devise the actual label to use. What does that mean? Basically, the data on a screen has a context that isn’t present in the business glossary. For example, in the business glossary, we have terms such as “Customer First Name”, “Customer Last Name”, “Customer Street Address”, and so on. But on the screen, there is often a box labeled “Customer” which contains these fields. Thus, it is not necessary to have the word “Customer” in front of every field! But since screen design determines context, the UI team did the final pass to set the field names.</p>
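<p>The mechanical part of that process can be sketched roughly as follows. This is only a first-pass suggestion generator under my own assumptions: the abbreviation entries and the “Customer Telephone Number” term are hypothetical, and the real final pass was a judgment call by the UI team, not a function.</p>

```python
# Hypothetical entries standing in for the standard abbreviation list.
ABBREVIATIONS = {"Number": "No.", "Identification": "ID", "Telephone": "Phone"}


def screen_label(business_name: str, context: str, abbreviations: dict[str, str]) -> str:
    """Suggest a screen label from a glossary business name.

    Drops the leading words already supplied by the screen context (for
    example a box labeled "Customer"), then applies standard abbreviations.
    """
    words = business_name.split()
    context_words = context.split()
    has_prefix = words[:len(context_words)] == context_words
    if has_prefix and len(words) > len(context_words):
        words = words[len(context_words):]
    return " ".join(abbreviations.get(word, word) for word in words)


first_name = screen_label("Customer First Name", "Customer", ABBREVIATIONS)
street = screen_label("Customer Street Address", "Customer", ABBREVIATIONS)
phone = screen_label("Customer Telephone Number", "Customer", ABBREVIATIONS)
```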
<p><strong>Field types and lengths</strong></p>
<p>Standardization on the field types (e.g., character, drop-down list, radio button, check box, etc.) for different data elements is another important UI consideration. One of the issues that the UI team faced was how to remember what had been decided on in each instance. When they brought that concern to us, the answer was obvious because we’re metadata people. We added fields to our business glossary to document the field type, standardized length, and other field properties, such as patterns, precision, and logical data type (integer, y/n, date, character, etc.). Thus, anytime such a field needed to be added to a screen, the UI team could look it up in the glossary to see what the UI characteristics should be. Data Governance didn’t enforce this, but the UI team was very good at self-policing and appreciated our help and having a tool to record this information.</p>
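<p>To make that concrete, here is a rough sketch of what such a glossary entry might look like. The class, the property names, and the length of 60 are my own illustrative assumptions, not the structure of our actual glossary tool.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GlossaryEntry:
    """One business glossary term plus the UI properties recorded for it."""
    business_name: str
    field_type: str                    # e.g. "text", "drop-down", "radio", "check box"
    logical_type: str                  # e.g. "character", "integer", "date", "y/n"
    max_length: Optional[int] = None   # standardized length, where one applies
    pattern: Optional[str] = None      # optional input pattern


# Hypothetical entries; the length of 60 is invented for the example.
GLOSSARY = {
    entry.business_name: entry
    for entry in (
        GlossaryEntry("Customer Street Address", "text", "character", max_length=60),
        GlossaryEntry("Shipping Street Address", "text", "character", max_length=60),
        GlossaryEntry("Preferred Communication Method", "check box", "character"),
    )
}


def ui_spec(business_name: str) -> GlossaryEntry:
    """Look up the agreed UI characteristics for a data element."""
    return GLOSSARY[business_name]


customer_spec = ui_spec("Customer Street Address")
shipping_spec = ui_spec("Shipping Street Address")
```

<p>The value is in the lookup: anyone adding a street address to a new screen gets the same type and length as every other street address.</p>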
<p>We got into some interesting discussions on how best to ensure that data was entered when it was considered mandatory. The initial approach was to provide a default so that a value was always available, but having participated in data quality efforts many times, I knew what the result would be – a preponderance of values that corresponded to the default. As a result, we prevailed on the UI team to NOT provide a default, but instead to force the data entry personnel to enter a value.</p>
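<p>Both halves of that argument can be sketched in a few lines: a check that rejects a record with blank mandatory fields instead of filling in a default, and a simple profiling measure of how often a suspected default shows up. The field names and sample values are hypothetical.</p>

```python
def missing_mandatory(record: dict, mandatory: list[str]) -> list[str]:
    """Return the mandatory field names that are absent or blank in the record."""
    missing = []
    for name in mandatory:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def default_share(values: list, default) -> float:
    """Fraction of values equal to a suspected default (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for value in values if value == default) / len(values)


# Hypothetical record: the state was left blank rather than defaulted.
record = {"last_name": "Rivera", "state": "  "}
problems = missing_mandatory(record, ["last_name", "state", "birth_date"])

# Hypothetical profiling sample where "CA" was once the screen default.
share = default_share(["CA", "CA", "CA", "NV"], "CA")
```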
<p><strong>And speaking of values…</strong></p>
<p>Quite a few of the fields on the screens had limited sets of values, and Data Governance played a key role in providing the values that should be used. We felt it was important that the data entry personnel see values that were identical to those determined to be correct by the data stewards. The UI team was understandably willing to take advantage of all the pre-work we had done to track the values down and document them. This also ensured that the value lists would be standard across the applications.</p>
<p><strong>Definitions</strong></p>
<p>We also tied the definitions that popped up in tooltips to those in the business glossary. The glossary is our “system of record” for definitions, so ideally those definitions should be presented to the business people. One small issue arose in that some of our definitions are quite long and involved, so in some cases we had to create a shorter definition for use on the screens. But Data Governance managed those as well, and kept them in the business glossary.</p>
<p><strong>And don’t forget search…</strong></p>
<p>One of the more vexing problems was how to get the data entry personnel to locate an existing customer using the Search functionality, rather than simply entering the data again. In the old system, Search often returned so many hits that it was just easier to create a duplicate customer record – which leads to all kinds of bad things, discussed elsewhere in this series. Data Governance worked with the UI team to define:</p>
<ul>
<li>What fields made sense to search on. We knew not only which fields we had available, but also (due to data profiling) which fields had quality that was good enough to locate the right customer. Having implemented a Master Customer effort, we also knew which fields were used (in combination) to figure out who was who – the “identifying attributes”.</li>
<li>The rules under which, if a data entry person created a new customer anyway, the Search would take place again and force choosing of an existing customer due to a match that exceeded a reasonable threshold. Again, this work was considerably aided by the work done in Customer Master, in which the customer data was exhaustively analyzed for quality and completeness.</li>
<li>What to do if the user managed to create a duplicate anyway (they ARE clever). This would be detected by the matching engine, the appropriate adjustments made, and a note sent to the supervisor to point out a “training opportunity” for the employee. Constant reinforcement of the rules – coupled with the fact that duplicate customer records did NOT contribute to the employee’s total (and thus they weren’t paid for that work) – reduced the number of deliberately-created duplicates to a trickle.</li>
</ul>
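<p>The second rule above, re-running the search when someone tries to create a new customer, might look something like this in miniature. To be clear, this is not our matching engine: the identifying attributes, the 0.85 threshold, and the simple averaged string similarity are all stand-ins I picked for the example.</p>

```python
from difflib import SequenceMatcher

# Hypothetical "identifying attributes" and threshold, for illustration only.
IDENTIFYING_ATTRIBUTES = ("last_name", "first_name", "birth_date", "postal_code")
MATCH_THRESHOLD = 0.85


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(candidate: dict, existing: dict) -> float:
    """Average similarity over the attributes present on both records."""
    scores = [
        similarity(candidate[field], existing[field])
        for field in IDENTIFYING_ATTRIBUTES
        if candidate.get(field) and existing.get(field)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def likely_duplicates(candidate: dict, customers: list, threshold: float = MATCH_THRESHOLD) -> list:
    """Existing customers the data entry person should be forced to review."""
    return [c for c in customers if match_score(candidate, c) >= threshold]


customers = [
    {"id": 1, "last_name": "Rivera", "first_name": "John",
     "birth_date": "1970-02-14", "postal_code": "94105"},
    {"id": 2, "last_name": "Okafor", "first_name": "Amara",
     "birth_date": "1985-11-03", "postal_code": "60614"},
]
new_entry = {"last_name": "Rivera", "first_name": "Jon",
             "birth_date": "1970-02-14", "postal_code": "94105"}
hits = likely_duplicates(new_entry, customers)
```

<p>A real implementation would weight the attributes by the quality that profiling found in each of them, rather than averaging them equally as this sketch does.</p>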
<p><strong>Summing Up</strong></p>
<p>UI and Data Governance provide a powerful combination. While Data Governance understands how data quality can be degraded by a poorly-designed interface, the UI designers know how to turn that information into an interface that safeguards the quality. And by the way, it feels good to provide the metadata to a group that values it and will use it to increase their productivity and enforce standardization.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/designing-in-data-quality-with-the-user-interface/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Saga of the Safeway Sandwich</title>
		<link>http://www.dataversity.net/the-saga-of-the-safeway-sandwich/</link>
		<comments>http://www.dataversity.net/the-saga-of-the-safeway-sandwich/#comments</comments>
		<pubDate>Sun, 23 Oct 2011 21:49:31 +0000</pubDate>
		<dc:creator>Shannon Kempe</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Information Quality]]></category>
		<category><![CDATA[data governance]]></category>
		<category><![CDATA[data quality]]></category>
		<category><![CDATA[data stewarship]]></category>
		<category><![CDATA[david plotkin]]></category>
		<category><![CDATA[information quality]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=6443</guid>
		<description><![CDATA[by David Plotkin The Saga of the Safeway Sandwich I’ve had a few epiphanies in my career – sudden realizations that changed or significantly reinforced my viewpoints. One in particular occurred while attending a tutorial at Enterprise Data World presented by my good friend Bob Seiner of KIK Consulting. Bob knows a LOT about data governance, data stewardship, and data quality. In this particular case, he was talking about how many of the data quality issues we run into are caused by a disconnect between “data producers” and “data consumers”. In a nutshell, the people who provide or input the data don’t understand (or don’t care) what the data will be used for, and thus either the wrong data is collected or the right data is not collected. There are many reasons for this – the data producers weren’t told what “their” data would be used for, weren’t trained properly, or are just incented to be fast but not accurate.  And it can be hard to get that behavior changed, because management doesn’t understand the issue, or may ALSO be incented incorrectly. The key can often be to find the right level of management – people who have reasons [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="David Plotkin" href="http://www.dataversity.net/contributors/david-plotkin">David Plotkin</a></p>
<p><strong>The Saga of the Safeway Sandwich</strong></p>
<p>I’ve had a few epiphanies in my career – sudden realizations that changed or significantly reinforced my viewpoints. One in particular occurred while attending a tutorial at Enterprise Data World presented by my good friend Bob Seiner of KIK Consulting. Bob knows a LOT about data governance, data stewardship, and data quality. In this particular case, he was talking about how many of the data quality issues we run into are caused by a disconnect between “data producers” and “data consumers”. In a nutshell, the people who provide or input the data don’t understand (or don’t care) what the data will be used for, and thus either the wrong data is collected or the right data is not collected. There are many reasons for this – the data producers weren’t told what “their” data would be used for, weren’t trained properly, or are just incented to be fast but not accurate. And it can be hard to get that behavior changed, because management doesn’t understand the issue, or may ALSO be incented incorrectly. The key can often be to find the right level of management – people who have reasons to need accurate data.</p>
<p>This disconnect between data producers and data consumers really rang true for me. After spending time at a pharmacy retailer, a large bank, and an insurance company, I’ve seen this behavior, and Bob is right:  it is very often the reason why data quality suffers.</p>
<p><strong>It’s Lunch Time!</strong></p>
<p>The disconnect between data producers and data consumers was brought home to me last Friday, when I stopped by my local Safeway market to get a sandwich. Every Friday the deli makes up a bunch of really huge submarine sandwiches, stuffed full of meat, cheese, and veggies. They wrap the sandwich in cellophane and slap a bar-coded label on it to make it easy to purchase. It’s a good deal, and I expect they sell quite a few of them.</p>
<p>The trouble started when I got to the checkout stand with my lunch. Try as she might, the checker could not get the scanner to read the label, as it was largely obscured during the wrapping process. She couldn’t read the bar code numbers (which allow for hand-input of unscannable items) because those were hidden too. As the lunchtime crowd backed up behind me, she called out over the intercom for the manager to come over with a special key needed to override the scanner so she could put the price in by hand. As we waited, she commented that this was the fourth sandwich today (out of five) which would not scan. The checker at the next station allowed as how she was having the same problem as well.</p>
<p>After about a minute, the manager showed up and I was able to purchase my sandwich. Since I didn’t have to be anywhere special (and maybe because I don’t HAVE a life), I told the manager the story – how the sandwich wouldn’t scan, causing delays and annoyance to the customers. He responded that it was no big deal that one sandwich out of the 150 or so they expected to sell hadn’t scanned (clearly not caring that it was a big deal to ME and the people who had to wait). At that point the checker responded that most of them had this problem, but that didn’t seem to impress him either. All he seemed to care about was that the sandwiches got made so they could be sold and contribute to overall store sales – which impacted his bonus. The light was beginning to dawn.</p>
<p><strong>Talking to the Data Consumer</strong></p>
<p>The checker line had disappeared and she was idle for a moment. I questioned her about the situation and found out that:</p>
<p>-         This was an ongoing problem which had been commented on by all the checkers.<br />
-         The number of sandwiches sold seemed to be dropping off.<br />
-         The number of people complaining that there were no sandwiches to buy seemed to be increasing.<br />
-         The store manager was aware but wasn’t concerned.</p>
<p>I asked her if anyone had spoken to the Deli manager about it, and she didn’t think so. I guessed that he probably wouldn’t care anyway, as he just wanted to make lots of sandwiches to sell. “Oh no” she replied “he would very much care because he gets credit for each sandwich sold”. AHA!</p>
<p><strong>Talking to the Data Producer</strong></p>
<p>The Deli manager just happened to be going on break, so I told him about the experience I’d just had, and the comments from the checkers. You could see the light beginning to dawn. “Well, that explains a lot” he said. He told me how, even though the sandwiches all seemed to disappear, the store records showed that they hadn’t sold very many. As a result, they cut way back on the number they made so they wouldn’t go to waste. The issue, of course, was that when the checkers used the override key, the transaction was not logged properly, simply showing up as “miscellaneous”. Thus, the counts were off, the deli manager wasn’t properly compensated, and there wasn’t enough product to satisfy the customers. Oh yeah – and the hours of one of the deli employees were cut back because they didn’t need as many sandwiches. All because of bad data and the disconnect between the data producers and the data consumers.</p>
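<p>Here is a toy version of how the counts went wrong, using the checker’s “four out of five” from that day. The item codes and record layout are invented; I have no idea what Safeway’s point-of-sale data actually looks like.</p>

```python
from collections import Counter


def ring_up(item_code: str, scanned: bool) -> dict:
    """A scanned item keeps its code; an override sale is logged as MISC (assumed)."""
    return {"item": item_code, "recorded_as": item_code if scanned else "MISC"}


def recorded_sales(transactions: list) -> Counter:
    """Sales per item code as the store records would see them."""
    return Counter(t["recorded_as"] for t in transactions)


# Five sandwiches sold, four of them keyed in by hand with the override key.
transactions = [ring_up("SANDWICH", scanned) for scanned in (True, False, False, False, False)]
counts = recorded_sales(transactions)
actual = sum(1 for t in transactions if t["item"] == "SANDWICH")
```

<p>Five sandwiches left the store, but the records credit the deli with one. Plan next Friday’s production from that number and you get exactly the shortage the customers were complaining about.</p>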
<p>Fixing the problem turned out to be a matter of making the data producer aware of it. This worked because the data producer was incented to get the data right – if the sandwich didn’t scan, he didn’t get credit for it. Of course, if the data producer got credit based on the count of sandwiches MADE rather than the count of sandwiches SOLD, he probably wouldn’t have cared either, and then the only recourse might be to justify improving the data to improve the customer and employee experience. This would have been a matter for the store manager, who, as I said earlier, didn’t seem to care.</p>
<p>I guess I could have found a different place to buy my lunch, but that seemed a bit extreme.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/the-saga-of-the-safeway-sandwich/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Who is the Customer?</title>
		<link>http://www.dataversity.net/who-is-the-customer/</link>
		<comments>http://www.dataversity.net/who-is-the-customer/#comments</comments>
		<pubDate>Thu, 04 Aug 2011 15:13:39 +0000</pubDate>
		<dc:creator>David-Plotkin</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[Information Quality]]></category>
		<category><![CDATA[Master Data Management]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=4880</guid>
		<description><![CDATA[&#160; &#160; &#160; &#160; A Fable with a moral by David Plotkin Once upon a time, there was a company. This company had a terrible time recognizing their customers and understanding what products those customers had. The reasons for this weren&#8217;t hard to understand.  The company&#8217;s main business was selling insurance, and each type of insurance was sold using a different computer system. That is, if you purchased an automobile policy, a homeowner&#8217;s policy, a personal liability policy, and a road rescue policy, your name and pertinent information would appear in four different systems.  Each of these systems required only the data necessary to sell that product, and was completely unaware of any overriding needs of the Enterprise to identify the customers. That is, the systems were completely product-centric. For example, while the auto policy system required a driver license and a birthdate for the policyholder, the homeowner policy system did not. Actually, the homeowner system allowed for a birthdate, but typically this was filled in using the birthdate of the oldest person in the household (not necessarily the policyholder) to trigger the senior discount. Each system also had its own rules about how data was stored, and what validations [...]]]></description>
				<content:encoded><![CDATA[<p><strong><a href="http://www.dataversity.net/wp-content/uploads/2011/04/D-Plotkin22.jpg"><img class="alignleft size-thumbnail wp-image-3167" src="http://www.dataversity.net/wp-content/uploads/2011/04/D-Plotkin22-150x150.jpg" alt="" width="150" height="150" /></a></strong></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>A Fable with a moral</strong></p>
<p>by <a href="http://www.dataversity.net/contributors/david-plotkin">David Plotkin</a></p>
<p>Once upon a time, there was a company. This company had a terrible time recognizing their customers and understanding what products those customers had. The reasons for this weren&#8217;t hard to understand. The company&#8217;s main business was selling insurance, and each type of insurance was sold using a different computer system. That is, if you purchased an automobile policy, a homeowner&#8217;s policy, a personal liability policy, and a road rescue policy, your name and pertinent information would appear in four different systems. Each of these systems required only the data necessary to sell that product, and was completely unaware of any overriding needs of the Enterprise to identify the customers. That is, the systems were completely product-centric. For example, while the auto policy system required a driver license and a birthdate for the policyholder, the homeowner policy system did not. Actually, the homeowner system allowed for a birthdate, but typically this was filled in using the birthdate of the oldest person in the household (not necessarily the policyholder) to trigger the senior discount. Each system also had its own rules about how data was stored, and what validations were done on that data. For example, the homeowner policy system would trigger an error if multiple active policies were written against a specific address. But there was no address validation, so the address had to be an exact match. Change a single abbreviation (e.g., from &#8220;Lane&#8221; to &#8220;Ln&#8221;) and the validation went through just fine. The auto policy system did require a driver license number, but did not check to see if it was unique (or nearly so). Thus, the same driver license number could be (and was) used for hundreds of policies, as that was faster and easier than waiting while the potential customer looked up their number, or delaying until the customers received a driver license from the state into which they had just relocated.</p>
<p>Note: This last little bit of business wreaked considerable havoc while it went undiscovered. Driver license number is used to look up moving violations for rating the policy, and the data is &#8220;enriched&#8221; by obtaining a feed from the DMV (based on driver license number, naturally enough). Suddenly, this enriched data on hundreds of policies came back with the same driver information, as one of the &#8220;dummy&#8221; license numbers commonly used to write the policies happened to correspond to a real person. It was discovered as a result of the effort to uniquely identify customers (&#8220;Customer Master&#8221;).</p>
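<p>A simple frequency profile would surface this kind of &#8220;dummy&#8221; value. The sketch below is an assumption-laden illustration rather than what was actually run: the function name <code>overused_license_numbers</code>, the threshold of three policies per license number, and the sample values are all made up.</p>

```python
from collections import Counter


def overused_license_numbers(policies, max_policies=3):
    """Return license numbers that appear on more policies than max_policies."""
    counts = Counter(license_no for _policy_id, license_no in policies)
    return {lic: n for lic, n in counts.items() if n > max_policies}


# Illustrative data only: five policies share one license number, one does not.
policies = [("A-%d" % i, "D0000000") for i in range(5)] + [("A-9", "K4417302")]
suspects = overused_license_numbers(policies)
```

<p>The threshold is a judgment call; a household can legitimately have several auto policies on one license, but hundreds on one number is a strong sign of a placeholder.</p>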
<p><strong>Who is the customer?</strong></p>
<p>The inability to understand who the customer was, what products the customer had, and what the customer needed led to considerable confusion and what the CRM manager at this company termed &#8220;a disastrous customer experience&#8221;. Oh, and it wasn&#8217;t much fun for the employees, either. For example, the customer might call in to provide a change of address. The person taking the call would actually have to ask the customer what other products the customer had so the rep could adjust the address in each of the systems that contained it. Additionally, an insurance agent might call a customer who had an auto policy to try to sell a personal liability or homeowner&#8217;s policy. Imagine the confusion and consternation when the customer informed the agent that he or she already had such a policy with the same company! Feedback from the customer base suggested strongly that customers not only felt it was reasonable to expect the agents to have this information, but also wondered how well a company that did such a poor job of keeping track of things would perform when it came time to process a claim! This probably led to lost business, though this is just one of the impacts of poor data quality that is hard to quantify.</p>
<p>What was NOT hard to quantify was the lost business that resulted from the inability to tell whether an insured was also a member of the organization. You see, one of the requirements for someone to buy insurance from us was that they needed to be a &#8220;member&#8221;, which involved paying some additional money and receiving some additional benefits (including the ability to buy our insurance). At renewal, a check would be made to see if the person was still a member. If not, they would be contacted to reenroll, and if that effort failed, the insurance could not be renewed and a customer was lost. Very often, however, the person DID in fact still have an active membership, but the lack of a &#8220;master customer&#8221; hid that fact. The customer knew they had an active membership, and either didn&#8217;t respond to such an obvious error or decided (again) that a company that couldn&#8217;t keep track of its customers didn&#8217;t deserve their business. It&#8217;s pretty easy to count the number of policies that lapsed due to &#8220;non-membership&#8221; and apply some factors to the value of that lost business. Another troubling aspect was that, in order to not lose business, the customer might be given a free membership, so now quite a few people had TWO. When they were then billed for both, they paid for only one (because there is no reason to have two) and allowed the other one to lapse. This tendency skewed metrics like retention and renewal rates. And just to add insult to injury, we conducted periodic efforts to get these &#8220;lapsed&#8221; members to renew. The metrics around the success rates of these efforts were also skewed by the fact that a significant percentage of those in the target audience already had an active membership.</p>
<p><strong>Issues with Building the Master Customer List</strong></p>
<p>The need to develop a complete and accurate picture of the customer led initially to an effort to develop a master customer list. This list was culled from the various systems that collect customer information, and additionally linked each master version of the customer to all the products the customer owned. However, the ability to accurately identify that the various versions of a customer stored in siloed systems were in fact the same person depends heavily on having accurate data. Despite a significant effort to create a robust architecture, the use of a highly rated probabilistic matching engine, and months of tuning the results, we were not able to get above about 70% automated good matches, and there were too many false matches as well. This situation caught me by surprise, as I had previously had stunning success with this sort of matching when I implemented it in the pharmacy of a large chain drugstore. There turned out to be two key differences between that effort and the current one which explained the huge difference in results (at the chain drugstore, we got essentially 100% accurate matches and no false linkages).</p>
<p>The first difference was that at the pharmacy we weren&#8217;t matching data from different systems. Instead, we were matching patient data from different stores, with the idea of creating a &#8220;central patient&#8221;: a master patient with a complete drug history who could walk into any store in the chain and be recognized. Previously, if a patient went into a store they hadn&#8217;t been in before, the pharmacy personnel had to take their demographic and insurance information, get a list of drugs they were taking (for drug-drug interactions), and so on. Since each store was running the same software, it was possible to easily map the data from the individual stores into a master database with no assumptions about the meaning of the data or how it was derived. But in the case of the insurance master customer, the data in each system had to be examined, meanings figured out, derivations examined, and then mapped together to get the &#8220;same&#8221; fields from each system so that the match could be undertaken. Unlike the pharmacy system, each insurance system had its own assumptions. For example, one system had a preponderance of birthdates on 12/31 (we profiled the data to find this out). This turned out to be a vestige of an earlier conversion, and thus we could trust only the year portion of the birthdate when doing matches if the birthdate was 12/31. Another system had a birthdate field, but it turned out (as mentioned earlier) not to contain the birthdate of the policyholder, but of the oldest person in the household.</p>
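<p>The 12/31 rule translates directly into a comparison function. The following is a minimal sketch of that rule, not the matching engine&#8217;s configuration; the names <code>is_placeholder</code> and <code>birthdates_agree</code> are hypothetical.</p>

```python
from datetime import date


def is_placeholder(birthdate):
    """True for 12/31 dates, which profiling showed were mostly conversion artifacts.

    Someone genuinely born on 12/31 is also treated as year-only; that is the
    cost of not being able to tell real values from converted ones.
    """
    return birthdate.month == 12 and birthdate.day == 31


def birthdates_agree(a, b):
    """Compare full dates, falling back to year only when either side is 12/31."""
    if is_placeholder(a) or is_placeholder(b):
        return a.year == b.year
    return a == b


# Illustrative values only.
same_year_placeholder = birthdates_agree(date(1960, 12, 31), date(1960, 5, 4))
different_days = birthdates_agree(date(1960, 5, 4), date(1960, 5, 5))
```

<p>In a probabilistic matcher, a year-only agreement would normally contribute less weight than a full-date agreement; this sketch only shows the yes/no decision.</p>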
<p><strong>It&#8217;s all about the data quality</strong></p>
<p>The major difference, however, turned out to be the quality of the data. The quality of the data in the pharmacy was really, really good. Names, birthdates, addresses, phone numbers, gender, and medical insurance id were spot-on accurate. Even in central California, where there were literally thousands of individuals named Maria Garcia, we were able to match them up and get them right. And the reason that the data was really, really good (high quality)? Simple &#8212; the business provided a powerful incentive to get it right. Unlike many businesses, where simply filling in anything is enough to close the transaction, in the pharmacy the data HAD to be right or the transaction wouldn&#8217;t go through. Over 95% of the customers were covered by drug coverage insurance. If any of the data was incorrect, the transaction would be rejected by the drug coverage company. It was a Data Quality practitioner&#8217;s dream &#8212; the data had to be perfect before the transaction could take place. Further, pharmacy personnel, who are incented partially by how many prescriptions are filled and sold, have an incentive to collect the correct data and update the system. Even the customer, who would really rather not pay cash, has an incentive to provide the correct data so they can get their medicine paid for.</p>
<p>Compare that situation with almost any other business. As I said earlier, most data can be collected incorrectly, or at least never updated, and the transaction goes through just fine. The customer-facing personnel are usually incented to be fast (remember the adage: &#8220;be careful what you pay for&#8221;) but not accurate. The systems they use are often aging and have very little data validation built in (this was true of the pharmacy system too, but it didn&#8217;t matter). And even if you change all that, the customers are often leery of providing personal information. When we discovered the issue with 12/31 birthdates, we tried to clean the data up by calling the customers to get their birthdates. In lots of cases, they refused to provide that information over the phone. Some agreed to send it by letter, but we never received most of the birthdates. Of course, in the case where the birthdate is needed to write a policy, they&#8217;ll at least provide that information at renewal, as they expect to do that. But when the information was needed only so we could do customer matching, we had little luck getting the customers to provide it. For example, we don&#8217;t need your birthdate for a membership (unless you&#8217;re a minor), so if we suddenly start asking for it, people get suspicious.</p>
<p>Of course, a determined team of people can make headway, so don&#8217;t get the idea we gave up in frustration. Addresses can be standardized and scrubbed. Names can be parsed (often names are combined in a single field with prefixes and suffixes), key phrases recognized and removed (such as &#8220;trustee&#8221;), and information enriched from outside sources based on addresses. Business processes can be changed to collect data even in cases where it isn&#8217;t needed for that transaction, and incentives changed so that people collect the data correctly. Real-time validation of the data (driver license number, address, etc.) can be put in place, and where overrides are allowed, the number of overrides tracked to see if anyone is abusing the privilege. Little by little we got ahead of it, with our automated matches edging up into the mid-90s and then the high-90s (percent), and false linkages dropping off sharply.</p>
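<p>As a rough illustration of the name parsing and phrase removal mentioned above, here is a minimal sketch. The prefix, suffix, and phrase lists are hypothetical stand-ins for lists that would come from profiling the real name fields, and <code>parse_name</code> is a made-up function name; commercial parsing tools handle many more cases (multi-part family names, businesses, joint names).</p>

```python
import re

# Hypothetical lists; real ones would come from profiling the name fields.
PREFIXES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}
NOISE_PHRASES = ["trustee", "et al"]


def parse_name(raw):
    """Split a combined name field into prefix, given, family, and suffix parts."""
    text = raw.lower()
    # Remove key phrases that are not part of the person's name.
    for phrase in NOISE_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    # Keep letters, apostrophes, and hyphens; everything else becomes a space.
    tokens = re.sub(r"[^a-z\s'-]", " ", text).split()
    prefix = tokens.pop(0) if tokens and tokens[0] in PREFIXES else ""
    suffix = tokens.pop() if tokens and tokens[-1] in SUFFIXES else ""
    # Simplifying assumption: the last remaining token is the family name.
    family = tokens.pop() if tokens else ""
    return {"prefix": prefix, "given": " ".join(tokens), "family": family, "suffix": suffix}


parsed = parse_name("Dr. Maria Garcia Jr., Trustee")
```

<p>Once the parts are separated, each can be compared on its own, so that &#8220;Maria Garcia&#8221; in one system and &#8220;Dr. Maria Garcia Jr., Trustee&#8221; in another at least agree on given and family name.</p>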
<p>Finally, the stuff we learned was built into the new systems we built, so that data quality was protected as much as possible at entry. That is, we &#8220;designed in&#8221; data quality. And isn&#8217;t that the best way?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/who-is-the-customer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Metadata: Keeping track of lists of values</title>
		<link>http://www.dataversity.net/metadata-keeping-track-of-lists-of-values/</link>
		<comments>http://www.dataversity.net/metadata-keeping-track-of-lists-of-values/#comments</comments>
		<pubDate>Sat, 28 May 2011 15:41:40 +0000</pubDate>
		<dc:creator>David-Plotkin</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Integration]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[David Plotkin]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[Metadata]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=3616</guid>
		<description><![CDATA[by David Plotkin As you build metadata expertise, document definitions in your metadata repository (or whatever tool you use) and gain credibility in the enterprise, you are likely to find that your effort includes gathering, stewarding, documenting, and providing lists of valid values (sometimes called “enumeration lists”). After all, having standardization for your name suffixes, gender, marital status, relationship type, and so on ensures that new systems (and existing systems that can handle the modification) will use the same values. This leads to standardization on reports and in the data warehouse as well. I think most would agree that items like gender code should be tracked by Data Governance (or whatever you call it) and even consider the Metadata Repository the “system of record” for such value lists. These lists have wide usage, a short list of values that don’t (or shouldn’t) change often, and no real clear owner or system of record. But how do you draw the line between “true” valid lists of values and the vast number of data elements that happen to have a finite list of values, but should be neither governed by Data Governance nor should be documented in the Metadata Repository? Items that potentially fall [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2011/04/D-Plotkin22.jpg"><img class="alignleft size-thumbnail wp-image-3167" src="http://www.dataversity.net/wp-content/uploads/2011/04/D-Plotkin22-150x150.jpg" alt="" width="150" height="150" /></a></p>
<p>by <a title="David Plotkin" href="http://www.dataversity.net/?page_id=1058" target="_blank">David Plotkin</a></p>
<p>As you build metadata expertise, document definitions in your metadata repository (or whatever tool you use) and gain credibility in the enterprise, you are likely to find that your effort includes gathering, stewarding, documenting, and providing lists of valid values (sometimes called “enumeration lists”). After all, having standardization for your name suffixes, gender, marital status, relationship type, and so on ensures that new systems (and existing systems that can handle the modification) will use the same values. This leads to standardization on reports and in the data warehouse as well.</p>
<p>I think most would agree that items like gender code should be tracked by Data Governance (or whatever you call it) and even consider the Metadata Repository the “system of record” for such value lists. These lists have wide usage, a short list of values that don’t (or shouldn’t) change often, and no real clear owner or system of record.</p>
<p>But how do you draw the line between “true” valid lists of values and the vast number of data elements that happen to have a finite list of values, but should be neither governed by Data Governance nor documented in the Metadata Repository? Items that potentially fall into this category are things like GL Account codes, office location codes, sales rep identifiers, and even Employee Ids. And trust me, you do need to draw that line, as analysts and project team members will start asking Data Governance for this information once they understand what we do. After all, it’s easier than trying to dig this stuff out for themselves! <strong>The key differentiator seems to be that data elements that are created as part of a common business process with a clear business function owner should NOT be part of the Data Governance deliverables</strong>. All of these examples fall into that category. For example, HR creates (and terminates) Employee Ids as part of the hiring and termination process. They are rightfully in control of that process, and the value set changes daily (and even continuously). No one with any sense would suggest that this constantly fluctuating value set belongs in the Metadata repository, or that the Metadata repository should be the system of record. There is a clearly defined system of record: the HR system, which uses these values to do the processing required for employees, such as establishing their managerial structure, setting their service date, getting them paid, and tracking their taxes and withholding, disciplinary actions, change of status, location, and so on. The same can be said of the other examples noted.</p>
<p>Note that this doesn’t imply that the maintenance/add/removal of the values is limited to the system of record or is simple to administer. Adding a Sales Rep Id, for example, involves not only adding it to the Sales system (probably the system of record), but also adding it to the HR/Compensation system, establishing the location (which can change from day to day or even hour to hour), and so on. The key, as I said, is that a common business practice with a business function owner owns this process, and the system of record is highly likely to be the main system in which the value set is adjusted initially (with propagation as necessary) and which cannot function properly without having the most up-to-date list of these values.</p>
<p>A key point here is that many times, true “valid values” (such as gender code) don’t have a well-defined system of record. You might make a case that Gender Code is “people data” and thus owned by the HR business function (and so the system of record should be the HR system). But what about all the people the Enterprise deals with who are not employees or contractors? Customers, suppliers, external agents, etc. “Solving” this by putting the data element into a generalized function like Customer Master (with a domain data steward) establishes ownership but does nothing for solving the issue of the SOR for these data elements. Most of the time, these values are used so generally across the enterprise that it is prudent to have an agreed-upon list documented in an easy-to-find place. The list of potential values is so small that it is reasonable and convenient to record and maintain the list in the Metadata repository, though it must be implemented identically (good luck with that) in every system which contains the data element.</p>
<p>To tell the truth, this whole discussion came up because of Product, and whether there should be a list of product codes supplied by Data Governance and kept for reference in the Metadata Repository. I have to admit, my initial inclination was to specify the codes and keep them in the repository, though <strong>not</strong> as the system of record. After all, we only have about 25 products, and we don’t add a lot of new ones very fast, since it takes a major effort to do that. And to reiterate, the system(s) of record have to be the product systems themselves because that is where you need to fully define the product in order for the system to work correctly (and enable you to sell the product). The fact that a bunch of other systems have to get major updates as well is more a failing of the integration and system design than anything else.</p>
<p>I have since changed my mind about even keeping product in the Metadata repository. To see the apparent insanity of recording the list of products in the Metadata Repository, generalize to the example of a major retailer, such as Longs Drugs, where I worked for 7 years. Longs (now CVS) has a well-defined product hierarchy: each level is clearly specified, and ALL products must fit into the hierarchy and roll up under values all the way to the top. The hierarchy is a pyramid (as pretty much all hierarchies are), with just a few values at levels 2, 3, and 4. However, by the time you get down to the next-to-last level (SKU), the values have ballooned to well over 100,000; and at the bottom level (UPC), the values number in the millions. This is because every single item that Longs sold had a separate product identifier (UPC) which differentiates by package size (8 oz. vs. 12 oz. of Gelatin), flavor (grape or cherry), brand (Jello or Royal), and even type of packaging (single boxes versus six-packs/bundles). And more.</p>
<p>Given all this, it is clear that the Metadata repository is NOT the system of record for any part of the product hierarchy (that also has not changed from my initial inclination). But does it make sense to record the products in the repository? I would now say that it does not. Basically, the list will be out of date almost immediately, because the business processes to keep it updated don’t exist, and really don’t need to.</p>
<p>However, just to be clear, I do think that the definition (the levels, what they are called, what they mean) of the product hierarchy ITSELF (and probably any other hierarchy) SHOULD be documented in the Metadata repository. If there is no agreement on the hierarchy, then reports based on the disparate hierarchies will not match. In addition, it is a very bad idea to have different versions/definitions of what is meant by “product” (the bottom level of the hierarchy). If one group defines product as (for example) an auto insurance policy, while another breaks it down to a specific set of coverages in an auto policy, then not only will reports not work properly, but the very systems that enable working with those products will not work the same and will require people using the systems to know the idiosyncrasies of something as basic as what the product is. While little can be done when working with legacy systems, new systems should be designed with a common, governed list of products and a common hierarchy.</p>
<p>So, that’s it! As always, comments are much welcomed, especially from those of you who have fought this fight before.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/metadata-keeping-track-of-lists-of-values/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
