<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DATAVERSITY &#187; Karen Lopez</title>
	<atom:link href="http://www.dataversity.net/category/discussion/blogs/karen-lopez/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataversity.net</link>
	<description></description>
	<lastBuildDate>Wed, 19 Jun 2013 18:44:31 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>7 Tips for Staying Relevant and Valued as a Data Modeler</title>
		<link>http://www.dataversity.net/7-tips-for-staying-relevant-and-valued-as-a-data-modeler/</link>
		<comments>http://www.dataversity.net/7-tips-for-staying-relevant-and-valued-as-a-data-modeler/#comments</comments>
		<pubDate>Thu, 24 Jan 2013 08:11:29 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Big Challenges in Data Modeling]]></category>
		<category><![CDATA[Conference and Webinar Communities]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[Trending Jobs in Data Management]]></category>
		<category><![CDATA[data modeling tool]]></category>
		<category><![CDATA[forward engineer]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[reverse engineer]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=17253</guid>
		<description><![CDATA[by Karen Lopez 1. Learn about NoSQL database technologies You aren&#8217;t going to get any warning when your company&#8217;s first NoSQL project shows up.  It might be the result of a 19th-hole acquisition project (when C-level executives are wined and dined by a software vendor) or it might be part of a package solution that someone has purchased.  And it will need to be installed and running by tomorrow. I recommend you check out Hadoop-related data technologies, a graph database, a key-value pair database and a document-based database.  Attend user group meetings for NoSQL technologies.  Attend a whole conference.  Read.  Watch videos.  You&#8217;ll need to understand where NoSQL technologies fit within a modern data architecture. 2. Learn how to correctly reverse engineer application databases I have worked with more than a few data architects who have never reverse or forward engineered a database.  I&#8217;m not sure how they have survived all these years, but data architects need to have the skills to reverse engineer a database, even with package solutions.  Sure, commercial software might be missing primary keys, foreign keys and all kinds of constraints and other database-hosted data quality features, but we need to know that, too.  It&#8217;s not enough [...]]]></description>
				<content:encoded><![CDATA[<p>by <a title="Karen Lopez" href="http://www.dataversity.net/contributors/karen-lopez" target="_blank">Karen Lopez</a></p>
<h2><a href="http://www.dataversity.net/7-tips-for-staying-relevant-and-valued-as-a-data-modeler/seven-2/" rel="attachment wp-att-17258"><img class="alignleft" alt="Seven" src="http://www.dataversity.net/wp-content/uploads/2013/01/Seven1.png" width="198" height="195" /></a>1. Learn about NoSQL database technologies</h2>
<div>
<p>You aren&#8217;t going to get any warning when your company&#8217;s first NoSQL project shows up.  It might be the result of a 19th-hole acquisition project (when C-level executives are wined and dined by a software vendor) or it might be part of a package solution that someone has purchased.  And it will need to be installed and running by tomorrow.</p>
<p>I recommend you check out Hadoop-related data technologies, a graph database, a key-value pair database and a document-based database.  Attend user group meetings for NoSQL technologies.  Attend a whole conference.  Read.  Watch videos.  You&#8217;ll need to understand where NoSQL technologies fit within a modern data architecture.</p>
</div>
<h2>2. Learn how to correctly reverse engineer application databases</h2>
<div>I have worked with more than a few data architects who have never reverse or forward engineered a database.  I&#8217;m not sure how they have survived all these years, but data architects need to have the skills to reverse engineer a database, even with package solutions.  Sure commercial software might be missing primary keys, foreign keys and all kinds of constraints and other database-hosted data quality features, but we need to know that, too.It&#8217;s not enough to know how to click <em>NEXT</em>, <em>NEXT</em>, <em>NEXT</em> on your data modeling tool, either.  You&#8217;ll need to understand what those hundreds of options are in that wizard.</div>
<h2>3. Get experience data modeling for integration projects</h2>
<p>Not all data modeling results in a brand new database built from scratch.  One of the most common reasons I&#8217;ve been doing modeling is to build canonical models for integration projects. These are often implemented as XML schemas, but I start with a logical data model of requirements, using techniques very similar to those I use for a database design project.</p>
<h2>4. Learn a pattern or universal data model</h2>
<div>
<p>There&#8217;s no strong reason why models of core data about people, organizations, products, categories, addresses and contact mechanisms should vary, especially at the same company.  Using pattern data models can significantly reduce the time it takes to produce a correct and complete model.  Sure, tailoring may be needed, but if you are familiar with common data modeling patterns, you can free up a huge amount of time to focus on those data requirements that are proprietary and unique to your business.</p>
<p>Understanding a pattern model can take some time; that&#8217;s why you need to start reading and studying now.</p>
</div>
<h2>5. Learn more features of your data modeling tool</h2>
<p>Like many productivity tools, data modeling tools have evolved a great deal over the last few years.  We architects benefit from a significant number of mature features in our tools.  And like office productivity users, many of us use only 10-25% of those features.  Automation is a key part of many of the leading tools, yet some modelers are reluctant to learn new coding and scripting skills needed to master them.  You should.  The payoff for productivity can be significant after a short learning curve.</p>
<h2>6. Learn about the new features of your target DBMSs</h2>
<div>
<p>The list of new data types, identifier types, constraints, etc. for DBMSs grows with every release.  As data modelers, we need to be able to understand the pros and cons of choosing those over more traditional features.  Take formal training if you have to.  Attend user group meetings.  Get your DBAs and developers to lead some brown bag lunches.  Read online tutorials.</p>
<p>Nothing says &#8220;out of touch&#8221; like having to ask what something means in a design when that feature has been around for five years.  It gets worse if you get caught specifying a deprecated approach to a design.</p>
</div>
<h2>7. Build a data modeling process that allows you to produce releases quickly</h2>
<div>Whether or not you will be following an Agile approach, modern development projects need to have faster and shorter iterations.  If it takes 3 days to publish and share your data models, that&#8217;s too long.  Modeling tool automation and documentation features should be your primary method for producing the artifacts related to data models.</div>
<h2>As little as 15 minutes a day can keep your skills sharp</h2>
<div>If these tips sound a bit more physical than you are used to, they are.  Yes, we still need to produce beautiful conceptual and logical data models, but our relevance is also being measured on how well we are able to provide business solutions.  More times than not, that means being able to effectively contribute to the designing and building of those solutions.  Spending just 15 minutes a day working your way through this list will make a huge difference in ensuring your data relevance on future projects.  Your project teams will find you more valuable and be happier with your contributions.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/7-tips-for-staying-relevant-and-valued-as-a-data-modeler/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Frequently Asked Questions about Data Modeling, Part One</title>
		<link>http://www.dataversity.net/frequently-asked-questions-about-data-modeling-part-one/</link>
		<comments>http://www.dataversity.net/frequently-asked-questions-about-data-modeling-part-one/#comments</comments>
		<pubDate>Mon, 14 Jan 2013 08:10:14 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Governance and Quality]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[data modeling]]></category>
		<category><![CDATA[FAQ]]></category>
		<category><![CDATA[training]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=16824</guid>
		<description><![CDATA[by Karen Lopez @datachick I deemed last year The Year of Data.  Big Data, Data Analytics, Open Data, Data Breaches, etc. were topics of discussion everywhere, even outside the IT world.  Heck, data management even made it into the US State of the Union Address. I think all this added focus on data has given us a ripe opportunity to help our business organizations to leverage data modeling and other data management approaches even more than ever.  It has also led to my inbox and Twitter stream being flooded with questions about data modeling.  I see this uptick in inquiries as a good indicator of job stability for data professionals.  At least I hope so.  If anything, I&#8217;m hoping it means that the pendulum has swung a bit back to center from &#8220;software is everything&#8221; mentalities. With all these questions about data modeling coming in, I thought I&#8217;d answer the most common ones here so that I can share them more widely. Why is having a data model so important? Ah, the big question. My reasons: Technologies used to move and persist data come in many forms, over time and at the same time. I support the creation of XML Documents, multiple [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/contributors/karen-lopez">Karen Lopez</a> <em><a href="http://www.twitter.com/datachick">@datachick</a></em> <a href="http://www.dataversity.net/?attachment_id=16845" rel="attachment wp-att-16845"><img class=" wp-image-16845 alignleft" style="margin-left: 15px;margin-right: 15px" alt="Question Mark" src="http://www.dataversity.net/wp-content/uploads/2013/01/Question-Mark-241x300.jpeg" width="123" height="154" /></a></p>
<p>I deemed last year <em>The Year of Data</em>.  Big Data, Data Analytics, Open Data, Data Breaches, etc. were topics of discussion everywhere, even outside the IT world.  Heck, data management even made it into the <a href="http://www.whitehouse.gov/photos-and-video/video/2012/01/25/2012-state-union-address-enhanced-version" target="_blank">US State of the Union Address</a>. I think all this added focus on data has given us a ripe opportunity to help our business organizations to leverage data modeling and other data management approaches even more than ever.  It has also led to my inbox and Twitter stream being flooded with questions about data modeling.  I see this uptick in inquiries as a good indicator of job stability for data professionals.  At least I hope so.  If anything, I&#8217;m hoping it means that the pendulum has swung a bit back to center from &#8220;software is everything&#8221; mentalities. With all these questions about data modeling coming in, I thought I&#8217;d answer the most common ones here so that I can share them more widely.</p>
<h2>Why is having a data model so important?</h2>
<p>Ah, the big question. My reasons:</p>
<ul>
<li>Technologies used to move and persist data come in many forms, over time and at the same time. I support the creation of XML Documents, multiple DBMSs, multiple versions of the same DBMS. By having a single logical data model of data requirements, I can <strong>separate the rules and definitions of the requirements from their implementation.</strong></li>
<li>The same data exists in many platforms, in many locations.  I need to be able to map those sources to targets across platforms and systems.  Doing this mapping, or data lineage, with a data model is much easier.  It also helps me <strong>understand the implications of making a change in one system on another system</strong>.  I can&#8217;t do that by just looking at code.</li>
<li><strong>Writing stuff down is often a good way to impress people</strong> who have to provide requirements, compliance issues, security requirements, etc. They like not having to answer the same question over and over for different people and roles.  I can&#8217;t tell you how many times a business person has asked me if IT ever writes anything down.  Imagine your frustration if you had someone working on your house and every tradesperson started their day with &#8220;Tell me about what sort of house you want? How many people will live here? Do you want to be able to get to the second floor? Do you need a bathroom?  How about a shower?&#8221;  That&#8217;s how many business professionals see us in IT.</li>
<li>A data model is a great way of capturing rules, constraints, definitions in a method that is technology independent. I can capture those things <strong>once and reference them in many places.</strong></li>
<li>We can <strong>measure databases against our data models to assess fit</strong>. This can be done for application packages and custom development.</li>
<li><strong>Enterprise data is complex.</strong>  I work with a data model that has 32,000 objects (tables, columns, datatypes, constraints, etc.) in it.  There is no way I could professionally manage change by just trying to remember all this information.  Nor could anyone else.</li>
<li><strong>Modeling helps you ask the right questions</strong> before a bunch of time is spent coding, creating screens, reports, etc. This reduces costs and the number of bug fixes required.</li>
<li>Like all models, the <strong>data model is a communication tool</strong> and is good for tying requirements directly to designs and implementations.  This is especially true when I generate data prototypes based on the model.</li>
<li>Data governance can&#8217;t easily be done via reverse-engineered pictures of databases.</li>
<li>Once you&#8217;ve worked with a great data model, <strong>you can’t go back</strong>.</li>
</ul>
<h2>Is it worth spending money on consultants to teach us the tool and help us build a model, or should we expect a newly hired data architect to be able to do that for us?</h2>
<p>It depends…am I the consultant? If you hire the right kind of architect, he or she should be able to do that sort of training if you budget time for them to do that. Often when I’m hired as the sole architect, I’m swamped with real work and no one has time to be trained nor is there time allocated for me to train anyone.  It also depends on the consultants. If they specialize, really, in data modeling, it may be worth it. If they have casual experience, then maybe not. I want to warn you that hiring a data architect that really is one is difficult. Most interviewees will be DBAs or developers who have once seen a data model or always wanted to get into this line of work. I typically have to review 50 resumes to find a candidate that has real enterprise level experience. If you are going to have only one person filling the data architect role, they can&#8217;t be working at the apprentice level unless they have a mentor. Most good data architects actually aren&#8217;t good part time DBAs, either. You may be better off trying to find a combination BA/DA.  I wrote about this <a title="Hiring Data Professionals: Mason Dixon Lines and Zombies in Your Job Postings" href="http://www.dataversity.net/hiring-data-professionals-mason-dixon-lines-and-zombies-in-your-job-postings/" target="_blank">data architect hiring dilemma</a> previously.</p>
<h2>What can we do with a tool like [insert real data modeling tool here] that we can&#8217;t do with other less expensive tools?</h2>
<p>What you can do with real data modeling tools is…<em>modeling</em>. The iterative process of doing logical models, having multiple physical models based on them, making changes to them (either because you are in development or because you have a new requirement) is how projects work in the real world.  Many of the less expensive tools allow you to reverse engineer a database and forward engineer a full database.  But here in the real world of development we make changes to designs and models all day long.  We need to estimate the cost of change, measure the impact of that change and make changes. I need to be able to generate changes to databases without a lot of hand scripting and I can&#8217;t drop the database and recreate it on a regular basis.  I have to do that while supporting multiple versions of the same DBMS or multiple types of DBMSs.   The less expensive ones either don’t do any of that, or they don’t easily support the iterative process of real-life modeling and design. I also need to be able to produce reports, images and interactive versions of the data models. I also need to be able to share the models in formats that can be easily consumed by dozens of other modeling and development tools.   Most of the lesser tools don&#8217;t have these shiny features. Many of the real data modeling tools have features geared toward collaborating with other team members.  These features make sharing, updating, commenting on and getting hands-on with the data models easier and faster. It really comes down to finding a tool that helps you reduce costs of development, get that development done faster and get better-quality designs.  The value proposition for that is enormous compared to the cost savings of having a tool that only makes nice pictures of one type of database.</p>
<h2>What is your question?</h2>
<p>Do you have a question about data modeling?  Leave it in the comments or contact me directly <a href="http://www.twitter.com/datachick" target="_blank">@datachick</a> or via the Contact Us menu item above and I&#8217;ll blog a response here.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/frequently-asked-questions-about-data-modeling-part-one/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Size Doesn&#8217;t Matter.  Or Does It? A Rant on Big Data Terms</title>
		<link>http://www.dataversity.net/size-doesnt-matter-or-does-it-a-rant-on-big-data-terms/</link>
		<comments>http://www.dataversity.net/size-doesnt-matter-or-does-it-a-rant-on-big-data-terms/#comments</comments>
		<pubDate>Wed, 16 May 2012 07:01:51 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Conference and Webinar Communities]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Enterprise Data World]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[Most Popular in 2012]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Rant]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=11087</guid>
		<description><![CDATA[By Karen Lopez @datachick Last week at Enterprise Data World, I gave a lightning talk (a strictly-enforced time-limited five minute presentation).  Since I can barely manage to fit meaningful presentations in one hour given how much I talk, I decided to go for a rant.  That may not come as a surprise to you.  If it does, welcome new reader. Last year I also gave a lightning talk on the Myths of Normalization, which I&#8217;ve turned into a blog series here, too.  That was also a rant.  I&#8217;m seeing a pattern here.  You should probably envision me holding a martini and my tablet while I ranted.  It will put you in the mood. I didn&#8217;t actually hold a martini in my hand.  It was on the table beside me. Since I got good feedback, I decided to share my script for my talk here. By the way, those of you snickering about this rant, know that I&#8217;m working on a similar one for the RDBMS area as we speak.  Look for it at a NoSQL or Big Data event near you. Size Doesn&#8217;t Matter.  Or Does It? I&#8217;m @Datachick. I think a lot about data.  Today I&#8217;m on a rant.  I [...]]]></description>
				<content:encoded><![CDATA[<p>By <a href="http://www.dataversity.net/contributors/karen-lopez">Karen Lopez</a></p>
<p><a href="http://www.twitter.com/datachick">@datachick</a></p>
<p><a href="http://www.dataversity.net/wp-content/uploads/2012/05/rant_this_way.jpg"><img class="size-medium wp-image-11132 alignleft" style="margin-right: 15px;" src="http://www.dataversity.net/wp-content/uploads/2012/05/rant_this_way-300x198.jpg" alt="Rant sign" width="300" height="198" /></a>Last week at Enterprise Data World, I gave a lightning talk (a strictly-enforced time-limited five minute presentation).  Since I can barely manage to fit meaningful presentations in one hour given how much I talk, I decided to go for a rant.  That may not come as a surprise to you.  If it does, welcome new reader.</p>
<p>Last year I also gave a lightning talk on the <a title="Myth 1 – Normalization: Friend or Foe The Slogan" href="http://www.dataversity.net/myth-1-normalization-friend-or-foethe-slogan/">Myths of Normalization</a>, which I&#8217;ve turned into a blog series here, too.  That was also a rant.  I&#8217;m seeing a pattern here.  You should probably envision me holding a martini and my tablet while I ranted.  It will put you in the mood. I didn&#8217;t actually hold a martini in my hand.  It was on the table beside me.</p>
<p>Since I got good feedback, I decided to share my script for my talk here. By the way, those of you snickering about this rant, know that I&#8217;m working on a similar one for the RDBMS area as we speak.  Look for it at a NoSQL or Big Data event near you.</p>
<h2>Size Doesn&#8217;t Matter.  Or Does It?</h2>
<p>I&#8217;m @Datachick. I think a lot about data.  Today I&#8217;m on a rant.  I know.  SHOCKING!!!</p>
<p>I&#8217;m a huge fan of Big Data and NoSQL. Really.  A really, really big fan.  Get it?  BIG DATA.  Today I want to share with you some of my more snarky observations about BIG DATA.  By the way, <strong>every single one of these rants is totally unfair, cherry picked and irreverent.</strong>  I know. It&#8217;s shocking.</p>
<div>
<p>Let&#8217;s start with the basics: What is Big Data?  I&#8217;m here to tell you that nobody really knows.  The good thing about Big Data is just that.  So it can be anything you want it to be. Really.  Just like that nice friendly woman who wanted you to buy her a drink last night.</p>
<p>Here&#8217;s a nice definition from Wikipedia, the ultimate source of knowledge for the human race.  But that&#8217;s a whole &#8216;nother rant.</p>
<blockquote><p>In information technology, big data consists of data sets that grow so large that they become awkward to work with(1)</p></blockquote>
<p>What the heck kinda definition is that?  Data that&#8217;s so big it&#8217;s awkward?  I can&#8217;t wait to be in that meeting with the CEO, CIO and friends.</p>
<h2>Big Data vs. data</h2>
<p>One of the things I noticed about Big Data is that it is always capitalized when it&#8217;s written.  I&#8217;m not sure why, because it really isn&#8217;t a proper noun.  We don&#8217;t capitalize DATA, so why should Big Data be?  I&#8217;m pretty sure that capitalization is a way to spot the birth of a silver bullet.  Remember that when the next big thing, HUGE DATA, then GINORMOUS DATA is announced at next year&#8217;s EDW. I guess then big data will lose its title caps.</p>
<h2>Hadoop</h2>
<p>Hadoop is one of the many technologies that has come from the Big Data religion&#8230;er&#8230;movement&#8230;no&#8230;solutions.  The great thing about Hadoop is that everything that makes up Hadoop is named Hadoop.  Really.  You can&#8217;t make this stuff up.</p>
<ul>
<li><a href="http://hadoop.apache.org/common/">http://hadoop.apache.org/common</a> Hadoop Common: The common utilities that support the other Hadoop subprojects.</li>
<li><a href="http://hadoop.apache.org/hdfs/">http://hadoop.apache.org/hdfs</a> Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.</li>
<li><a href="http://hadoop.apache.org/mapreduce/">http://hadoop.apache.org/mapreduce</a> Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.(2)</li>
</ul>
<p>What do you think the mascot is named?  Yep. You got it.  Not Harvey or Harry, but Hadoop.  Isn&#8217;t it just like the new crowd to not worry about giving everything its own distinctive name?  Eventually everything becomes consistent.</p>
<div>
<p>Okay, not everything. Other Hadoop-related projects at Apache include:</p>
<ul>
<li><a href="http://avro.apache.org/">http://avro.apache.org</a> Avro™: A data serialization system.</li>
<li><a href="http://cassandra.apache.org/">http://cassandra.apache.org</a> Cassandra™: A scalable multi-master database with no single points of failure.</li>
<li><a href="http://incubator.apache.org/chukwa/">http://incubator.apache.org/chukwa</a> Chukwa™: A data collection system for managing large distributed systems.</li>
<li><a href="http://hbase.apache.org/">http://hbase.apache.org</a> HBase™: A scalable, distributed database that supports structured data storage for large tables.</li>
<li><a href="http://hive.apache.org/">http://hive.apache.org</a> Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying.</li>
<li><a href="http://mahout.apache.org/">http://mahout.apache.org</a> Mahout™: A Scalable machine learning and data mining library.  Elephant Driver</li>
<li><a href="http://pig.apache.org/">http://pig.apache.org</a> Pig™: A high-level data-flow language and execution framework for parallel computation.</li>
<li><a href="http://zookeeper.apache.org/">http://zookeeper.apache.org</a> ZooKeeper™: A high-performance coordination service for distributed applications.</li>
</ul>
<p>Remember when technologies had names we could say in front of business people without making them think we were idiots? COBOL. BASIC. SQL Server.  Try saying your project is late because your elephant driver needs to be tuned to work with your pig and ZooKeeper.  I want to watch.</p>
<h2>Schemaless</h2>
<p>One of the great things about Big Data is that usually we don&#8217;t know ahead of time what data we are going to get or what questions we need to answer.  Yes, Big Data often means a design that is just a big heap of THINGS related to THINGS.  Makes data modeling easy.  Sort of.  Not really. See, the problem with this is that the schema or the design is embedded in with the &#8220;real&#8221; data.  So you can add new data in an instant. Often just as the data arrives.   Get ready to sprint your data designs. And by sprint, I mean model at the speed of light.  I hope you are in training now.  Also, be prepared to see autocorrect change schemaless to many different words than the one you meant. Go ahead.  But try it at home, not at work.</p>
<h2>Eventual Consistency</h2>
<div>
<p>I used this term previously.  It&#8217;s okay, I&#8217;m being consistent.  Unlike most Big Data technologies.  See, there&#8217;s this concept of Eventual Consistency that says that data has controlled duplication across nodes.  Well, sorta controlled.  See, in the Big Data world, it&#8217;s okay that the query you run can produce different values than when I run it.  Eventually, at some point, we will get the same result.  Just like a broken clock is right twice a day.  Except in Europe.</p>
<p>I don&#8217;t know about you, but I want to know that my version of my bank account balance is the same one that the bank is using to process my cheques.</p>
<p>And don’t even get me started on the people who say that &#8220;Eventually the customer will call and ask us to correct the data if it is important to him.&#8221;  Seriously?  What world does this guy live in? Talk about living in the clouds.</p>
<p>All I can say is Consistent my ASCII.</p>
<h2>Finally…</h2>
<p>I&#8217;ve been snarky here.  But there really isn&#8217;t a reason to think that Big Data, NoSQL and the likes are competitors of traditional database technologies.  We need to be using the right tools for the right job.  Size Doesn&#8217;t Matter.</p>
<p>Schemaless is perfect for designing data solutions where you don&#8217;t know or really care about perfect data integrity.  Think about getting a data feed from an external source where you have no control of what they send you. That flexibility works.</p>
<p>Eventual consistency is just fine for many applications. Who really cares whether everyone sees your Facebook update at the same time? Or whether your iTunes receipt shows up hours later? They don&#8217;t do exchanges or refunds anyway.</p>
<p>I suggest you read up on Big Data, attend some talks like the ones here, find out what applications are using Hadoop and other non-relational technologies.  They will need data experts and you want to be ready when they need us.</p>
<p>And they involve data.  <strong>Love your data</strong> by using the right solutions.</p>
<p>(1) Wikipedia contributors. &#8220;Big data.&#8221; <em>Wikipedia, The Free Encyclopedia</em>. Wikipedia, The Free Encyclopedia, 1 May. 2012. Web. 1 May. 2012.</p>
<p>(2) &#8220;Welcome to Apache Hadoop&#8221;, hadoop.apache.org, 1 May. 2012. Web. 1 May. 2012.</p>
</div>
</div>
</div>

						<div id="pdrp_endAttribution">
						photo by: 
						 
							<a href="http://flickr.com/80682954@N00/2216511038" target="_blank" class="pdrp_link pdrp_attributionLink">
								Nesster</a>
						</div>
					]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/size-doesnt-matter-or-does-it-a-rant-on-big-data-terms/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Outliers, Charts and Data Visualizations</title>
		<link>http://www.dataversity.net/outliers-charts-and-data-visualizations/</link>
		<comments>http://www.dataversity.net/outliers-charts-and-data-visualizations/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 08:01:41 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Business Intelligence]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Chart]]></category>
		<category><![CDATA[data visualization]]></category>
		<category><![CDATA[infographic]]></category>
		<category><![CDATA[Outlier]]></category>
		<category><![CDATA[Twitter]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=9152</guid>
		<description><![CDATA[by Karen Lopez @datachick I posted that Tweet last week while attending the NASA 2013 Fiscal Budget Briefing at NASA Headquarters. It did well, being retweeted 200+ times and had the prospect of reaching 1.9 million people. I say prospect because not everyone who follows someone reads all their tweets. But 1.9 million isn&#8217;t such a bad reach when you have a message to get out. Most of this reach was due to various NASA-related accounts retweeting it, but it was helped by regular Twitter users doing their normal thing on Twitter: sharing information with their followers. One of the trade offs of having such a huge outlier in my data is that the charts on my Twitter data analytics are nearly useless for all the other thousands of tweets I did last week (yes, I Tweet&#8230;a lot.) That red circle in the upper right represents the number of replies (the size of the circle) and the number of retweets and impressions (the X and Y axis). Looks good, until you see the blob of blue in the lower left. The fact that this outlier in my data was so far out there makes the other pieces of data look almost [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2012/02/NASABudgeTweett1.png"><img class="size-full wp-image-9160" title="NASABudgetTweet Each $ Spent on Space Exploration is spent here on Earth" src="http://www.dataversity.net/wp-content/uploads/2012/02/NASABudgeTweett1.png" alt="Each $ Spent on Space Exploration is spent here on Earth" width="536" height="157" /></a></p>
<p style="text-align: left;">by <a href="http://www.dataversity.net/contributors/karen-lopez">Karen Lopez</a> <em><a href="http://www.twitter.com/datachick">@datachick</a></em></p>
<p style="text-align: left;">I posted that Tweet last week while <a href="http://blog.infoadvisors.com/index.php/2012/02/21/a-new-era-for-nasatweetup-the-nasa-fiscal-year-2013-budget-briefing/">attending the NASA 2013 Fiscal Budget Briefing at NASA Headquarters</a>. It did well, being retweeted 200+ times and had the prospect of reaching 1.9 million people. I say prospect because not everyone who follows someone reads all their tweets. But 1.9 million isn&#8217;t such a bad reach when you have a message to get out. Most of this reach was due to various NASA-related accounts retweeting it, but it was helped by regular Twitter users doing their normal thing on Twitter: sharing information with their followers.</p>
<p>One of the trade-offs of having such a huge <a href="http://en.wikipedia.org/wiki/Outlier" target="_blank">outlier</a> in my data is that the charts in my Twitter data analytics are nearly useless for all the other thousands of tweets I sent last week (yes, I Tweet&#8230;a lot).</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2012/02/OneWeekTweetChart1.png"><img class="aligncenter  wp-image-9157" src="http://www.dataversity.net/wp-content/uploads/2012/02/OneWeekTweetChart1.png" alt="" width="550" height="408" /></a></p>
<p>That red circle in the upper right represents the number of replies (the size of the circle) and the number of retweets and impressions (the X and Y axes). Looks good, until you see the blob of blue in the lower left. The fact that this outlier in my data was so far out there makes the other pieces of data look almost zero on both axes. I think I do pretty well with my social media outreach, but this chart would so <a href="http://en.wikipedia.org/wiki/Fuddle_duddle" target="_blank">fuddle duddle the data</a> that it hides important information about my &#8220;normal&#8221; performance on Twitter. In fact, it almost makes it look like all my Tweets perform equally well&#8230;or poorly.</p>
<p>So what could you do to make this chart more meaningful?</p>
<ul>
<li>Remove the outlier from the chart and/or the data completely</li>
<li>Create two charts, one with and one without the outlier</li>
<li>Make the graph 1000 times taller and wider</li>
<li><a href="http://technet.microsoft.com/en-us/library/dd220529(v=sql.110).aspx" target="_blank">&#8220;Break&#8221; the Y Axis</a> so that there&#8217;s a gap between 100k and 1.8 million</li>
<li>Use other techniques such as a <a href="http://en.wikipedia.org/wiki/Logarithmic_scale" target="_blank">logarithmic scale</a> to show the data ratios instead of quantities</li>
<li>Use statistical methods to massage the data even more</li>
<li>Make better data (in this case, send Tweets that fill the gap to make my outlier look more normal)</li>
</ul>
<p>I could also try to include a longer time period, such as including all my Tweets, not just the ones from this past week.</p>
<p style="text-align: center;"><a href="http://www.dataversity.net/wp-content/uploads/2012/02/AllTweetsChart1.png"><img class="aligncenter  wp-image-9156" src="http://www.dataversity.net/wp-content/uploads/2012/02/AllTweetsChart1.png" alt="" width="550" height="370" /></a></p>
<p>Now there are a few more Tweets that had more retweets, but the impressions still look almost zero. So that doesn&#8217;t really help show how the rest of my Tweets did.</p>
<p>In business data, I&#8217;ve seen people opt to remove outliers in additional charts, but sometimes they mask or delete them with no indication that they&#8217;ve been removed. Sure, my almost 2 million impression Tweet is messing with the display of other data, but if my performance bonus was based on that sort of thing, I wouldn&#8217;t want the data to vanish like the <a href="http://www.nasa.gov/pdf/622643main_FY%2013%20Budget%20Presentation.pdf" target="_blank">$38 million that was cut from the NASA STEM outreach budget.</a> Your data needs may be different, though. So it&#8217;s important to find out how business users want outlier data dealt with. The reference links below talk about other more advanced methods for dealing with outliers. All I know is that my &#8220;all&#8221; chart isn&#8217;t going to help me much as long as that outlier is in the data set.</p>
<p>I&#8217;d recommend that whatever technique you use, you ensure that the reader understands what has been done. Remember that the goal of all charts should be to reveal more about the data than just looking at the raw data. If your <a href="http://blog.infoadvisors.com/index.php/2011/12/22/stupidest-bar-chart-of-2011-congrats-klout/" target="_blank">chart doesn&#8217;t do that</a> (Congrats again, Klout), maybe you need to rethink how you are presenting the data.</p>
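<p>As a rough illustration of two of the options above (flagging the outlier instead of silently deleting it, and switching to a logarithmic scale), here is a minimal Python sketch. The impression counts are made up for illustration and are not my real Twitter numbers, the function names are hypothetical, and the 1.5 x IQR fence is just one common rule of thumb for deciding what counts as an outlier.</p>

```python
import math
import statistics

# Hypothetical impression counts for one week of tweets (illustrative only).
impressions = [120, 340, 560, 800, 950, 1200, 1500, 1_900_000]


def split_outliers(values, k=1.5):
    """Return (kept, outliers) using Tukey's fences: k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if high >= v >= low]
    outliers = [v for v in values if not (high >= v >= low)]
    return kept, outliers


def log_scale(values):
    """Map positive values to log10 so ratios, not raw quantities, are compared."""
    return [math.log10(v) for v in values if v > 0]


kept, outliers = split_outliers(impressions)

# Tell the reader what was left off the chart instead of hiding it.
caption = f"{len(outliers)} outlier(s) omitted from this chart: {outliers}"
print(caption)

# On a log scale the outlier stays in the data without flattening the rest.
print([round(x, 2) for x in log_scale(impressions)])
```

<p>Either way, the caption travels with the chart, so the reader knows what was done to the data. Note that a log scale cannot show zero values, which is why the sketch skips anything that is not positive.</p>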
<h6>Other References:</h6>
<ul>
<li><a href="http://data-literacy.com/2012/01/30/data006-outliers-can-make-or-break-you/">Data006: Outliers can make or break you.</a> (data-literacy.com)</li>
<li><a href="http://www.schneier.com/blog/archives/2012/02/liars_and_outli_4.html">Liars and Outliers Update</a> (schneier.com)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/outliers-charts-and-data-visualizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>11 Things I Should Have Done in 2011</title>
		<link>http://www.dataversity.net/11-things-i-should-have-done-in-2011/</link>
		<comments>http://www.dataversity.net/11-things-i-should-have-done-in-2011/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 08:01:36 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Semantic Technology]]></category>
		<category><![CDATA[2012]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[books]]></category>
		<category><![CDATA[RDBMS]]></category>
		<category><![CDATA[Resolutions]]></category>
		<category><![CDATA[semantic technology]]></category>
		<category><![CDATA[Space]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=7756</guid>
		<description><![CDATA[by Karen Lopez Rather than write about my accomplishments over the last year, I&#8217;m going to write about all the things I should have been working on, but didn&#8217;t. I have lots of excuses: not enough hours, not enough days or too many other things to take care of. These are the types of things that Stephen Covey wrote about in First Things First under the theme of &#8220;Big Rocks.&#8220;  Basically, if I&#8217;d spent just a few minutes every day adding these to my schedule, I would have accomplished more. If I had just added them to my life, I&#8217;d be in a much better position sitting here at the end of 2011. I might even be richer, taller and thinner. Maybe even smarter. 1. Read more about Data I had all these great intentions to read more about Big Data, Data Quality, Data Modeling, Data Visualizations, Data Science&#8230;well, you get the idea. I have a pile of such books next to my desk and filling my iPad. I also have subscriptions to digital libraries to many of them, too. So it wasn&#8217;t a lack of access that stopped me. I also should have been more caught up on reading [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="../contributors/karen-lopez" target="_blank">Karen Lopez</a></p>
<p>Rather than write about my accomplishments over the last year, I&#8217;m going to write about all the things I should have been working on, but didn&#8217;t. I have lots of excuses: not enough hours, not enough days or too many other things to take care of.</p>
<p>These are the types of things that Stephen Covey wrote about in <em>First Things First</em> under the theme of &#8220;<a title="Stephen Covey on Big Rocks" href="http://www.youtube.com/watch?v=-VDxKLSyksI" target="_blank">Big Rocks.</a>&#8221; Basically, if I&#8217;d spent just a few minutes every day adding these to my schedule, I would have accomplished more.</p>
<p>If I had just added them to my life, I&#8217;d be in a much better position sitting here at the end of 2011. I might even be richer, taller and thinner. Maybe even smarter.</p>
<h2>1. Read more about Data</h2>
<p>I had all these great intentions to read more about Big Data, Data Quality, Data Modeling, Data Visualizations, Data Science&#8230;well, you get the idea. I have a pile of such books next to my desk and filling my iPad. I also have digital library subscriptions that include many of them. So it wasn&#8217;t a lack of access that stopped me.</p>
<p>I also should have been more caught up on reading my RSS feeds of other data bloggers. What&#8217;s that you say? There aren&#8217;t really that many data bloggers? I agree. Why aren&#8217;t you blogging so that I can get better at catching up on reading your thoughts about data, life and the world?</p>
<h2>2. Learned more French and Hindi</h2>
<p>I have lots of resources right at my fingertips for learning French, and not because I live in Canada. Here in Toronto, French is way down the list of languages spoken by the general population. In fact, it&#8217;s 11th on the list after English, other European languages and several Asian languages.</p>
<p>Since almost all of the projects I&#8217;ve worked on in the last few years involved offshore teams, I&#8217;ve also been studying Hindi &#8212; not because I have to, but because it has been helpful. Did you know that Hindi uses the same word, <em>kal</em>, for <em>yesterday</em> and <em>tomorrow</em>? That explains a lot about some status updates that didn&#8217;t really pan out. I recommend you work this concept into your next status meeting.</p>
<p>I&#8217;d also like to learn some Mandarin. I&#8217;ve read in several places that if one learns Hindi and Mandarin (and sometimes French and Hindi), one can speak with half the world&#8217;s population.</p>
<p>Learning about languages helps me in internationalizing data designs. I know now that there are places in the world where people don&#8217;t have middle names, half the population has the same last name or where people only have one name. That&#8217;s just the tip of the iceberg. Studying other languages and cultures makes my data models better.  While we are on this topic, <a title="The Perfect Data Model Gone to Hell" href="http://blog.infoadvisors.com/index.php/2010/12/29/the-perfect-data-model-gone-to-hell-mi-due-to-bad-web-form-design/" target="_blank">stop making <em>Postal Code</em> all numeric in your designs</a>.  The rest of the world will thank you.</p>
<p>I should have worked on a Mayan language, given that they are going to <a title="2012 Phenomenon" href="http://en.wikipedia.org/wiki/2012_phenomenon" target="_blank">bring about the end of the world in late 2012</a>.</p>
<h2>3. Run more</h2>
<p><img class="alignleft" style="margin-right: 10px;" src="http://blog.infoadvisors.com/wordpress/wp-content/uploads/2011/10/P10303571.jpg" alt="Portland Marathon #SQLRun" width="200" height="152" align="left" />I&#8217;ve been an on-and-off runner since embarrassing my middle school track team as an 880 distance runner. Mostly off, I&#8217;d say. But a few years ago I decided that I was old enough to not care about how fast I ran and signed up for a few races. This year I ran my second and third half marathons. One of them was with a group of other data professionals, even. My goals for both of them were to just finish upright and smiling. I did, in both races. But I want to start having some time goals and that means training differently. It also means running more and cross training more.  Just like with IT training, I have to train for speed differently than training for completion.</p>
<p>Running has been a good stress-reliever and I certainly know that the life of an architect in the fast-paced, agile/SCRUM/anything-goes world of what they call system development methods these days leads to a lot of stress. Running is also one of those things that I can do anywhere, even while traveling. I just need a way to fit those giant running shoes in my carry-on bag and I&#8217;m set.</p>
<p>I love the gadgets I use when I run: my Garmin GPS watch, my Runmeter app that reads Tweets people are sending me while I run and my Fitbit that collects and stores data about how many steps I took in a day, how far I traveled and how many calories I burned. This helps me measure and monitor my progress and predict future performance. Larry English and his stopwatch would be proud, I hope.</p>
<h2>4. Develop a Million Dollar App</h2>
<p>It seems that everyone is doing this these days. <a title="Teenage App Developers" href="http://www.fastcompany.com/1621539/teen-iphone-app-developers" target="_blank">Even teenagers</a>. If I could just find a way to build an app that ties data and space together in a way that everyone would want to hand over their hard-earned cash for, then I&#8217;d be set. Or I could just write one that lets cats play a game on my iPad. One of those would work and I&#8217;m afraid it&#8217;s only the cat one.</p>
<h2>5. Read the Scriptures</h2>
<p>I could probably do with reading all kinds of good books, but here I mean the writings of Codd, Chen, Date and Zachman. I&#8217;ve read all these in the past, but I think I may have read too much vendor documentation between then and now. Terminology has been twisted and refactored so much in practice that it pays to go back and read the original theory from time to time. Hear that, Microsoft, with your &#8220;Entity Framework&#8221; and Access &#8220;database&#8221;?</p>
<p>Most of these good works are available at the <a title="ACM Library" href="http://dl.acm.org/" target="_blank">ACM Library</a> and at Amazon. It&#8217;s a much better value to join ACM and sign up for the library subscription than it is to buy them one off at an online bookstore. Also, check whether you have access via corporate subscription services.</p>
<h2>6. Perfect cloning techniques</h2>
<p>Even though I attended many events in 2011 (Enterprise Data World, SQLSaturdays, SQLRally, DAMA Days, PASS Summit, NIEM National Training Event and more), I still missed many of the key data-related conferences like the <a title="Semantic Technologies Conference" href="http://semtechbizsf2012.semanticweb.com/" target="_blank">Semantic Technologies conference</a> that would have exposed me to emerging technologies and best practices. If I could just clone myself I could have been at the many overlapping events that happened in 2011. I think that might have allowed me to complete more of the other items on the list.</p>
<h2>7. Install more DBMSs and Technologies</h2>
<p>I should have taken the time to play with and research the newest versions of SQL Server, DB2, Oracle, MySQL, Postgres, Hadoop, MongoDB, SQL Server Azure and &#8230;well, all the 10 million other databases/non-databases out there. Sure, I can&#8217;t know them all, but getting hands-on with new tools and features is the best way to understand the Next Big Thing we&#8217;ll have to design for. That&#8217;s hands-on, working with real, non-trivial problem sets. These technologies are coming to a project near you soon, if they haven&#8217;t already. You don&#8217;t want to be the only one in the room, especially as the data professional, saying &#8220;Hadoop? Is that a character from Dr. Seuss?&#8221;</p>
<h2>8. <del>Stalked Astronauts</del> Bought Real Estate in Cocoa Beach</h2>
<p><img src="https://fbcdn-sphotos-a.akamaihd.net/hphotos-ak-ash4/310220_10150430104749260_545549259_8454742_1944343423_n.jpg" alt="@Astro_Luca, @datachick and @VenusBarbie" width="200" height="140" align="right" />I actually did do some astronaut stalking this past year. I spent a lot of time in Cocoa Beach, Florida this year, not to mention Cologne, Germany. All these visits were about watching launches of humans and rockets into space&#8230;<a title="Astronauts and Space Agencies" href="https://www.facebook.com/media/set/?set=a.10150328702624260.366001.545549259&amp;type=3" target="_blank">and meeting smart people collecting and using data from beyond Earth&#8217;s boundaries</a>.  I think over the year I was fortunate enough to meet more than 25 former and active astronauts, not to mention key members of <a title="NASA Tweetup Photos" href="https://www.facebook.com/media/set/?set=a.10150204655414260.329751.545549259&amp;type=3" target="_blank">NASA</a>, <a title="CMDR Hadfield" href="https://www.facebook.com/media/set/?set=a.10150347200574260.369891.545549259&amp;type=3" target="_blank">CSA</a>, and ESA teams. I even attended, virtually, the <a title="NASA IT Summit" href="http://www.nasa.gov/offices/ocio/itsummit/" target="_blank">NASA IT conference</a>.</p>
<p>Did you know that <a title="Data NASA.gov" href="http://data.nasa.gov" target="_blank">data.nasa.gov</a> offers more than 1200 data sets for you to use? One of my favorites is the Great Images in NASA (GRIN) data set. Perfect for playing with those new database features in item 7 above.</p>
<p>We are still sending people into space, just not from Florida right now.  Heck, NASA sent 3 more astronauts to the International Space Station just last week.  So while the space coast real estate market is good for buyers right now, there are still unmanned missions launching from there in 2012.  All for collecting massive amounts of data about our world.</p>
<h2>9. Interacted with machines more often, so they know me better when they take over</h2>
<p>Sure, I&#8217;m known for my love of gadgets. One of my favorites is my Xbox with Kinect. I primarily use it for sports and exercise. And yes, it collects and stores data about my workouts and play.  What excites me most about Kinect, though, is the expansion of its controller-less interfaces to other applications.</p>
<p>Imagine zooming in or out or double-clicking to drill down into the metadata of a table without having to use anything but your own hands in front of a projected display of your model. Or laying out your data model by just standing in front of your model and moving the entities with your hands. That&#8217;s what I want. When we later find out that Kinect forms a union at Cyberdyne Systems and becomes <a title="Skynet" href="http://en.wikipedia.org/wiki/Skynet_%28Terminator%29" target="_blank">Skynet</a>, I&#8217;ll be ready.</p>
<h2>10. Loved My Data More</h2>
<p>I think I did a decent job loving the data I was supposed to, but I could have done more. I could have tested more restores of databases (because you don&#8217;t really need backups; what you need are restores). I could have done more data profiling to see what data was lurking inside those columns with a name like <em>description</em> or <em>notes</em>. I could have tested my designs more, to ensure that they performed as well as I thought they would and that they could store the data they should have.</p>
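<p>A first pass at that kind of profiling can be very small. The sketch below uses Python&#8217;s built-in sqlite3 module against an in-memory database; the <em>orders</em> table, its <em>notes</em> column and the sample rows are all hypothetical stand-ins for whatever free-text column you are curious about.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with a free-text column worth profiling.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, notes TEXT)")
conn.executemany(
    "INSERT INTO orders (notes) VALUES (?)",
    [
        ("call before delivery",),
        ("",),
        (None,),
        ("N/A",),
        ("call before delivery",),
        ("fragile",),
    ],
)

# Basic profile: how many rows, how many are NULL or blank,
# how many distinct values there are, and the longest value stored.
profile = conn.execute(
    """
    SELECT COUNT(*)               AS total_rows,
           SUM(notes IS NULL)     AS null_rows,
           SUM(TRIM(notes) = '')  AS blank_rows,
           COUNT(DISTINCT notes)  AS distinct_values,
           MAX(LENGTH(notes))     AS longest_value
    FROM orders
    """
).fetchone()

# The most frequent values often reveal codes hiding in free text.
top_values = conn.execute(
    """
    SELECT notes, COUNT(*) AS uses
    FROM orders
    WHERE notes IS NOT NULL
    GROUP BY notes
    ORDER BY uses DESC, notes
    LIMIT 3
    """
).fetchall()

print(profile)
print(top_values)
```

<p>Numbers like these will not tell you what the data means, but they do tell you where to start asking questions, such as why &#8220;N/A&#8221; and an empty string are both being used for &#8220;nothing to say&#8221;.</p>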
<h2>11. Kicked my procrastination habit</h2>
<p>I&#8217;m going to start on this one next. I promise.</p>
<h2>Love Your Data</h2>
<p>What do all these things have in common? <span style="color: #993300;"><strong>DATA</strong>.</span></p>
<p>Let&#8217;s make sure that when we are together again, at the end of 2012 (assuming the Mayans were wrong) that we don&#8217;t have a similar long list of regrets. Love Your Data.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/11-things-i-should-have-done-in-2011/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Fly the Dang Plane</title>
		<link>http://www.dataversity.net/fly-the-dang-plane/</link>
		<comments>http://www.dataversity.net/fly-the-dang-plane/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 08:10:53 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Enterprise Information Management]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[focus]]></category>
		<category><![CDATA[goals]]></category>
		<category><![CDATA[tasks]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=6934</guid>
		<description><![CDATA[by Karen Lopez Last week I flew from Washington, DC to Toronto via Philadelphia. This was at the end of about 3 weeks of many flights, having done a DAMA speaking tour, a class and two SQLSaturdays. I&#8217;ve been known to Tweet my travel experiences along the way, but usually these are first world problems like not getting an upgrade, weird non-TSA baggage inspections or complaining about the terribly rude service I seem to experience on a regular basis. But this time my experience in Philly was different. My flight from Philly was delayed. It seems all my flights from Philly are delayed, as if there is a big time suck in that general area. I knew about this delay because I use TripIt to track my itineraries and TriptPro notified me that my flight had been delayed by about 40 minutes. But as usual for many airlines, the boards in the airport and the official websites showed the original time &#8212; no delay. I have been told that airlines do this so that you will get to the gate anyway and therefore give the airline a few more seconds to slice off their abysmal on-time performance stats&#8230;by having you [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dataversity.net/wp-content/uploads/2011/11/FlameBucket1.jpg"><img class="alignleft size-medium wp-image-6961" style="margin-left: 10px;margin-right: 10px" src="http://www.dataversity.net/wp-content/uploads/2011/11/FlameBucket1-300x170.jpg" alt="Flame Bucket Switch" width="300" height="170" /></a>by <a href="../contributors/karen-lopez" target="_blank">Karen Lopez</a></p>
<p>Last week I flew from Washington, DC to Toronto via Philadelphia. This was at the end of about 3 weeks of many flights, having done a <a href="http://www.dama.org" target="_blank">DAMA</a> speaking tour, a class and two <a href="http://www.sqlsaturday.com" target="_blank">SQLSaturday</a>s. I&#8217;ve been known to Tweet my travel experiences along the way, but usually these are first-world problems like not getting an upgrade, weird non-TSA baggage inspections or complaining about the terribly rude service I seem to experience on a regular basis. But this time my experience in Philly was different. My flight from Philly was delayed. It seems all my flights from Philly are delayed, as if there is a big time suck in that general area. I knew about this delay because I use TripIt to track my itineraries and TripIt Pro notified me that my flight had been delayed by about 40 minutes. But as usual for many airlines, the boards in the airport and the official websites showed the original time &#8212; no delay. I have been told that airlines do this so that you will get to the gate anyway and therefore give the airline a few more seconds to slice off their abysmal on-time performance stats&#8230;by having you wait in a crowded, no-place-to-sit-not-even-on-the-floor gate area.</p>
<h2>Alternative Data Sources</h2>
<p>Because USAir had chosen not to delay the flight on the boards, I made my way to the gate area near the original boarding time of 12:45. There was no plane at the gate. This is your number one sign that the flight is delayed. A missing plane can&#8217;t be deboarded, it can&#8217;t be cleaned and it can&#8217;t be boarded. So I found a spot to sit on the floor underneath a pay phone next to the door to the jetway. As soon as I sat down, a harried gate agent called off about 20 names to come to the podium. Normally this might mean upgrades, but in this case I was sure it was the sometimes required, sometimes not &#8220;document check&#8221;. Since this flight was headed to Canada, airlines sometimes want to see your proof of citizenship again before you get on the plane. This document check is done because airlines are required to return you at their cost if you aren&#8217;t able to enter another country. So I showed my passport as the gate agent scolded me for waiting until boarding time to show up to the gate. She was frazzled and irritated because so many other passengers still hadn&#8217;t had their documents checked. I went back to my seat on the floor.</p>
<p>Sitting next to the door were three women who had just been scolded by the gate agent for asking if the flight was delayed. It seemed that 80% of the passengers who approached the desk were asking the same question. Before we had mobile data devices we passengers were in the dark about flights, but now we have access to third-party data sets that can tell us when flights are delayed or cancelled. The airlines haven&#8217;t quite changed their data sharing processes to acknowledge that. They still assume that we have no other sources of data for flight information. Gate agents everywhere are distracted by passengers asking why the board data and their data services data are in conflict.  We should fix that.</p>
<p>But I digress. Data issues do that to me.</p>
<h2>Distraction and Flying the Plane</h2>
<p>As the three women and I were discussing the fact that it was 1:04 and our plane had not yet arrived for our 1:18 takeoff, a woman who had no badge or uniform walked past us and into the propped-open door of the jetway. Our friendly gate agent was busy reviewing documents, then using the PA to call for missing passengers to show their documents. She was consumed by two tasks: checking documents and convincing passengers that the flight was on time. Finally one of the three women (I&#8217;ll just call her Woman One) approached the desk and pointed out to the gate agent that someone who appeared to have no credentials or uniform had entered the jetway without clearing it with the gate agent. The agent paused, then said that she knew what she was doing and no one had entered the jetway. We stared in disbelief. Woman One said again that she had seen someone board without clearance.</p>
<p>Remember that the Department of Homeland Security has a <a href="http://www.dhs.gov/files/reportincidents/see-something-say-something.shtm" target="_blank">See Something, Say Something</a> campaign. Airports are covered with signage repeating this message. So this group of women had seen something and said something. Now they had reported it to an airline employee. What was the gate agent&#8217;s response? She repeatedly told Woman One to sit down and shut up. Woman One kept up with her reports, but the gate agent was focused on two things: getting those dang documents checked and ensuring that we all stayed at the gate area <strong>so that the flight could be boarded as fast as possible</strong>.</p>
<p>Do you see what&#8217;s happened here? The gate agent was so focused on these two tasks that she missed the whole point of her job: to be the agent that ensured only the right people got on that jetway. Sure, the document check is part of that, but she was presented with a much bigger threat than expired documents and she discarded that report in favor of her heads-down task of matching names on a document to names on a list. The airline industry has a name for this: <strong>forgetting to fly the plane</strong>. My friend Mike Walsh (<a href="http://www.twitter.com/Mike_Walsh" target="_blank">@mike_walsh</a>) has blogged and presented about this several times. In <a href="http://www.straightpathsql.com/archives/2011/06/if-you-see-something-say-something/" target="_blank">See Something, Say Something</a> he writes about a flight crew on United 173 being so distracted by a light bulb that they flew an airplane into the ground. Other air disasters involve the same type of mistake: pilots forgetting that their job is to first fly the plane. It seems so obvious in hindsight, but it&#8217;s so easy to get distracted by project goals, urgent tasks and personal goals that we forget to do the number one priority in our jobs.</p>
<p>I have to admit that I&#8217;ve gotten bogged down in trying to make a macro do some whiz-bang nifty thing while forgetting that my job is to get data requirements documented and turned into a database design. Some other things I&#8217;ve seen:</p>
<ul>
<li>A data architect who is so focused on getting a data model laid out with no crossing lines and following a no-dead-crows approach that she has forgotten more than 20 important data requirements</li>
<li>A DBA so focused on applying surrogate keys to every table that he forgot to ensure that the alternate keys were applied, therefore leaving the data in a potentially harmful situation</li>
<li>A developer so focused on squeezing performance out of a query that he failed to return all the data that should have been returned in the query</li>
<li>A project manager so dedicated to getting a perfect Gantt chart that she forgot to actually manage the tasks and people who needed a project manager</li>
<li>A team so focused on a two-week sprint that they forgot to verify a bunch of assumptions that turned out to be wrong, causing a great deal of rework and loss of confidence in the team</li>
<li>A DBA so focused on applying a &#8220;best practice&#8221; that he applied it in a situation where it harmed performance instead of enhancing it</li>
</ul>
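<p>To make the surrogate key item concrete, here is a minimal sketch using Python&#8217;s built-in sqlite3 module. The <em>customer</em> table and its columns are hypothetical; the point is only that the surrogate key identifies the row, while a UNIQUE constraint on the alternate (business) key is what actually stops the same customer from being stored twice.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: a surrogate primary key plus a declared alternate key.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- surrogate key, generated by the DBMS
        email       TEXT NOT NULL UNIQUE,  -- alternate (business) key
        full_name   TEXT NOT NULL
    )
    """
)

insert_sql = "INSERT INTO customer (email, full_name) VALUES (?, ?)"
conn.execute(insert_sql, ("pat@example.com", "Pat Doe"))

# Without the UNIQUE constraint this second row would be accepted and
# simply given a new surrogate value, leaving a duplicate customer behind.
try:
    conn.execute(insert_sql, ("pat@example.com", "Pat D."))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_rejected, row_count)
```

<p>Drop the UNIQUE constraint from the sketch and both inserts succeed, which is exactly the potentially harmful situation described in the list.</p>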
<p>So what happened in Philadelphia? We don&#8217;t know. We never saw the mysterious woman deboard the plane. And when we boarded, she wasn&#8217;t there on the plane. Obviously either all four of us missed her coming back off the plane or she exited the jetway onto the tarmac. We&#8217;ll never know who she was or what she was doing on the plane or the tarmac. There has been no news of anything bad happening as a result of a failure to &#8220;fly the plane&#8221; that day. The opportunity certainly was there.</p>
<p>I&#8217;d love to hear about your examples of people who have forgotten to fly the dang plane.</p>
<p>It&#8217;s so easy to get bogged down in a short-term task that we forget that our job is to do something more than that task. Make sure that you don&#8217;t lose sight of your job, no matter what it is. Every task has a purpose. Make sure you understand what the goal of a task is so that you can tell when you are being distracted from flying the dang plane.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/fly-the-dang-plane/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Myth 1 &#8211; Normalization: Friend or Foe The Slogan</title>
		<link>http://www.dataversity.net/myth-1-normalization-friend-or-foethe-slogan/</link>
		<comments>http://www.dataversity.net/myth-1-normalization-friend-or-foethe-slogan/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 18:23:59 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Information Quality]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[data modeling]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[Database Modeling and Design]]></category>
		<category><![CDATA[databases]]></category>
		<category><![CDATA[Normalization]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=4798</guid>
		<description><![CDATA[by Karen Lopez This is the third post in a series on Normalization Myths.  You’ll want to read the prior posts first. Normalization Myths that Really Make Me Crazy – Introduction to a Rant Myth 1 – Normalization: Friend, Foe, or Frenemy The Survey In my survey of Twitter users on their feelings about normalization, the most common thought I received was: &#160; I realize that this is one of those sayings used to help people remember the meaning of terms.  However, like other memory tricks, it fails when it comes to working in real-world situations.  Sure, I’d like to think that one cannot be too rich or too thin, but we know that tragically, both of those thoughts can go terribly wrong. The hurts-works saying reinforces the normalization is evil concept.  It implies that normalization is harmful and must always be rolled back until it is doing less harm.  What this belief fails to recognize is that we database designers concern ourselves with normalization not just as an academic exercise, but to add value to the design by finding the right balance between data integrity and performance: Database tables are normalized to minimize the impact of update (create, modify, [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/contributors/karen-lopez" target="_blank">Karen Lopez</a></p>
<p><em>This is the third post in a series on Normalization</em> <em>Myths.  You’ll want to read the prior posts first.</em></p>
<p><em><a title="Normalization Myths that Really Make Me Crazy – Introduction to a Rant" href="http://www.dataversity.net/archives/3898">Normalization Myths that Really Make Me Crazy – Introduction to a Rant</a></em><br />
<em><a title="Permanent Link to Myth 1 – Normalization: Friend, Foe, or Frenemy The Survey" href="../archives/4774" rel="bookmark">Myth 1 – Normalization: Friend, Foe, or Frenemy The Survey</a></em></p>
<p>In my survey of Twitter users on their feelings about normalization, the most common thought I received was:<em><a href="http://www.dataversity.net/wp-content/uploads/2011/07/NormalizeUntilItHurts.png"><img class="size-full wp-image-4799 aligncenter" src="http://www.dataversity.net/wp-content/uploads/2011/07/NormalizeUntilItHurts.png" alt="Normalize Until It Hurts" width="300" height="200" /></a></em></p>
<p>&nbsp;</p>
<p>I realize that this is one of those sayings used to help people remember the meaning of terms.  However, like other memory tricks, it fails when it comes to working in real-world situations.  Sure, I’d like to think that one cannot be too rich or too thin, but we know that tragically, both of those thoughts can go terribly wrong.</p>
<p>The hurts-works saying reinforces the normalization is evil concept.  It implies that normalization is harmful and must always be rolled back until it is doing less harm.  What this belief fails to recognize is that we database designers concern ourselves with normalization not just as an academic exercise, but to add value to the design by finding the right balance between data integrity and performance:</p>
<ol>
<li>Database tables are normalized to minimize the impact of update (create, modify, and delete) anomalies.  Data anomalies mean data integrity suffers.  It can also mean worse performance as data quality issues grow over time.  When data integrity suffers, business suffers.</li>
<li>Good designers use cost, benefit and risk assessment to find the right normalization level within the context of the data, its use and its quality requirements.</li>
<li>If the data is not going to be updated after being created, then it’s really difficult for update anomalies to happen.  Therefore the need for higher normal forms in the data structures is less of a requirement.  This is why most data warehouse designs are highly denormalized.</li>
<li>When a designer “tunes” a database design to include denormalization, typically they are trading off the update performance or risk of data anomalies for better performance for querying the data.  This trade off may or may not be the right design for that context.</li>
<li>When one denormalizes a data structure for performance reasons, one is borrowing from update performance and data integrity to get that query performance gain.  This gain does not come out of thin air.</li>
<li>There is no such thing as the “right” normal form for all tables or all data.</li>
<li>One of the least successful reasons to denormalize a structure I’ve experienced is for the sole reason of making a developer’s tasks easier.  Sure, there are project benefits for ensuring that development tasks can be completed faster, but rarely do the gains in developer time offset the cost to data quality and query performance that happens with these types of simplifications.  Optimizing developer time at the expense of data quality and performance is an optimization of the wrong subsystem in almost all cases.</li>
<li>Normalization requires the designer to understand the meaning of the data.  It is not possible to apply the normalization rules to data you don’t understand.  Therefore, the less one understands about the data, the less likely their database design will find the right trade off of cost, benefit and risk.</li>
</ol>
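<p>The update anomaly in the first point, and the trade-off in the fourth, can be made concrete with a small sketch.  This is only an illustration using Python&#8217;s built-in sqlite3 module; the table and column names (order_flat, customer, customer_order) are hypothetical and not taken from any real project.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized (hypothetical): the customer's city repeats on every order row.
    CREATE TABLE order_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_city TEXT NOT NULL
    );
    INSERT INTO order_flat VALUES (1, 'Acme', 'Toronto');
    INSERT INTO order_flat VALUES (2, 'Acme', 'Toronto');

    -- Normalized (hypothetical): the city is stored once, on the customer row.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE,
        city        TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (10, 'Acme', 'Toronto');
    INSERT INTO customer_order VALUES (1, 10);
    INSERT INTO customer_order VALUES (2, 10);
""")

# A careless update touches only one of the repeated rows.
conn.execute("UPDATE order_flat SET customer_city = 'Ottawa' WHERE order_id = 1")
flat_cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer_city FROM order_flat "
    "WHERE customer_name = 'Acme' ORDER BY customer_city")]

# The same change in the normalized design is a single-row update.
conn.execute("UPDATE customer SET city = 'Ottawa' WHERE customer_id = 10")
normalized_cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT c.city FROM customer_order o "
    "JOIN customer c ON c.customer_id = o.customer_id ORDER BY c.city")]

print(flat_cities)        # the flat table now gives two answers for one customer
print(normalized_cities)  # the normalized tables give one
```

<p>The flat table answers &#8220;where is Acme?&#8221; without a join, which is the query-side gain.  The price is that every writer has to remember to update every copy.</p>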
<p>When someone says that we denormalize until it “works” what they really mean is they are denormalizing until a query runs faster.  However, faster queries may or may not be the only goal of the database design.  We need to understand the objectives for the design in order to choose the right normalization level.  Context is everything in design.</p>
<p>I apply denormalizations to database designs on a regular basis, even on transactional database designs.  I do this with the understanding of the trade-offs.  I ensure that compensating data integrity features are put in place to mitigate data anomalies.  Queries do need to perform well.  Sometimes it’s more important that the data be returned faster than it be correct.  Sometimes it’s much more important that the data be correct.  My job is to find that balance.  In order to do that, I need to understand the context of the project so that I know who my normalization friends and foes are.  I will address the overnormalization versus undernormalization issue in a future post.</p>
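<p>One possible shape for a &#8220;compensating data integrity feature&#8221; is a set of triggers that keep a denormalized, derived column in step with its source rows.  The sketch below is an assumption about what such a feature might look like, not a description of any particular design: the invoice and invoice_line tables are hypothetical, it uses SQLite through Python&#8217;s sqlite3 module, and it assumes a line never moves from one invoice to another.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        total_cents INTEGER NOT NULL DEFAULT 0  -- denormalized copy of SUM(lines)
    );
    CREATE TABLE invoice_line (
        line_id      INTEGER PRIMARY KEY,
        invoice_id   INTEGER NOT NULL REFERENCES invoice(invoice_id),
        amount_cents INTEGER NOT NULL
    );

    -- Compensating triggers: recompute the stored total whenever lines change.
    CREATE TRIGGER invoice_line_after_insert AFTER INSERT ON invoice_line
    BEGIN
        UPDATE invoice
           SET total_cents = (SELECT COALESCE(SUM(amount_cents), 0)
                                FROM invoice_line
                               WHERE invoice_id = NEW.invoice_id)
         WHERE invoice_id = NEW.invoice_id;
    END;

    CREATE TRIGGER invoice_line_after_update AFTER UPDATE OF amount_cents ON invoice_line
    BEGIN
        UPDATE invoice
           SET total_cents = (SELECT COALESCE(SUM(amount_cents), 0)
                                FROM invoice_line
                               WHERE invoice_id = NEW.invoice_id)
         WHERE invoice_id = NEW.invoice_id;
    END;

    CREATE TRIGGER invoice_line_after_delete AFTER DELETE ON invoice_line
    BEGIN
        UPDATE invoice
           SET total_cents = (SELECT COALESCE(SUM(amount_cents), 0)
                                FROM invoice_line
                               WHERE invoice_id = OLD.invoice_id)
         WHERE invoice_id = OLD.invoice_id;
    END;
""")

conn.execute("INSERT INTO invoice (invoice_id) VALUES (1)")
conn.execute("INSERT INTO invoice_line VALUES (1, 1, 1500)")
conn.execute("INSERT INTO invoice_line VALUES (2, 1, 250)")
conn.execute("UPDATE invoice_line SET amount_cents = 300 WHERE line_id = 2")
conn.execute("DELETE FROM invoice_line WHERE line_id = 1")

# The stored (denormalized) total should match the total derived from the lines.
stored_total = conn.execute(
    "SELECT total_cents FROM invoice WHERE invoice_id = 1").fetchone()[0]
derived_total = conn.execute(
    "SELECT COALESCE(SUM(amount_cents), 0) FROM invoice_line "
    "WHERE invoice_id = 1").fetchone()[0]
print(stored_total, derived_total)
```

<p>The triggers are themselves part of the trade: reads of the total get cheaper while every write to a line does extra work, which is exactly the kind of cost, benefit and risk decision that depends on context.</p>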
<p>Perhaps what the data profession needs is a series of top-up courses, to be reviewed every couple of years.  Maybe what we need is an intervention to help people understand why normalization is even a topic in design.  I think this would make for a wonderful lunchtime presentation.  Perhaps your boss could even buy lunch.  This presentation wouldn’t be the how of normalization, but the <em>whys</em> and <em>why nots</em>.  Mastering the normal forms is fairly easy.  Understanding which one to use for a specific solution is the hard part.  The more your teammates understand the <em>whys</em>, the more likely they are to support your efforts.</p>
<p style="text-align: center"> <a href="http://www.dataversity.net/wp-content/uploads/2011/07/NormalQuote3.png"><img class="aligncenter size-full wp-image-4780" src="http://www.dataversity.net/wp-content/uploads/2011/07/NormalQuote3.png" alt="Normalization Quote - Autocorrect to demoralize" width="294" height="94" /></a></p>
<p>Don’t let normalization become your demoralization.  Even if that dang auto-correct keeps trying to tell you it should be.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/myth-1-normalization-friend-or-foethe-slogan/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Myth 1 &#8211; Normalization: Friend, Foe, or Frenemy The Survey</title>
		<link>http://www.dataversity.net/myth-1-normalization-friend-foe-or-frenemy-the-survey/</link>
		<comments>http://www.dataversity.net/myth-1-normalization-friend-foe-or-frenemy-the-survey/#comments</comments>
		<pubDate>Mon, 01 Aug 2011 23:30:38 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[data modeling]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[myths]]></category>
		<category><![CDATA[Normalization]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=4774</guid>
		<description><![CDATA[by Karen Lopez I recently blogged the introduction to this series:  Normalization Myths that Really Make Me Crazy &#8211; Introduction to a Rant.  You should check that introduction out for the background on this post.  It has caveats and warnings that you&#8217;ll need to keep handy while you read this series. I’m starting with the basis for many of the myths that I&#8217;ll be covering in this series.   Just last week I learned that Normalization is Evil.  Actually, I&#8217;ve heard that every week for the last twenty some years.   The basis of this thought is that business users never ask for data quality and always ask for better speed.  Remember that stinking pile of poo from the introduction? I think it’s still nearby Most business users don&#8217;t say &#8220;We need the data to be correct&#8221; because they expect that as a given.  Are we really going to stand before them and say “Is it okay if some, maybe more, of the sales tax remittances are wrong so that we can get better performance out of the system?” Or maybe we could wow them with “Is it okay if we calculate many of the customer bills incorrectly so that developers don’t [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/contributors/karen-lopez" target="_blank">Karen Lopez</a></p>
<p><em>I recently blogged the introduction to this series:  <a title="Normalization Myths that Really Make Me Crazy – Introduction to a Rant" href="http://www.dataversity.net/archives/3898">Normalization Myths that Really Make Me Crazy &#8211; Introduction to a Rant</a>.  You should check that introduction out for the background on this post.  It has caveats and warnings that you&#8217;ll need to keep handy while you read this series.</em></p>
<p>I’m starting with the basis for many of the myths that I&#8217;ll be covering in this series.   Just last week I learned that <em>Normalization is Evil.  </em>Actually, I&#8217;ve heard that every week for the last twenty-some years.   The basis of this thought is that business users never ask for data quality and always ask for better speed.  Remember that stinking pile of poo from the introduction? I think it’s still nearby.</p>
<p>Most business users don&#8217;t say &#8220;We need the data to be correct&#8221; because they expect that as a given.  Are we really going to stand before them and say “Is it okay if some, maybe more, of the sales tax remittances are wrong so that we can get better performance out of the system?” Or maybe we could wow them with “Is it okay if we calculate many of the customer bills incorrectly so that developers don’t have to code as much?”</p>
<p>Think that last one is a laugh?  I was once asked to design a database that gave each developer a single table to work with. This table would hold all the data assigned to be coded by the developer during a single sprint.  This design was seen as a great way of maximizing developer productivity.  Triggers, stored procedures and other code would take care of the data integrity I was assured.  Thankfully I was able to show just how much this design would harm performance.</p>
<p>To be fair, I work with data architects and database designers who think that all denormalizations are some sort of Sign of the Beast &#8212; as if there is a higher power up there watching over their designs, ready to strike them down for thinking of performance tradeoffs.   In fact, my friend Michael Swart (<a href="http://www.michaeljswart.com" target="_blank">blog</a> | <a href="http://www.twitter.com/mjswart" target="_blank">twitter</a>)  has a great illustration of this concept:</p>
<p style="text-align: center"><a href="http://www.dataversity.net/wp-content/uploads/2011/07/Codd.png"><img class="aligncenter size-full wp-image-4775" src="http://www.dataversity.net/wp-content/uploads/2011/07/Codd.png" alt="Ted Codd Hates That Thing You Just Did Cartoon " width="500" height="300" /></a></p>
<p>The Friend or Foe myth is based on the idea that you have to be for or against normalization as a concept.  I&#8217;ve seen speakers start their presentation by asking people in the audience if they are pro-normalization or anti-normalization. When people raise their hands to pro-normalization, the speaker will half-jokingly ask them to leave. The odd thing about this for-against mindset is that it seems to indicate that one can have a data structure that has no normalization to it at all or that one can design something without a normal form.  I suppose random meaningless numbers have no normal form, but we don&#8217;t need a design for that, right?</p>
<p>This got me thinking about people who are both for you and against you: the frenemy.</p>
<blockquote><p>A &#8220;Frenemy&#8221; (alternately spelled &#8220;frienemy&#8221;) is a portmanteau of &#8220;friend&#8221; and &#8220;enemy&#8221; that can refer to either an enemy disguised as a friend or to a partner who is simultaneously a competitor and rival. [1]</p></blockquote>
<p>Could it be that normalization has to be either a friend or a foe?  Is it something that you have to choose between Team Normal and Team Denormal?  Or is it a frenemy, one of those things that you can pretend to like but have to hate when it comes down to getting things done?  I asked the Twitterverse what they felt about normalization.  The following represent the range of responses I received:</p>
<p style="text-align: center"><a href="http://www.dataversity.net/wp-content/uploads/2011/07/NormalQuotes.png"><img class="aligncenter size-full wp-image-4791" src="http://www.dataversity.net/wp-content/uploads/2011/07/NormalQuotes.png" alt="Normalization Quotes" width="610" height="708" /></a></p>
<p>Not everyone was posting their own beliefs; some were quoting what they have heard in the wild. You can see that I got a variety of opinions that ranged from <em>Normalization is Evil</em> to <em>Normalization is Our Only Hope</em>.  My sample is biased because these came primarily from those who have an interest in the same topics I am interested in.  I think if I’d asked the general IT population I would have received many more negative thoughts about normalization and people who believe in it.</p>
<p>It’s common for me to be questioned when I start a project by project managers and others about my friend/foe relationship with normalization.  I’ll get questions like:</p>
<blockquote><p>PM: Do you believe in normalization?</p>
<p>Me: Yes.</p></blockquote>
<p><em>I find this one hard to respond to without giggling.  It’s as if I’m being asked if I believe in Santa Claus. Or Ted Codd.</em></p>
<blockquote><p>PM: [long pause] Okay…we’ll probably need to review your designs with the developers then.  They don’t take kindly to normalizers.</p>
<p>Me: That’s fine.  I love collaborating with the developers.  Together we can&#8230;</p>
<p>PM: [interrupting] How far do you go?</p>
<p>Me: [blushing] Um…what do you mean?</p>
<p>PM: What normal form? Third? Fifth?</p>
<p>Me: Oh. Well, it depends…</p>
<p>PM: We only go to third normal form here.  We are traditionalists.</p></blockquote>
<p><em>This always makes me wonder if there is a Church of Normalization, Reformed.  This might also mean there’s a Church of Denormalization, Unformed.</em></p>
<blockquote><p>Me: Okay.  What about First or Second Normal Form?</p>
<p>PM: Oh, we aren’t that radical.  Just Third.</p></blockquote>
<p><em>I wonder if they might have some performance issues that could be easily rectified by relaxing their need to have every data structure at the same normalization level.  I make a note.</em></p>
<p>I can tell from these interview questions that the PM thinks that normalization is his frenemy: something that someone in the IT world thinks is a “good thing” for a design, but that everyone on the project thinks is evil.</p>
<p>Our job as data architects is to help team members understand that all data structures have a normal level, that normalization is an important part of meeting business needs and that the evil parts can sometimes be exorcised.  If you need to denormalize to meet a business goal, then do it.  It&#8217;s not evil incarnate.  It&#8217;s how design works: cost, benefit and risk.</p>
<p>I keep my normalization friends close and my enemies closer.  As for frenemies, I don’t believe in them.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><em> [1]Wikipedia contributors, &#8220;Frenemy,&#8221; Wikipedia, The Free Encyclopedia, <a href="http://en.wikipedia.org/w/index.php?title=Frenemy&amp;oldid=437999108">http://en.wikipedia.org/w/index.php?title=Frenemy&amp;oldid=437999108</a> (accessed July 27, 2011).</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/myth-1-normalization-friend-foe-or-frenemy-the-survey/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Normalization Myths that Really Make Me Crazy &#8211; Introduction to a Rant</title>
		<link>http://www.dataversity.net/normalization-myths-that-really-make-me-crazy-introduction-to-a-rant/</link>
		<comments>http://www.dataversity.net/normalization-myths-that-really-make-me-crazy-introduction-to-a-rant/#comments</comments>
		<pubDate>Wed, 15 Jun 2011 20:05:25 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Databases]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[data modeling]]></category>
		<category><![CDATA[Database Modeling and Design]]></category>
		<category><![CDATA[Normalization]]></category>
		<category><![CDATA[relational databases]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=3898</guid>
		<description><![CDATA[by Karen Lopez At Enterprise Data World I gave a Lightning Talk, Karen&#8217;s List of the Most Irritating Normalization Myths.  This was a fast-paced, auto-advancing slides presentation with 10 slides covered in five minutes.  I covered the types of things I&#8217;ve heard in design reviews that either didn&#8217;t make sense or repeated some urban data legend.  I had only thirty seconds per slide, so it was a lot to cover at an extremely rapid pace.  The blogs here at Dataversity.net give me an opportunity to get my rants down in writing, something I may regret later.  This sort of risk never stopped me before, though. What most people know about normalization they learned from word of mouth, much like most of us learned about sex &#8212; in the hallways of school, told as dirty little tales from school kid to school kid.   You probably remember just what sorts of data quality that information had. I first learned about normalization in a real database class in university, where we learned the theory, then developed solutions that worked with both normalized and highly-denormalized data structures in order to best understand the costs, benefits and risks associated with design decisions.    I was shocked [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/?page_id=826" target="_blank">Karen Lopez</a></p>
<p>At <a title="Enterprise Data World 2011" href="http://edw2011.wilshireconferences.com/" target="_blank">Enterprise Data World</a> I gave a Lightning Talk, <em>Karen&#8217;s List of the Most Irritating Normalization Myths</em>.  This was a fast-paced, auto-advancing slide presentation with 10 slides covered in five minutes.  I covered the types of things I&#8217;ve heard in design reviews that either didn&#8217;t make sense or repeated some urban data legend.  I had only thirty seconds per slide, so it was a lot to cover at an extremely rapid pace.  The blogs here at Dataversity.net give me an opportunity to get my rants down in writing, something I may regret later.  This sort of risk never stopped me before, though.</p>
<div id="attachment_3899" class="wp-caption alignleft" style="width: 310px"><a href="../wp-content/uploads/2011/06/PooArtKendra.png"><img class="size-medium wp-image-3899  " src="../wp-content/uploads/2011/06/PooArtKendra-300x211.png" alt="Poo - by Kendra Little" width="300" height="211" /></a><br />
<p class="wp-caption-text">Poo - by Kendra Little</p></div>
<p lang="en-US">What most people know about normalization they learned from word of mouth, much like most of us learned about sex &#8212; in the hallways of school, told as dirty little tales from school kid to school kid.   You probably remember just what sorts of data quality that information had.</p>
<p lang="en-US">I first learned about normalization in a real database class in university, where we learned the theory, then developed solutions that worked with both normalized and highly-denormalized data structures in order to best understand the costs, benefits and risks associated with design decisions.  I was shocked when I went into the &#8220;real&#8221; world and found out how many myths there are about what it is, how it works and how it is something to be avoided at all costs.  Normally (excuse the pun) I would not care about misinformation that my team members spout, but when their misunderstanding of the foundation theory of database design starts to impact my work, I have to call BS.  This blog series focuses on the myths and poorly worded complaints against normalization.</p>
<p>Then this week a prominent database technology expert whom I highly respect wrote a newsletter article about overnormalizing database designs.  I know for certain he knows a great deal about database design.  Most likely due to space constraints, he also perpetuated one of the myths I want to cover in this series.  I still highly respect him.  I&#8217;ve been known to make similar statements. I&#8217;d bet we all have.  Did I mention I highly respect him?</p>
<p>Writing about normalization is error-prone because professionals want to use precise terms.  A <em>relation</em> isn&#8217;t a <em>table</em>.  A <em>set</em> is not a <em>table</em>.  However, when I talk with teammates about normalization, it&#8217;s sometimes easier and more clear to them to make those sorts of analogies because people can visualize a relation as a table.  It&#8217;s still wrong, but it is more clear.  In this series, I will endeavor to be precise and still make analogies with real world artifacts such as tables and databases.  I therefore ask the professional purists to grant me some poetic license to do so.</p>
<p lang="en-US">Most people I&#8217;ve asked tell me they learned about normalization in one of these situations:</p>
<ul>
<li>In a 1-5 day course, normalization was covered right at the start of the course.  It sort of made sense, but they didn&#8217;t really remember what the normal forms are and would have to go back to their notes to figure out what each normal form is.</li>
<li>In a book they read about the normal forms.  They saw how a table was transformed from 1NF to 2NF, etc.  There was a brief explanation of why normalization was important.</li>
<li>In a formal education course, where there were one or two modules on normalization.  Students were required to show a data structure as it was normalized from 1NF to 5NF.  There was an exam section on the normal forms where they did a very similar exercise. They remember doing it, but have never taken a data structure through all the normal forms again.</li>
<li>In a meeting they heard someone complain about a design that was or was not properly normalized.</li>
<li>In a bar, where a co-worker griped because the database design was horrible because it was over-normalized and therefore had many tables.</li>
</ul>
<p>Almost all the people I talk with pretty much equate normalization with something evil, as if data architects and database designers conspire to wedge as much normalization as they can into a design just to work against developers and DBAs as much as possible.  Few (other than data professionals) understand that the main reason we are concerned about normalization in relational databases is to increase data integrity by reducing redundant data and mitigating update anomalies.  Notice how nothing in that statement speaks to query performance.  That&#8217;s because normalization is about updating data &#8211; creating, modifying and deleting data.  You could, though, think of normalization as a method for increasing the performance of the data, not the code.</p>
<h2>The Normal Forms</h2>
<p>In this series I&#8217;m not going to cover normalization as a tutorial but I will share this description* from <a title="Wikipedia Database Normalization" href="http://en.wikipedia.org/w/index.php?title=Database_normalization&amp;oldid=433633646" target="_blank">Wikipedia that covers the normal forms</a>.</p>
<blockquote><p><a href="http://en.wikipedia.org/wiki/First_normal_form">First normal form</a> (1NF)<br />
Reference: Two versions: E.F.   Codd (1970), C.J. Date (2003)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-10">[11]</a><br />
Definition: Table faithfully   represents a <a href="http://en.wikipedia.org/wiki/Relation_%28database%29">relation</a> and has no repeating groups</p>
<p><a href="http://en.wikipedia.org/wiki/Second_normal_form">Second normal form</a> (2NF)<br />
E.F. Codd (1971)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-Codd.2C_E.F_1971-1">[2]</a><br />
No non-prime   attribute in the table is <a href="http://en.wikipedia.org/wiki/Functional_dependency">functionally   dependent</a> on a <a href="http://en.wikipedia.org/wiki/Proper_subset">proper   subset</a> of a <a href="http://en.wikipedia.org/wiki/Candidate_key">candidate   key</a></p>
<p><a href="http://en.wikipedia.org/wiki/Third_normal_form">Third normal form</a> (3NF)<br />
E.F. Codd (1971);<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-Codd.2C_E.F_1971-1">[2]</a> see also Carlo Zaniolo&#8217;s equivalent but differently-expressed definition (1982)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-11">[12]</a><br />
Every non-prime attribute is non-transitively dependent on every <a href="http://en.wikipedia.org/wiki/Candidate_key">candidate key</a> in the table</p>
<p><a href="http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form">Boyce–Codd   normal form</a> (BCNF)<br />
Raymond F. Boyce   and E.F. Codd (1974)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-12">[13]</a><br />
Every non-trivial   functional dependency in the table is a dependency on a <a href="http://en.wikipedia.org/wiki/Superkey">superkey</a></p>
<p><a href="http://en.wikipedia.org/wiki/Fourth_normal_form">Fourth normal form</a> (4NF)<br />
<a href="http://en.wikipedia.org/wiki/Ronald_Fagin">Ronald Fagin</a> (1977)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-13">[14]</a><br />
Every non-trivial <a href="http://en.wikipedia.org/wiki/Multivalued_dependency">multivalued   dependency</a> in the table is a dependency on a superkey</p>
<p><a href="http://en.wikipedia.org/wiki/Fifth_normal_form">Fifth normal form</a> (5NF)<br />
<a href="http://en.wikipedia.org/wiki/Ronald_Fagin">Ronald Fagin</a> (1979)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-14">[15]</a><br />
Every non-trivial <a href="http://en.wikipedia.org/wiki/Join_dependency">join dependency</a> in the table is implied by the superkeys</p>
<p><a href="http://en.wikipedia.org/wiki/Domain/key_normal_form">Domain/key normal   form</a> (DKNF)<br />
<a href="http://en.wikipedia.org/wiki/Ronald_Fagin">Ronald Fagin</a> (1981)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-15">[16]</a><br />
Every constraint   on the table is a <a href="http://en.wikipedia.org/wiki/Logical_consequence">logical   consequence</a> of the table&#8217;s domain constraints and key constraint</p>
<p><a href="http://en.wikipedia.org/wiki/Sixth_normal_form">Sixth normal form</a> (6NF)<br />
<a href="http://en.wikipedia.org/wiki/Christopher_J._Date">C.J. Date</a>, <a href="http://en.wikipedia.org/wiki/Hugh_Darwen">Hugh Darwen</a>, and <a href="http://en.wikipedia.org/wiki/Nikos_Lorentzos">Nikos Lorentzos</a> (2002)<a href="http://en.wikipedia.org/wiki/Database_normalization#cite_note-Date6NF-3">[4]</a><br />
Table features no   non-trivial join dependencies at all (with reference to generalized join   operator)</p></blockquote>
<p><em>I have modified the format of the table for compatibility with a variety of platforms and blog readers. </em></p>
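<p>To give the second normal form definition above something to hold on to, here is a minimal sketch in Python that looks for partial dependencies in a handful of sample rows.  The function names (holds, partial_dependencies) and the order-line data are hypothetical.  Sample rows can only refute a functional dependency, never prove one: whether a dependency really exists comes from the meaning of the data, not from what happens to be in the table today.</p>

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    """Return True if no pair of sample rows contradicts the dependency.

    The dependency is contradicted when two rows agree on every attribute in
    `determinant` but disagree on `dependent`.
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = row[dependent]
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, candidate_key, non_prime):
    """List (subset, attribute) pairs where a proper subset of the candidate
    key appears to determine a non-prime attribute (a 2NF warning sign)."""
    found = []
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            for attr in non_prime:
                if holds(rows, subset, attr):
                    found.append((subset, attr))
    return found


# Hypothetical order lines keyed by (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": "A", "product_name": "Widget", "quantity": 2},
    {"order_id": 1, "product_id": "B", "product_name": "Gadget", "quantity": 1},
    {"order_id": 2, "product_id": "A", "product_name": "Widget", "quantity": 5},
    {"order_id": 2, "product_id": "B", "product_name": "Gadget", "quantity": 1},
]

violations = partial_dependencies(
    rows, ("order_id", "product_id"), ("product_name", "quantity"))
print(violations)
```

<p>In this sample, product_name depends on product_id alone, which is only part of the key, so the usual remedy would be to move it to a product table.  Whether that move is worth making is still a design decision about cost, benefit and risk.</p>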
<p>My first irritation, coming up in the next post, is about the love/hate relationship of normalization.  Your assignment is to reach deep down into your heart and identify your true feelings about normalization.   I&#8217;d love to hear about how you learned about normalization and how you came to your feelings about it.  Who says normalization is only math?</p>
<p>Your second assignment is to Tweet <em>@kendra_little</em> that you loved her poo art.   Thanks, <a href="http://www.brentozar.com/consultants/kendra-little/" target="_blank">Kendra</a>.</p>
<p><em>* Wikipedia contributors, &#8220;Database normalization,&#8221; Wikipedia, The Free Encyclopedia, </em><br />
<em><a href="http://en.wikipedia.org/w/index.php?title=Database_normalization&amp;oldid=433633646">http://en.wikipedia.org/w/index.php?title=Database_normalization&amp;oldid=433633646</a> (accessed June 14, 2011).</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/normalization-myths-that-really-make-me-crazy-introduction-to-a-rant/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Hiring Data Professionals: Mason Dixon Lines and Zombies in Your Job Postings</title>
		<link>http://www.dataversity.net/hiring-data-professionals-mason-dixon-lines-and-zombies-in-your-job-postings/</link>
		<comments>http://www.dataversity.net/hiring-data-professionals-mason-dixon-lines-and-zombies-in-your-job-postings/#comments</comments>
		<pubDate>Sat, 02 Apr 2011 19:00:45 +0000</pubDate>
		<dc:creator>Karen Lopez</dc:creator>
				<category><![CDATA[Blogs]]></category>
		<category><![CDATA[Consulting]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Topics]]></category>
		<category><![CDATA[Discussion]]></category>
		<category><![CDATA[Karen Lopez]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[hiring data professionals]]></category>
		<category><![CDATA[job opportunities]]></category>
		<category><![CDATA[jobs in data]]></category>
		<category><![CDATA[recruiting]]></category>
		<category><![CDATA[Zachman Framework]]></category>
		<category><![CDATA[zombies]]></category>

		<guid isPermaLink="false">http://www.dataversity.net/?p=144</guid>
		<description><![CDATA[by Karen Lopez I’m often asked to help clients fill IT professional positions – Database Administrators (DBAs), Data Architects, Data Administrators, etc.  Most of the time the job requirements are sufficient to help them find a good list of candidates worth interviewing.  However, in these times of limited budgets, I frequently see job descriptions that attempt to find the Wonder Candidate of the Century, a person so thoroughly talented that she is a master expert along an entire column of the Zachman Framework. &#160; It would seem to make sense that if you were hiring a data professional you’d design a position that fills in the Data column, right?  No?  It turns out, though, that most people don’t think and work along a column.  In my experience, people aren’t passionate about tasks that span columns from top to bottom.  They normally aren’t skilled along the whole column, either.  Referring to the Zachman Framework, what sorts of skills and passions would this candidate need: planning, architecting, designing, building systems, building parts, keeping the systems up and running.  Think about all the technologies, tools, methods and approaches these candidates would need to master to work at the professional level up and down [...]]]></description>
				<content:encoded><![CDATA[<p>by <a href="http://www.dataversity.net/?page_id=826">Karen Lopez</a></p>
<p>I’m often asked to help clients fill IT professional positions – Database Administrators (DBAs), Data Architects, Data Administrators, etc.  Most of the time the job requirements are sufficient to help them find a good list of candidates worth interviewing.  However, in these times of limited budgets, I frequently see job descriptions that attempt to find the <em>Wonder Candidate of the Century</em>, a person so thoroughly talented that she is a master expert along an entire column of the Zachman Framework.</p>
<div id="attachment_3910" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.dataversity.net/wp-content/uploads/2011/04/WonderCandidates-e1308173905486.png"><img class="size-full wp-image-3910 " src="http://www.dataversity.net/wp-content/uploads/2011/04/WonderCandidates-e1308173905486.png" alt="Wonder Candidates Mapped to Zachman Framework" width="500" height="380" /></a><p class="wp-caption-text">Wonder Candidates Mapped to Zachman Framework</p></div>
<p>&nbsp;</p>
<p>It would seem to make sense that if you were hiring a data professional you’d design a position that fills in the <em>Data</em> column, right?  No?  It turns out, though, that most people don’t think and work along a column.  In my experience, people aren’t passionate about tasks that span a column from top to bottom.  They normally aren’t skilled along the whole column, either.  Referring to the Zachman Framework, consider the skills and passions this candidate would need: planning, architecting, designing, building systems, building parts and keeping the systems up and running.  Think about all the technologies, tools, methods and approaches these candidates would need to master to work at the professional level up and down an entire column: business strategy tools, tactical planning tools, database technologies, design tools, data modeling tools, code generation, query performance tuning, semantic technologies, legacy database systems, query development… Well, I’m getting tired just thinking about it.</p>
<div id="attachment_3911" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.dataversity.net/wp-content/uploads/2011/04/Roles.png"><img class="size-full wp-image-3911 " src="http://www.dataversity.net/wp-content/uploads/2011/04/Roles-e1308174064697.png" alt="Roles Mapped to the Zachman Framework" width="500" height="382" /></a><p class="wp-caption-text">Roles Mapped to the Zachman Framework </p></div>
<p>&nbsp;</p>
<p>When I read bad job postings, I picture candidates being dragged by zombies up or down a column.  This dragging means we’d need to find people who have great analytical skills while at the same time having great detail-oriented building skills.  But having those skills isn’t enough; they’d also need to be equally passionate about every task along the column.  That’s the rub.  I’m not even certain I’ve met a person who is passionate about all the tasks from planning to keeping the enterprise functioning.  In fact, I think there’s a type of <a href="http://en.wikipedia.org/wiki/Mason%E2%80%93Dixon_Line" target="_blank">Mason-Dixon Line</a> somewhere around the middle rows that separates the more analytical tasks from the building and maintaining tasks.</p>
<p>&nbsp;</p>
<div id="attachment_3912" class="wp-caption aligncenter" style="width: 510px"><a href="http://www.dataversity.net/wp-content/uploads/2011/04/MasonDixonLine.png"><img class="size-full wp-image-3912" src="http://www.dataversity.net/wp-content/uploads/2011/04/MasonDixonLine-e1308174260603.png" alt="Secret Mason-Dixon Line on the Zachman Framework - Approx Location" width="500" height="380" /></a><p class="wp-caption-text">Secret Mason-Dixon Line on the Zachman Framework - Approx Location</p></div>
<p>&nbsp;</p>
<blockquote><p>“[T]he Mason–Dixon Line symbolizes a cultural boundary between the Northeastern United States and the Southern United States (Dixie).”[1]</p></blockquote>
<p>The divide I’m referring to isn’t about politics, but more about where one feels most comfortable.  At a crude level, this is the difference between thinking about and designing enterprise systems and building and running them.  The line may be higher or lower in the Framework and there may be similar lines as we cross each row and column, but I see the most pronounced difference as we move from top to bottom.</p>
<p>I see job postings all the time that call for a <em>Strategic Conceptual Enterprise Information Data Architect</em> who has strong skills in modeling, strategic planning, query tuning, XML, data syndication, mirroring, SANs, debugging and &lt; insert your favourite development languages here &gt;.  Those people don’t really exist.  There may be people who can do a lot of those things, but in my experience they aren’t passionate about all of them.  New hires won’t be happy and the organization will not realize the economies that it thinks it will.</p>
<p>I recommend that organizations that want to combine responsibilities do so across the columns, within the same range of rows, combining positions where thought processes are similar (business and data analysts, DBAs and developers, etc.).  Analysts in general make for good analysts in other columns.  Operational people tend to think operationally, and builders tend to think mostly of building rather than planning.  Let’s not drag people up or down the rows.</p>
<p>Go now and check your job postings.  Do they reflect the true nature of the job?  Or are they actually full of zombies ready to drag someone to an assignment that they don’t really want?</p>
<p><em>[1]Wikipedia contributors. &#8220;Mason–Dixon Line.&#8221; Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 2 Jan. 2011. Web. 3 Jan. 2011.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dataversity.net/hiring-data-professionals-mason-dixon-lines-and-zombies-in-your-job-postings/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
