Myth 1 – Normalization: Friend or Foe – The Slogan

by Karen Lopez

This is the third post in a series on Normalization Myths.  You’ll want to read the prior posts first.

Normalization Myths that Really Make Me Crazy – Introduction to a Rant
Myth 1 – Normalization: Friend, Foe, or Frenemy – The Survey

In my survey of Twitter users on their feelings about normalization, the most common thought I received was:

Normalize Until It Hurts, Denormalize Until It Works

I realize that this is one of those sayings used to help people remember the meaning of terms.  However, like other memory tricks, it fails when it comes to working in real-world situations.  Sure, I’d like to think that one cannot be too rich or too thin, but we know that tragically, both of those thoughts can go terribly wrong.

The hurts-works saying reinforces the “normalization is evil” concept.  It implies that normalization is harmful and must always be rolled back until it is doing less harm.  What this belief fails to recognize is that we database designers concern ourselves with normalization not just as an academic exercise, but to add value to the design by finding the right balance between data integrity and performance:

  1. Database tables are normalized to minimize the impact of update (create, modify, and delete) anomalies.  Data anomalies mean data integrity suffers.  It can also mean worse performance as data quality issues grow over time.  When data integrity suffers, business suffers.  (A small sketch of such an anomaly follows this list.)
  2. Good designers use cost, benefit and risk assessment to find the right normalization level within the context of the data, its use and its quality requirements.
  3. If the data is not going to be updated after being created, then it’s really difficult for update anomalies to happen.  Therefore the need for higher normal forms in the data structures is less of a requirement.  This is why most data warehouse designs are highly denormalized.
  4. When a designer “tunes” a database design to include denormalization, typically they are trading off the update performance or risk of data anomalies for better performance for querying the data.  This trade off may or may not be the right design for that context.
  5. When one denormalizes a data structure for performance reasons, one is borrowing against update performance and data integrity to get that query performance gain.   This gain does not come out of thin air.
  6. There is no such thing as the “right” normal form for all tables or all data.
  7. One of the least successful reasons to denormalize a structure I’ve experienced is for the sole reason of making a developer’s tasks easier.  Sure, there are project benefits to ensuring that development tasks can be completed faster, but rarely do the gains in developer time offset the cost to data quality and query performance that comes with these types of simplifications.  Optimizing developer time at the expense of data quality and performance is an optimization of the wrong subsystem in almost all cases.
  8. Normalization requires the designer to understand the meaning of the data.  It is not possible to apply the normalization rules to data you don’t understand.  Therefore, the less one understands about the data, the less likely it is that their database design will find the right trade-off of cost, benefit and risk.
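To make the anomaly in item 1 concrete, here is a minimal sketch; the table and column names are invented for illustration.  In the denormalized version, the customer’s address is copied onto every order row, so changing it means updating many rows, and missing one leaves the database disagreeing with itself.

```sql
-- Denormalized: CustomerAddress repeats on every order row.
-- Changing an address means updating many rows; miss one and
-- you have an update anomaly.
CREATE TABLE CustomerOrders (
    OrderID         INT          NOT NULL PRIMARY KEY,
    CustomerName    VARCHAR(100) NOT NULL,
    CustomerAddress VARCHAR(200) NOT NULL,
    OrderDate       DATE         NOT NULL
);

-- Normalized: the address lives in exactly one place, so it can
-- only ever have one value per customer.
CREATE TABLE Customer (
    CustomerID      INT          NOT NULL PRIMARY KEY,
    CustomerName    VARCHAR(100) NOT NULL,
    CustomerAddress VARCHAR(200) NOT NULL
);

CREATE TABLE Orders (
    OrderID    INT  NOT NULL PRIMARY KEY,
    CustomerID INT  NOT NULL REFERENCES Customer (CustomerID),
    OrderDate  DATE NOT NULL
);
```

The normalized version costs a join at query time; that join is exactly what the denormalized version was trying to avoid, which is the trade-off items 4 and 5 describe.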

When someone says that we denormalize until it “works”, what they really mean is that they are denormalizing until a query runs faster.  However, faster queries may or may not be the only goal of the database design.  We need to understand the objectives for the design in order to choose the right normalization level.  Context is everything in design.

I apply denormalizations to database designs on a regular basis, even on transactional database designs.  I do this with an understanding of the trade-offs.  I ensure that compensating data integrity features are put in place to mitigate data anomalies.  Queries do need to perform well.  Sometimes it’s more important that the data be returned faster than that it be correct.  Sometimes it’s much more important that the data be correct.  My job is to find that balance.  In order to do that, I need to understand the context of the project so that I know who my normalization friends and foes are.  I will address the overnormalization versus undernormalization issue in a future post.
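One common shape for such a compensating feature is a trigger that recomputes a denormalized value whenever its source rows change.  Here is a minimal T-SQL-flavored sketch, assuming hypothetical OrderHeader and OrderLine tables; the names and columns are invented for illustration.

```sql
-- OrderHeader.OrderTotal is denormalized so frequent queries can skip
-- joining and summing OrderLine. The trigger is the compensating
-- integrity feature that keeps the copied value from drifting.
CREATE TRIGGER trg_OrderLine_MaintainTotal
ON OrderLine
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE oh
    SET    oh.OrderTotal = COALESCE(t.LineTotal, 0)
    FROM   OrderHeader AS oh
    LEFT JOIN (SELECT OrderID,
                      SUM(Quantity * UnitPrice) AS LineTotal
               FROM   OrderLine
               GROUP  BY OrderID) AS t
           ON t.OrderID = oh.OrderID
    WHERE  oh.OrderID IN (SELECT OrderID FROM inserted
                          UNION
                          SELECT OrderID FROM deleted);
END;
```

The trigger spends a little extra work on every write to buy back the integrity the denormalized column gave up; whether that trade is worth it depends, again, on the context.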

Perhaps what the data profession needs is a series of top-up courses, to be reviewed every couple of years.  Maybe what we need is an intervention to help people understand why normalization is even a topic in design.  I think this would make for a wonderful lunchtime presentation.  Perhaps your boss could even buy lunch.  This presentation wouldn’t be the how of normalization, but the whys and why nots.  Mastering the normal forms is fairly easy.  Understanding which one to use for a specific solution is the hard part.  The more your teammates understand the whys, the more likely they are to support your efforts.

[Image: normalization quote, with autocorrect changing “normalization” to “demoralize”]

Don’t let normalization become your demoralization.  Even if that dang auto-correct keeps trying to tell you it should be.


Karen Lopez

Karen Lopez is Sr. Project Manager and Architect at InfoAdvisors. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger and panelist on data quality, data governance, logical and physical modeling, data compliance, development methodologies and social issues in computing. Karen is an active user on social media and has been named one of the top 3 technology influencers by IBM Canada and one of the top 17 women in information management by Information Management Magazine. She is a Microsoft SQL Server MVP, specializing in data modeling and database design. She’s an advisor to the DAMA International Board and a member of the Advisory Board of Zachman International. She’s known for her slightly irreverent yet constructive opinions and rants on information technology topics. She wants you to love your data. Karen is also moderator of the InfoAdvisors Discussion Groups at www.infoadvisors.com and dm-discuss on Yahoo Groups. Follow Karen on Twitter (@datachick).


  2 comments for “Myth 1 – Normalization: Friend or Foe – The Slogan”

  1. Judith Astroff
    February 14, 2012 at 9:12 am

    A very timely post for me, as I’m designing for an expansion of an existing system, and pressure from the developers is to denormalize. Some of it is valid. However, volume in this case is fairly trivial, but errors are highly visible, so I’m proceeding carefully, making sure that accuracy comes first.
