Normalization Myths that Really Make Me Crazy – Introduction to a Rant

by Karen Lopez

At Enterprise Data World I gave a Lightning Talk, Karen’s List of the Most Irritating Normalization Myths.  This was a fast-paced, auto-advancing slides presentation with 10 slides covered in five minutes.  I covered the types of things I’ve heard in design reviews that either didn’t make sense or repeated some urban data legend.  I had only thirty seconds per slide, so it was a lot to cover at an extremely rapid pace.  The blogs here at Dataversity.net give me an opportunity to get my rants down in writing, something I may regret later.  This sort of risk never stopped me before, though.

Poo - by Kendra Little

Poo - by Kendra Little

What most people know about normalization they learned from word of mouth, much like most of us learned about sex — in the hallways of school, told as dirty little tales from school kid to school kid.   You probably remember just what sorts of data quality that information had.

I first learned about normalization in a real database class in university, where we learned the theory, then developed solutions that worked with both normalized and highly-denormalized data structures in order to best understand the costs, benefits and risks associated with design decisions.    I was shocked when I went into the “real” world and found out how many myths there are about what it is, how it works and how it is something to be avoided at all costs.  Normally (excuse the pun) I would not care about misinformation that my team members spot, but when their misunderstanding of the foundation theory of database design starts to impact my work, I have to call BS.    This blog series focuses on the myths and poorly worded complaints against normalization.

Then this week a prominent database technology expert whom I highly respect wrote a newsletter article about overnormalizing database designs.  I know for certain he knows a great deal about database design.  Most likely due to space constraints, he also perpetuated one of the myths I want to cover in this series.  I still highly respect him.  I’ve been known to make similar statements. I’d bet we all have.  Did I mention I highly respect him?

Writing about normalization is error prone because professionals want  to use precise terms.  A relation isn’t a table.  A set is not a table.  However, when I talk with team mates about normalization, it’s sometimes easier and more clear to them to make those sorts of analogies because people can visualize a relation as a table.  It’s still wrong, but it is more clear.  In this series, I will endeavor to be precise and still make analogies with real world artifacts such as tables and databases.  I therefore ask the professional purists to grant me some poetic license to do so.

Most people I’ve asked tell me they learned about normalization in one of these situations:

  • In a 1-5 day course normalization was covered right at the start of the course.  It sort of made sense, but they didn’t really remember what the normal forms are and would have to go back to their notes to figure out what each normal form is.
  • In a book they read about the normal forms.  They saw how a table was transformed from 1NF to 2NF, etc.  There was a brief explanation why normalization was important.
  • In a formal education course, where there were 1 or two modules on normalization.  Students were required to show a data structure as it was normalized from 1NF to 5NF.  There was an exam section on the normal forms where they did a very similar exercise. They remember doing it, but have never taken a data structure through all the normal forms again.
  • In a meeting they heard someone complain about a design that was or was not properly normalized.
  • In a bar, where a co-worker griped because the database design was horrible because it was over-normalized and therefore had many tables.

Almost all the people I talk with pretty much equate normalization with something evil, as if data architects and database designers conspire to wedge as much normalization as they can in design just to work against developers and DBAs as much as possible.  Few (other than data professionals) understand that the main reason we are concerned about normalization in relational databases is to increase data integrity by reducing redundant data and mitigating update anomalies.  Notice how nothing in that statement speaks to query performance.  That’s because normalization is about updating data – creating, updating and deleting data.  You could, though, think of normalization as a method for increasing the performance of the data, not the code.

The Normal Forms

In this series I’m not going to cover normalization as a tutorial but I will share this description* from Wikipedia that covers the normal forms.

First normal form (1NF)
Reference: Two versions: E.F. Codd (1970), C.J. Date (2003)[11]
Definition: Table faithfully represents a relation and has no repeating groups

Second normal form (2NF)
E.F. Codd (1971)[2]
No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key

Third normal form (3NF)
E.F. Codd (1971);[2] see +also Carlo Zaniolo’s equivalent but differently-expressed definition (1982)[12] Every non-prime attribute is non-transitively dependent on every candidate key in the table

Boyce–Codd normal form (BCNF)
Raymond F. Boyce and E.F. Codd (1974)[13]
Every non-trivial functional dependency in the table is a dependency on a superkey

Fourth normal form (4NF)
Ronald Fagin (1977)[14]
Every non-trivial multivalued dependency in the table is a dependency on a superkey

Fifth normal form (5NF)
Ronald Fagin (1979)[15]

Every non-trivial join dependency in the table is implied by the superkeys

Domain/key normal form (DKNF)
Ronald Fagin (1981)[16]
Every constraint on the table is a logical consequence of the table’s domain constraints and key constraint

Sixth normal form (6NF)
C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002)[4]
Table features no non-trivial join dependencies at all (with reference to generalized join operator)

I have modified the format of the table for compatibility with a variety of platforms and blog readers.

My first irritation, coming up in the next post, is about the love/hate relationship of normalization.  Your assignment is to reach deep down into your heart and identify your true feelings about normalization.   I’d love to hear about how you learned about normalization and how you came to your feelings about it.  Who says normalization is only math?

Your second assignment is to Tweet @kendra_little that you loved her poo art.   Thanks, Kendra.

* Wikipedia contributors, “Database normalization,” Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/w/index.php?title=Database_normalization&oldid=433633646 (accessed June 14, 2011).

Related Posts Plugin for WordPress, Blogger...

Karen Lopez

Karen Lopez is Sr. Project Manager and Architect at InfoAdvisors. She has 20+ years of experience in project and data management on large, multi-project programs. Karen specializes in the practical application of data management principles. She is a frequent speaker, blogger and panelist on data quality, data governance, logical and physical modeling, data compliance, development methodologies and social issues in computing. Karen is an active user on social media and has been named one of the top 3 technology influencers by IBM Canada and one of the top 17 women in information management by Information Management Magazine. She is a Microsoft SQL Server MVP, specializing in data modeling and database design. She’s an advisor to the DAMA, International Board and a member of the Advisory Board of Zachman, International. She’s known for her slightly irreverent yet constructive opinions and rants on information technology topics. She wants you to love your data. Karen is also moderator of the InfoAdvisors Discussion Groups at www.infoadvisors.com and dm-discuss on Yahoo Groups. Follow Karen on Twitter (@datachick). 

Tags:

  5 comments for “Normalization Myths that Really Make Me Crazy – Introduction to a Rant

  1. wvu1999
    June 15, 2011 at 6:35 pm

Leave a Reply

Your email address will not be published. Required fields are marked *