by Karen Lopez
This is the third post in a series on Normalization Myths. You’ll want to read the prior posts first.
I realize that this is one of those sayings used to help people remember the meaning of terms. However, like other memory tricks, it fails when it comes to working in real-world situations. Sure, I’d like to think that one cannot be too rich or too thin, but we know that tragically, both of those thoughts can go terribly wrong.
The hurts-works saying reinforces the normalization is evil concept. It implies that normalization is harmful and must always be rolled back until it is doing less harm. What this belief fails to recognize is that we database designers concern ourselves with normalization not just as an academic exercise, but to add value to the design by finding the right balance between data integrity and performance:
- Database tables are normalized to minimize the impact of update (create, modify, and delete) anomalies. Data anomalies mean data integrity suffers. It can also mean worse performance as data quality issues grow over time. When data integrity suffers, business suffers.
- Good designers use cost, benefit and risk assessment to find the right normalization level within the context of the data, its use and its quality requirements.
- If the data is not going to be updated after being created, then it’s really difficult for update anomalies to happen. Therefore the need for higher normal forms in the data structures is less of a requirement. This is why most data warehouse designs are highly denormalized.
- When a designer “tunes” a database design to include denormalization, typically they are trading off the update performance or risk of data anomalies for better performance for querying the data. This trade off may or may not be the right design for that context.
- When one denormalizes a data structure for performance reasons, he is borrowing performance and data integrity to get that performance gain. This gain does not come out of thin air.
- There is no such thing as the “right” normal form for all tables or all data.
- One of the least successful reasons to denormalize a structure I’ve experienced is for the sole reason of making a developer’s tasks easier. Sure, there are project benefits for ensuring that development tasks can be completely faster, but rarely do the performance gains in developer time offset the cost to data quality and query performance that happens with these types of simplifications. Optimizing developer time at the expense of data quality and performance is an optimization of the wrong subsystem in almost all cases.
- Normalization requires the designer to understand the meaning of the data. It is not possible to apply the normalization rules to data you don’t understand. Therefore, the less one understands about the data, the less likely their database design will find the right trade off of cost, benefit and risk.
When someone says that we denormalize until it “works” what they really mean is they are denormalization until a query runs faster. However, faster queries may or may not be the only goal of the database design. We need to understand the objectives for the design in order to choose the right normalization level. Context is everything in design.
I apply denormalizations to database designs on regular basis, even on transactional database designs. I do this with the understanding of the trade-offs. I ensure that compensating data integrity features are put in place to mitigate data anomalies. Queries do need to perform well. Sometimes it’s more important that the data be returned faster than it be correct. Sometimes it’s much more important that the data be correct. My job is to find that balance. In order to do that, I need to understand the context of the project so that I know who my normalization friends and foes are. I will address the overnormalization versus undernormalization issue in a future post.
Perhaps what the data profession needs is a series of top-up courses, to be reviewed every couple of years. Maybe what we need is an intervention to help people understand the why normalization is even a topic in design. I think this would make for a wonderful lunchtime presentation. Perhaps your boss could even buy lunch. This presentation wouldn’t be the how of normalization, but the whys and why nots. Mastering the normal forms is fairly easy. Understanding which one to use for a specific solution is the hard part. The more your teammates understand the whys, the more likely they are to going to support your efforts.
Don’t let normalization become your demoralization. Even if that dang auto-correct keeps trying to tell you it should be.