You are here:  Home  >  Data Blogs | Information From Enterprise Leaders  >  Current Article

More Database Design Errors – Confusion with Many-to-Many Relationships

By   /  September 30, 2015  /  No Comments

by Michael Blaha

My last blog addressed primary key and foreign key errors. Now let’s look at errors with many-to-many relationships. A many-to-many relationship is an intersection of two entities. A person may own stock in multiple companies. A company can have multiple stockholders. The combination of a person and a company stock has an attribute of the number of shares owned. A many-to-many relationship depends for its existence on the underlying entities.

Relational databases and SQL have robust support for many-to-many relationships. Referential integrity keeps relationships consistent with their underlying entities and SQL can efficiently combine them in queries. This contrasts with programming languages, most of which lack support for keeping references consistent with referents and providing easy traversal. As a result programmers have to use workarounds to handle many-to-many relationships in their code. Sometimes they make the bad decision to contaminate a database with their programming representation.

Here are some of the problems that we have encountered.

Give a Many-to-Many Relationship Identity of its Own

Consider the following figure. The most natural representation is shown at the top. A many-to-many relationship simply combines the identifiers of its underlying entities and uses them as the primary key. This is the representation that most database developers use.


The middle model is more cumbersome from a database perspective but still correct. The stock_holding_ID is a single field that identifies individual records for programming convenience. The model maintains the dependence on the underlying entities and enforces that the entities combination is unique. This model meets both database and programming needs.

The bottom model is the incorrect design that we too often see. It satisfies programmers but has lost an important constraint – that the combination of Person and Company is unique. The bottom model permits multiple records for a combination of Person and Company, which was not the intent of the Stock_holding relationship.

Combine the Primary Key of one Entity with a Sequence Number

Here’s another botched approach to many-to-many relationships that we sometimes find. The relationship identifier combines the primary key of one entity with an arbitrary sequence number.


There’s much to dislike about this approach. There’s an undesirable asymmetry – there is no reason to favor person as the basis for identity rather than company. Again there is no uniqueness constraint on the combination of person_id and company_id. The sequence number adds bulk to the model and database but no meaning. Briefly stated, there’s no advantage to this model and many drawbacks.

Use Parallel References

Another bad idea is to use parallel references. Programming often involves collections and some programmers try to force fit them into a database. They don’t realize how they can naturally obtain collections with SQL.


An obvious problem is that the model handles at most three stock holdings for a person. We could add more references, but the model still has an arbitrary upper limit. The model is a mess if we want to compute all the persons owning a particular stock. For example, we have to check company1_ID for GE stock, company2_ID for GE stock, and company3_ID for GE stock. The SQL is lengthy, error prone, and performs poorly.

Other Misguided Approaches

We believe there are additional misguided techniques. We would like to hear comments from readers.

We did not invent the approaches presented here. Rather we found them when performing database reverse engineering. We’ve learned to never underestimate the perverse creativity of some developers. If a mistake is possible, most certainly someone will make it.

About the author

Michael Blaha is a consultant and trainer who specializes in conceiving, architecting, modeling, designing and tuning databases. He has worked with dozens of organizations around the world. Blaha has authored seven U.S. patents, seven books many articles, and two video courses. His most recent publication is the Agile Data Warehouse Design video course from Infinite Skills. He received his doctorate from Washington University in St. Louis, and is an alumnus of GE Global Research in Schenectady, New York. You can find more information with his LinkedIn profile or at superdataguy.com.

You might also like...

Real-time Data Processing with Serverless Computing

Read More →