Loading...
You are here:  Home  >  Data Blogs | Information From Enterprise Leaders  >  Current Article

More Database Design Errors – Confusion with Many-to-Many Relationships

By   /  September 30, 2015  /  7 Comments

by Michael Blaha

My last blog addressed primary key and foreign key errors. Now let’s look at errors with many-to-many relationships. A many-to-many relationship is an intersection of two entities. A person may own stock in multiple companies. A company can have multiple stockholders. The combination of a person and a company stock has an attribute of the number of shares owned. A many-to-many relationship depends for its existence on the underlying entities.

Relational databases and SQL have robust support for many-to-many relationships. Referential integrity keeps relationships consistent with their underlying entities and SQL can efficiently combine them in queries. This contrasts with programming languages, most of which lack support for keeping references consistent with referents and providing easy traversal. As a result programmers have to use workarounds to handle many-to-many relationships in their code. Sometimes they make the bad decision to contaminate a database with their programming representation.

Here are some of the problems that we have encountered.

Give a Many-to-Many Relationship Identity of its Own

Consider the following figure. The most natural representation is shown at the top. A many-to-many relationship simply combines the identifiers of its underlying entities and uses them as the primary key. This is the representation that most database developers use.

Mblaha-9-30-15chart3

The middle model is more cumbersome from a database perspective but still correct. The stock_holding_ID is a single field that identifies individual records for programming convenience. The model maintains the dependence on the underlying entities and enforces that the entities combination is unique. This model meets both database and programming needs.

The bottom model is the incorrect design that we too often see. It satisfies programmers but has lost an important constraint – that the combination of Person and Company is unique. The bottom model permits multiple records for a combination of Person and Company, which was not the intent of the Stock_holding relationship.

Combine the Primary Key of one Entity with a Sequence Number

Here’s another botched approach to many-to-many relationships that we sometimes find. The relationship identifier combines the primary key of one entity with an arbitrary sequence number.

Mblaha-9-30-15chart1

There’s much to dislike about this approach. There’s an undesirable asymmetry – there is no reason to favor person as the basis for identity rather than company. Again there is no uniqueness constraint on the combination of person_id and company_id. The sequence number adds bulk to the model and database but no meaning. Briefly stated, there’s no advantage to this model and many drawbacks.

Use Parallel References

Another bad idea is to use parallel references. Programming often involves collections and some programmers try to force fit them into a database. They don’t realize how they can naturally obtain collections with SQL.

Mblaha-9-30-15chart2

An obvious problem is that the model handles at most three stock holdings for a person. We could add more references, but the model still has an arbitrary upper limit. The model is a mess if we want to compute all the persons owning a particular stock. For example, we have to check company1_ID for GE stock, company2_ID for GE stock, and company3_ID for GE stock. The SQL is lengthy, error prone, and performs poorly.

Other Misguided Approaches

We believe there are additional misguided techniques. We would like to hear comments from readers.

We did not invent the approaches presented here. Rather we found them when performing database reverse engineering. We’ve learned to never underestimate the perverse creativity of some developers. If a mistake is possible, most certainly someone will make it.

About the author

Michael Blaha is a consultant and trainer who specializes in conceiving, architecting, modeling, designing and tuning databases. He has worked with dozens of organizations around the world. Blaha has authored seven U.S. patents, seven books many articles, and two video courses. His most recent publication is the Agile Data Warehouse Design video course from Infinite Skills. He received his doctorate from Washington University in St. Louis, and is an alumnus of GE Global Research in Schenectady, New York. You can find more information with his LinkedIn profile or at superdataguy.com.

  • Kent Graziano

    I so feel your pain! Thanks for the post.

    • Michael Blaha

      Kent — we all feel the pain when we have to support this kind of nonsense.

  • itellu3times

    In real life a person may have multiple holdings of a stock that need to be tracked separately for tax purposes.

    • Michael Blaha

      Agreed. Then it’s not a many-to-many relationship. Then a holding would be an entity type in its own right. As you are observing, the example is somewhat artificial.

  • Gordon Everest

    Mike, good job of showing us the “pure” implementation of a many-to-many relationship in a relational database, and particularly on some variants which crop up with “creative” data modelers or rather application programmers. Would that everyone take an introductory database course.
    I would take a minor issue with your opening statement: “A many-to-many relationship is an intersection of two entities.” A many-to-many relationship is just that, a many-to-many relationship. Business users can understand a many-to-many relationship! It is a relational database/RDBMS which cannot “understand” a many-to-many relationship. In a relational database, which imposes 1NF on entity tables (by construction), it is necessary to REPRESENT this relationship with an intersection entity and two many-to-one relationships (represented by foreign keys). This is artificial and allows implementation considerations to encroach on a “logical” data model.

    • Michael Blaha

      Thanks Gordon for your comments. I partially agree. With a proper design a relational database captures most of the many-to-many semantics. The combination of references is unique. And referential integrity ensures that the references do exist. However, in a many-to-many relationship the related entities are peers, unlike the asymmetry in a relational DB design (which FK is listed first and which FK is listed second in the many-to-many PK).

  • Jeff

    I have a situation where I have a many to many relationship in an established but unmodeled database:

    Each person can be up to 5 races: A, B, C, D, E. Each race can have multiple people who share that race.

    It is already set up in a database with a person table with fields R1 through R5. So, for a bi-racial person they might have R1=A and R2 = B.

    Any idea how to represent this in a LDM??? I want an LDM that is useful to a business owner.

    I guess an option would have the intermediate entity be “Racial Identification” even though it isn’t a separate table?

You might also like...

StreamSets Control Hub Brings DevOps Sensibilities to Dataflows

Read More →