Click to learn more about author Michael Blaha.
The UML (Unified Modeling Language) is a popular Data Modeling notation that arose from the programming community, but it is also applicable to databases. The UML has a variety of models, one of which is the class model. The UML class model is essentially just a dialect of the ER (Entity-Relationship) approach. ER models have been used to model databases for forty years now since PP Chen’s original seminal paper.
Associations are a defining aspect of the UML. UML associations are the same as ER relationships. Associations are novel from the object-oriented perspective as few programming languages support them. Programmers tend to think in terms of their implementation as pointers and references. In contrast, associations are routine from the database perspective as most database modeling techniques take relationship support for granted.
SQL has excellent support for UML associations. You can implement them nicely using foreign keys, nulls, database joins, foreign key indexes, and referential integrity.
For an example of an association, consider a relationship between Customer and Order tables. A Customer can have many Orders; an Order is for one Customer. The UML association is bidirectional. The association can be read either way, supporting both statements.
A programming implementation would typically implement the association redundantly by burying in each end. Then the Customer data structure would contain a collection of Order references. The Order data structure would contain a Customer reference.
In contrast, the relational database implementation is asymmetric. The Order table would typically have a foreign key to the primary key of Customer. A database places a reference in only one of the tables and reconstructs either direction via joins.
Nulls become an issue when an association is optional. If there were anonymous Orders lacking a Customer, then what should we use for the Customer?
The standard database answer is to use NULLs. The standard programming answer is to use a NULL reference.
Nulls are a controversial aspect of database theory. SQL databases support one magic value called NULL. But logically, there are multiple kinds of nulls, such as not applicable, not found, and unknown. Therefore, some authors discourage the use of NULLs when designing schema, preferring to use specific special values instead. For example, the use of special values is part of established data warehouse practice.
However, you handle NULLs in schema, NULLs become inescapable when combining one table with another via database joins. Outer joins give rise to NULLs.
Database joins let you take the asymmetric implementation of relationships as foreign keys and reconstitute bidirectional access. A database join connects a foreign key to its referenced primary key. A query can then traverse the relationship in either direction.
Database joins dynamically combine data in a referencing table with data in a referent table. Database products have powerful query optimizers that implement joins efficiently. This non-procedural aspect of SQL is one of its major strengths.
Database support INNER joins where the referent must be there (and cannot be NULL). Databases also support OUTER joins where the referent can be NULL.
Programming languages do not support joins which is why most programming implementations bury associations in both ends.
Foreign Key Indexes
In general, for joins to execute efficiently, the database should have indexes on primary keys as well as foreign keys. Data warehouses are often an exception. Many database products index primary keys as a side effect of defining them.
We talked about the importance of foreign key indexes in a previous blog.
Referential integrity guarantees that a foreign key reference and its source are consistent. If you define referential integrity, the database will not permit reference and referent to be out of sync. With referential integrity, it is not possible to have dangling references. This greatly simplifies development and boosts data quality.
UML associations are the same thing as database relationships. Associations are a distinctive aspect of the UML and found in few programming languages. However, the database community takes relationships for granted. SQL has a rich implementation that deeply support the semantics of relationships.