When To Use the UML for Databases (And When Not To)

The UML is a popular notation for modeling software artifacts. Even though the UML was mostly developed for programming, it is also relevant for databases. This article takes a critical look at using the UML for databases.

What is the UML?

The UML (an acronym for Unified Modeling Language) is a general-purpose software notation. The UML is a standard sponsored by the Object Management Group (OMG).

The UML was created in the mid-1990s as an outgrowth of interest in using object-oriented technology for software engineering. Nominally, the UML was intended to address all software development efforts. But politicking and the personal interests of participants steered the standard mostly towards programming. Nevertheless, despite the programming bias, the UML is still quite helpful for database purposes.

At the time, there were at least two motivations for creating the UML. The first motivation was to resolve the Tower of Babel of competing notations. Several popular notations were in use, each with its own strengths and weaknesses. Many of the differences among the notations were arbitrary, and they were detrimental to working on large projects and exchanging technical ideas. The UML sought to unify and supplant these competing notations and largely succeeded at it.

A second, less noble reason for the UML’s creation was to establish marketing buzz for vendors so that they could sell more products and services. It’s always helpful to have something new, and the UML gave vendors a chance to jostle for prominence in the public eye.

The UML has a variety of diagrams, one of which (the class model) pertains to databases. The class model specifies classes and their relationships. Setting aside the hype, the UML class model is essentially a dialect of the Entity-Relationship approach that was introduced many years previously. The UML class model adds a few helpful features and some twists of notation.
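As a rough illustration of what a class model records, the sketch below represents two classes and one association as plain Python data. The Customer and Order classes, their attributes, and the helper names are hypothetical examples, not taken from any real model or from a UML tool's API. Note what is absent: keys, data types, and indexes.

```python
from dataclasses import dataclass, field


@dataclass
class UmlClass:
    """A class in the model: a name plus attribute names, no database details."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class Association:
    """A relationship between two classes, with a multiplicity at each end."""
    source: str
    target: str
    source_multiplicity: str
    target_multiplicity: str


# Hypothetical example model: a customer places zero or more orders.
customer = UmlClass("Customer", ["name", "email"])
order = UmlClass("Order", ["orderDate", "total"])
places = Association("Customer", "Order", "1", "0..*")


def describe(assoc: Association) -> str:
    """Render an association roughly the way it reads on a class diagram."""
    return (f"{assoc.source} {assoc.source_multiplicity} -- "
            f"{assoc.target_multiplicity} {assoc.target}")


print(describe(places))
```

An Entity-Relationship diagram would capture the same two entities and the same one-to-many relationship; the differences lie mostly in how the notation draws them.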

When to Use the UML for Databases

The UML offers real benefits for developing database applications.

We often use the UML when gathering requirements for operational applications. Most business staff are unfamiliar with software notations. They relate better to the UML than to a conventional database notation such as IE (Information Engineering). We run data modeling sessions using the UML. We show the business staff the evolving model as we solicit their input. We tell them that they are the experts on the requirements and that we need their help to understand the application. We also tell them that we are the database experts and will not waste their time by involving them in our job. The UML puts the focus on capturing requirements and lets us defer database details.

We also find the UML to be helpful when working with complex and abstract models. The UML is more concise than conventional notations because it omits database details. A concise notation is conducive to deep thinking. For example, we consider the Common Warehouse Metamodel (CWM) to be an excellent model and it is expressed using the UML. (See the book by John Poole et al.) As another example, CA has documented the ERwin metamodel with the UML (supportcontent.ca.com/cadocs/0/e003021e.pdf).

When Not to Use the UML

Even though we favor the UML, we try to be sensible and selective with its use.

The UML clearly falls short for database design. Some UML tools have database capabilities, but they do not have the design power of a true database tool such as ERwin or ER/Studio. A conventional notation, such as IE, shows the details of database design, which is helpful for generating code and supporting production maintenance.
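To make "the details of database design" concrete, here is a minimal sketch of the kind of DDL a database design tool might generate for a hypothetical customer-and-order model. The table names, columns, and index are assumptions chosen for illustration, and SQLite stands in for a production DBMS only so that the example runs on its own.

```python
import sqlite3

# Design details a requirements-level class model leaves out:
# surrogate keys, data types, nullability, uniqueness, foreign keys, indexes.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL,
    total       NUMERIC NOT NULL
);
CREATE INDEX ix_order_customer ON customer_order(customer_id);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(DDL)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)


def orphan_order_rejected(connection: sqlite3.Connection) -> bool:
    """Return True if the database refuses an order for a nonexistent customer."""
    try:
        connection.execute(
            "INSERT INTO customer_order (customer_id, order_date, total) "
            "VALUES (99, '2024-01-01', 10)")
    except sqlite3.IntegrityError:
        return True
    return False
```

Every constraint in the DDL is a decision that matters for production maintenance, which is why a notation that displays these decisions is useful to IT even though it would distract business staff.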

We also forgo the UML notation when working with data warehouses. Data warehouses have a simple, regular structure (the bus architecture using star schemas). This simplicity of data warehouse models contrasts with the complex graph of interconnected tables found in many operational applications. For data warehouse applications, there is little benefit to using the UML. A conventional database notation suffices for both modeling and database design.
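The regularity of a star schema is easy to see in a small sketch: one fact table whose foreign keys point at a handful of dimension tables. The table names, columns, and sample rows below are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_year INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
-- Fact table: one row per measurement, keyed by its dimensions.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    sales_amount NUMERIC NOT NULL
);
INSERT INTO dim_date VALUES (20240101, 2024), (20250101, 2025);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (20240101, 1, 100), (20240101, 2, 50), (20250101, 1, 70);
""")

# A typical warehouse query: join the fact to its dimensions and aggregate.
rows = conn.execute("""
    SELECT d.calendar_year, p.product_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.calendar_year, p.product_name
    ORDER BY d.calendar_year, p.product_name
""").fetchall()

for row in rows:
    print(row)
```

Every relationship runs from the fact table to a dimension, so there is no complex graph of tables for a more abstract notation to tame.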

Sometimes, we choose a notation based on tool features. We chose ERwin for a recent enterprise data modeling project because of its polished reports and ability to import/export metadata with other tools.

In Conclusion

In practice, we often use the UML together with a database notation such as IE. The UML is a good language for the business and IE is a good language for IT. The use of two notations provides a clear demarcation between the role of the business and the role of IT.