Given the importance of Data Modeling to the data industry over the last 50 years, it stands to reason that a collection of modeling frameworks and languages has garnered recognition over time.
Peter Chen's seminal work, inspired by Charles Bachman and others, of course stands at the center of the industry, and the ER diagram remains an important part of Data Modeling. This author remembers his cubicle walls filled with huge ER diagrams modeling the education workforce of a large Midwestern state.
Beyond simply modeling data residing in a database, other modeling types support concepts like data flow, software application objects, information architecture and more. This article takes a look at an array of the leading information modeling frameworks, with a focus on those that have a visual aspect.
Codd, Chen and the Entity Relationship Model
In the late 1960s, E.F. Codd developed the relational model for databases, publishing it in 1970. The relational model represents data as tables (relations) whose rows are related to one another through shared key values rather than physical pointers, giving designers a simple way to abstract how data is structured and stored within a database.
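Codd's model predates SQL, but a modern SQL database offers an easy way to see the idea in action. The following minimal sketch uses Python's built-in sqlite3 module; the table names, columns, and sample rows are hypothetical. It shows two relations linked only by a shared key value, with a join reconstructing the relationship at query time.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_code   TEXT REFERENCES department(dept_code)  -- link between the two relations
    );
    INSERT INTO department VALUES ('ED', 'Education'), ('HR', 'Human Resources');
    INSERT INTO employee VALUES (1, 'Ada', 'ED'), (2, 'Grace', 'HR'), (3, 'Alan', 'ED');
""")

# Rows in different tables are related by matching key values, not by position.
rows = conn.execute("""
    SELECT e.name, d.title
    FROM employee AS e
    JOIN department AS d ON e.dept_code = d.dept_code
    ORDER BY e.employee_id
""").fetchall()

for name, title in rows:
    print(f"{name}: {title}")
```

Because the link lives in the data (the dept_code value) rather than in storage layout, the same question can be answered however the rows are physically stored.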
A few years later, in 1976, Peter Chen's paper, "The Entity-Relationship Model: Toward a Unified View of Data," formalized earlier work from Codd and Charles Bachman into a visual diagramming language suitable for the conceptual, logical, and physical modeling of information systems. The impact of Chen's work still resonates through the industry today.
In Chen's approach, elements of natural language map to their ER diagram equivalents: nouns typically become entities, and verbs become relationships.
As such, three concepts form the core of ER diagrams: entities, relationships, and attributes. An entity is anything that is uniquely identifiable, and attributes define the characteristics of an entity. Relationships define how two entities relate to each other in the real world and are often described with a verb; for example, a supervisor entity manages a department entity.
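To make the three concepts concrete, here is a minimal Python sketch that records entities, their attributes, and a verb-named relationship as plain data. The Entity and Relationship classes and the sample attribute names are hypothetical illustrations, not part of Chen's notation or of any modeling tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An entity type: a kind of thing whose instances are uniquely identifiable."""
    name: str
    key: str                          # attribute that uniquely identifies each instance
    attributes: tuple[str, ...] = ()  # characteristics describing the entity

    def __post_init__(self) -> None:
        # An entity's identifying key must be one of its own attributes.
        if self.key not in self.attributes:
            raise ValueError(f"key {self.key!r} is not an attribute of {self.name}")


@dataclass(frozen=True)
class Relationship:
    """A named association between two entities, read as subject-verb-object."""
    subject: Entity
    verb: str
    obj: Entity

    def verbalize(self) -> str:
        return f"{self.subject.name} {self.verb} {self.obj.name}"


supervisor = Entity("Supervisor", key="employee_id", attributes=("employee_id", "name"))
department = Entity("Department", key="dept_code", attributes=("dept_code", "title"))
manages = Relationship(supervisor, "manages", department)

print(manages.verbalize())  # Supervisor manages Department
```

Reading the relationship back as a sentence mirrors Chen's mapping between natural language and diagram elements.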
Chen's diagramming conventions use rectangles for entities, diamonds for relationships, and ovals for attributes, but other conventions also see wide use and are supported by common industry tools such as ERwin and PowerDesigner. Even the Bachman notation developed by Charles Bachman is still used by some today.
Crow's Foot notation gained popularity in the 1980s because it draws relationships as plain lines, which fits more information onto a diagram than Chen's diamond shapes. The cardinality of each relationship (one-to-one, one-to-many, and so on) is shown by the symbols at the ends of the relationship line.
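Each Crow's Foot line end combines two marks: the mark nearest the entity shows the maximum (a bar for one, the crow's foot for many), and the mark farther along the line shows the minimum (a ring for zero, a bar for one). The minimal Python sketch below encodes those four end styles and derives the overall cardinality from the two maximums. The End enum and the cardinality helper are hypothetical names chosen for illustration.

```python
from enum import Enum


class End(Enum):
    """Crow's Foot line-end symbols as (minimum, maximum); None stands for 'many'."""
    ZERO_OR_ONE = (0, 1)      # ring + bar
    EXACTLY_ONE = (1, 1)      # bar + bar
    ZERO_OR_MANY = (0, None)  # ring + crow's foot
    ONE_OR_MANY = (1, None)   # bar + crow's foot

    @property
    def optional(self) -> bool:
        # A minimum of zero means participation is optional.
        return self.value[0] == 0

    @property
    def label(self) -> str:
        return "many" if self.value[1] is None else "one"


def cardinality(left: End, right: End) -> str:
    """Classify a relationship line by the maximums drawn at its two ends."""
    return f"{left.label}-to-{right.label}"


# Each employee belongs to exactly one department; a department has zero or more employees.
department_end = End.EXACTLY_ONE
employee_end = End.ZERO_OR_MANY
print(cardinality(department_end, employee_end))  # one-to-many
print(employee_end.optional)  # True
```

Keeping minimum and maximum separate reflects why a single line end can express both "how many" and "whether required".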
The United States Air Force and IDEF1X
IDEF1X is a type of ER diagram language used extensively for semantic modeling. It was initially developed within the United States Air Force's Integrated Computer Aided Manufacturing (ICAM) program in the late 1970s and early 1980s to help improve manufacturing efficiency. Both Peter Chen and Charles Bachman served in a review capacity during its development.
IDEF1X is a highly formalized diagramming language with a variety of syntaxes for defining different relationship types and other objects. It uses a three-schema approach (external, conceptual, internal) that formalizes the development process by ensuring the proper modeling occurs at each step.
Data Structure Diagrams Provide Relationship Detail
The Data Structure Diagram (DSD) directly relates to Charles Bachman and his original article published in Data Base in 1969. Bachman diagrams are a subset of Data Structure Diagrams suitable for computer software engineering instead of database modeling.
On the surface, DSDs look like a more complex version of an ER diagram. Rectangles still define entities and arrowed lines define relationships, but the DSD focuses more on displaying details within the relationships themselves. This lets an analyst better see the different attributes that make up relationships within a data model.
Relationship cardinality in Data Structure Diagrams is defined by Crow's Foot notation, solid arrowheads, or the use of numeric representation.
Data Flow Diagrams Track the Movement of Data
While modeling static data is useful during all phases of a software development project, being able to visualize the flow of information throughout a system provides a way to truly understand what a system is supposed to do. Enter the Data Flow Diagram (DFD).
Edward Yourdon, together with Tom DeMarco, popularized a notation style widely used in Data Flow Diagrams: processes are drawn as circles, data stores as two parallel lines, and the flow of information as arrowed lines. Rectangles represent external entities, the sources of a system's inputs and the destinations of its outputs.
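A DFD can be treated as a small directed graph of typed nodes. The minimal Python sketch below encodes the element kinds described above and checks one rule commonly taught for DFDs: every data flow should begin or end at a process, because data does not move between stores or external entities without some process handling it. The Kind, Node, and Flow classes, the invalid_flows helper, and the order-taking example are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PROCESS = "process"            # drawn as a circle
    DATA_STORE = "data store"      # drawn as two parallel lines
    EXTERNAL = "external entity"   # drawn as a rectangle (source or sink of data)


@dataclass(frozen=True)
class Node:
    name: str
    kind: Kind


@dataclass(frozen=True)
class Flow:
    """A labeled arrow carrying data from one node to another."""
    source: Node
    target: Node
    data: str


def invalid_flows(flows: list[Flow]) -> list[Flow]:
    """Return flows that touch no process at either end."""
    return [f for f in flows if Kind.PROCESS not in (f.source.kind, f.target.kind)]


customer = Node("Customer", Kind.EXTERNAL)
take_order = Node("Take Order", Kind.PROCESS)
orders = Node("Orders", Kind.DATA_STORE)

flows = [
    Flow(customer, take_order, "order details"),
    Flow(take_order, orders, "validated order"),
    Flow(customer, orders, "raw order"),  # breaks the rule: no process in between
]

for f in invalid_flows(flows):
    print(f"invalid: {f.source.name} -> {f.target.name} ({f.data})")
```

Running it flags only the Customer to Orders flow, which a corrected diagram would route through the Take Order process.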
DFDs provide valuable understanding throughout a project: from the high-level context diagrams useful for displaying the fundamental interactions between a system and its actors, to more detailed versions diagramming the flow of a specific function or process.
The Universal Unified Modeling Language
One of the most widely used modeling languages is the Unified Modeling Language, commonly known as UML. Information technology professionals in the software engineering discipline use UML daily to model everything from business processes to use cases to software objects.
UML was developed in the mid-1990s by three software methodology experts known as the "Three Amigos" (James Rumbaugh, Grady Booch, and Ivar Jacobson) and handed over to the Object Management Group (OMG), which still maintains the language. The OMG adopted UML as a standard in 1997, and the UML 2.0 specification was released in 2005.
UML 2.0 defines thirteen distinct diagram types, grouped into Structure Diagrams and Behavior Diagrams, with Interaction Diagrams forming a subset of the behavior category; later versions added a fourteenth type, the Profile Diagram. Class and Object Diagrams, in the Structure category, bear the closest resemblance to Chen's ER diagrams and Bachman's data structure diagrams.
In many cases, the choice of methodology dictates which UML diagrams get used throughout a software project's lifecycle. The OMG recommends a thorough analysis of the project type before choosing a methodology. What works for a large enterprise system might not be suitable on an embedded microprocessor project.
A wide variety of modeling tools support UML, providing templates for most, if not all, of the standard UML diagrams. At a minimum, anyone comfortable with office graphics software such as Visio should be able to produce any UML diagram.
A detailed series of articles covering the use of UML for data modeling and the differences between the two disciplines is worth a look for interested readers.
Information Engineering Notation
Information Engineering as a formalized discipline grew out of the work of Clive Finkelstein and James Martin, who worked together in Australia in the late 1970s. Over time, Finkelstein focused more on the business side of Enterprise Information Architecture, while Martin moved toward the technical world of rapid application development.
The notation specific to Information Engineering has many similarities to the IDEF1X language mentioned earlier. It typically uses Crow's Foot notation to show the cardinality of relationships and whether participation in a relationship is optional or mandatory.
There are minor differences in IE notation between Finkelstein's and Martin's approaches to the practice, documented in this article. In some cases, a mixture of notation styles is used in the same diagram, such as a physical data model that uses UML for the entities and Information Engineering notation for the relationships.
Object-Role Modeling
Despite sharing the "ORM" acronym with Object Relational Mapping, Object-Role Modeling is a separate discipline. It is widely used for the semantic modeling of information systems at the conceptual level and is designed to be accessible to non-technical users. Unlike ER diagrams, it does not model attributes; instead, it expresses every fact as a relationship that can be verbalized in natural language.
The discipline developed in the 1970s out of a variety of semantic modeling research projects. Eckhard Falkenberg coined the term "object-role modeling" in a research paper. More recently, Dr. Terry Halpin has continued to champion the practice, serving as editor of the ORM.net website.
ORM notation uses solid ellipses to denote objects or entities, while dashed ellipses denote values. Roles, the parts objects play in relationships, are drawn as adjacent rectangular boxes; a sequence of roles forms a relationship (a fact type) labeled with a predicate reading, typically a verb phrase. Lines connect each object to the roles it plays.
Uniqueness constraints are drawn as arrowed lines spanning one or more roles. Enclosing an entire relationship in an outer box objectifies it, turning it into an object that can itself take part in other relationships. Constraints on cyclical (ring) relationships, such as a rule that no one can be their own parent, can also be expressed in the diagram.
ORM serves its niche in the world of conceptual, semantic modeling. It fosters a distinctive style of analysis and is worth further exploration by business analysts and data modelers.
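ORM diagrams are graphical, but the ideas of fact types, verbalization, and uniqueness constraints can be sketched in text. The minimal Python example below is a hypothetical encoding, not the syntax of any ORM tool: the FactType class, its fields, and the sample population are invented for illustration. Note that Department is modeled as an object type playing a role, not as an attribute of Employee, which is the attribute-free style ORM favors.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FactType:
    """A binary fact type: two object types, each playing one role, joined by a predicate reading."""
    first: str                          # object type playing role 0, e.g. "Employee"
    reading: str                        # predicate reading, e.g. "works for"
    second: str                         # object type playing role 1, e.g. "Department"
    unique_roles: tuple[int, ...] = ()  # roles spanned by a single uniqueness constraint

    def verbalize(self) -> str:
        # Only a constraint on the first role is verbalized specially in this sketch.
        if self.unique_roles == (0,):
            return f"Each {self.first} {self.reading} at most one {self.second}."
        return f"{self.first} {self.reading} {self.second}."

    def violations(self, facts: list[tuple[str, str]]) -> list[tuple[str, ...]]:
        """Return value combinations for the constrained roles that occur more than once."""
        if not self.unique_roles:
            return []
        counts = Counter(tuple(fact[i] for i in self.unique_roles) for fact in facts)
        return [combo for combo, n in counts.items() if n > 1]


works_for = FactType("Employee", "works for", "Department", unique_roles=(0,))
population = [("Ann", "Sales"), ("Bob", "Sales"), ("Ann", "Marketing")]

print(works_for.verbalize())             # Each Employee works for at most one Department.
print(works_for.violations(population))  # [('Ann',)]
```

A constraint spanning only the first role means each employee value may appear at most once, so the sample population violates it for Ann.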
The wide array of Data Modeling languages continues to play a vital role in the Data Management and Software Engineering industries. In some cases, distinctions between the different modeling frameworks blur, but the use of notated graphics to make database and system designs easier to understand resonates through all.