Loading...
You are here:  Home  >  Data Blogs | Information From Enterprise Leaders  >  Current Article

Say No More: Verbal Data Models

By   /  November 20, 2017  /  3 Comments

Click to learn more about author Thomas Frisendal.

”Nudge nudge, wink wink, say no more, say no more”. Says British Eric Idle in the third Monty Python’s Flying Circus episode, “How to Recognise Different Types of Trees From Quite a Long Way Away” from 1969.

Indeed, it should not be necessary to say more, once you have gotten your data model together.

Or is it?

What should we expect from a data model? According to Wikipedia:

“A data model (or datamodel) is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities”.

Alright, here is a set of elements of data:

Clearly, something is not spoken out. Nudge, nudge? Yes, “how they relate to each other” is missing.

Maybe a proper data model will help us. Here is a snip of a data model describing a fictitious corporation, Northwind:

Northwind was Microsoft’s sample database for a number of years and can still be found on Codeplex. Copyright © 2006-2017 Microsoft Corporation. The diagram is produced by the author using Oracle Data Modeler.

Let us look at a scenario to check how much help the data model can give us:

We need to answer a C-level exec, who wants to know: “Which employee had the highest cross-selling count of “chocolate” product and which other product was it?”

So we need to somehow access these data:

  • Orders and Order_Details concerning Product with the name “Chocolate” and find out, who of the Employees were involved in those orders.
  • Find the Order_Details of all other Products on those same orders, which were not the “Chocolate” product.
  • Count the number of other Products distinct across those same Orders and Employees.
  • Return the top 5 Employee – Product – count combinations.

We know that the relationships in the diagram, are not named, which is a pity, because we need to say which relationship, we are referring to. Many data modelers have adopted a standard of using the two table names in combination as the name of the foreign key constraint. This works OK, as long as you do not have more than one relationship between the two tables. But you do see them, quite often.

But in reality this means that we could have two relationships between Customers and Orders. As far as SQL is concerned all that matters is that the constraint names are unique. Constraint names are not really externalized in any tool that I have seen. And they are not used in queries.

Wouldn’t it be nicer, if the query facility knew about relationships directly, knowing that they have some things in each end of the relationship?

It is time for a “wink, wink”, because information which is unsaid in the data model above. Somebody has to say more. That somebody is the data modeler / business analyst. Speak out the previously unspeakable: What are the names, and henceforth the meaning, of those relationships? (Really a sensible question, if you ask me).

Let us try a different approach. Instead of focusing on defining tables and constraints, what about starting with defining the business concepts (object types) and their relationships first:

The drawing above is a concept map, and in concept mapping you name the relationships. In fact, relationships at this level are most often action verbs, because the relationships reflect some business process or flow or composition of concepts:

  • Products ordered are on Order Details
  • Orders consist of Order Details
  • Employees handle Orders

This resembles the subject->predicate->object thinking of the semantic standards (RDF etc.).

Concept Maps originated in the psychology of learning (Prof. Joseph Novak and others). These researchers called the relationships “linking phrases”. Very appropriate.

A concept map is not quite semantics, but it is full of business meaning, and it is definitely more meaningful than data, which are just defined in SQL.

Aside: Besides the semantic community there is one community of modelers, which go to this level of detail, and that is the fact modelers. Praise them for that. End of aside.

If you need to supplement your concepts with their properties you can do so as a property graph (a snip of a model):

The named relationships enable us to read the text of the query directly from the data model.

In the graph query languages on the market today, you start with defining the path that you want to traverse in the graph.

The notation below is sort of in “pseudo-Cypher”. Cypher (invented by Neo4J) is the most widely used graph query language. In the examples below, I have highlighted the relationships, for pedagogical reasons:

  • First we want all Chocolate products:

Choose Products where ProductName is ‘Chocolate’, name the result set “Choc”

  • Then we can proceed to the Orders containing those products and the Employees, who handled those orders:

“Choc” – ordered – Order Details – consist of – Orders – handle – Employees

  • Looking for which other products those Employees handled is simply a continuation working from the result set from 2) just above:

Employees – handle – Orders – consist of – Order Details – ordered – Product

  • So, now we have the set of orders, we are interested in, and the rest is a matter of aggregation.

This way of describing a path in a graph is the best practice way to do it in graph databases.

OK – but my data are not organized as graphs? The answer to that is: All data are graphs! And they can be presented as graphs. That is precisely what is going on in the very popular GraphQL community – check it out. Using GraphQL, your good old data in SQL can be made good to go as enterprise data graphs. You just need to define the relationships. And map them to GraphQL APIs.

In conclusion: Don’t ever leave home without named relationships! I cannot emphasize it enough – this is important!

Your data model must be verbal. Not only nouns, but also verbs. Verbs are actions and make things happen. So, having named relationships in the data model is key to enabling meaningful queries. They are business-friendly and easy to define and to read. Machine learning cannot name relationships, it can just detect them. You have work to do!

Name the relationships and do it using the business action verbs relevant to the reason for those object types to exist.

If you do it like this, say no more!

About the author

Thomas Frisendal, Owner and database consultant for TF Informatik Thomas Frisendal is an experienced database consultant with more than 30 years on the IT vendor side and as an independent consultant. He has worked with databases and data modeling since the late 70s; since 1995 primarily on data warehouse projects. He has a strong urge to visualize everything as graphs - even datamodels. His area of excellence lies within the art of turning data into information and knowledge. A fresh approach to information-driven analysis and design, which is "New Nordic" in the sense that it represents the Nordic values such as superior quality, functionality, reliability and innovation. In short: New ways of communicating the structure and meaning of the business context. Thomas is an active writer (see more about his two books on the website referenced below) and speaker. He works as a Data Architect and Modeler out of Copenhagen, Denmark. He owns the consultancy firm TF Informatik, registered in Denmark.

  • Steven

    There are a number of things that are not quite what they should be in this description.

    First: let’s talk about information, so data with meaning (semantics). It is strange that we are still talking about data, while we do not want anything to do with data that has no meaning in the context the universe we are discussing.

    Then let’s look at the difference between a conceptual information model and a datamodel as described. One of the agreements we had some decades ago was that information modelling is very close to what is language and what is grammar. As such it is, at least ought to be, very easy to form sentences of what we model, and thus creating language. In practice we see this language is very language oriented, and ought to be tranformed into a “Story”, but in principal this transformation is pretty easy.

    Than a quality of modelling issue. Simple rule: as soon as facttypes (in some environments called attribuuttypes) are combined (see the UML model in this article) in what are called entitytypes or even tables we have a dimished level of quality in a conceptual information model. The reason is simple: while you can still add all kinds of constructs (rg constrainttype) to individual definitions you cannot do that when they are already combined. This is the reason why you can transform easily from a conceptual information model to a datamodel, but it is very difficult to go back. Reason: loss of meaning/semantics.

    So my opinion is: let’s start conceptual information modeling inside the organisation and, definitely, outside IT. Such an information model in fact expresses the knowledge of the information an organisation needs, and, if done well, it is pretty easy to get to a high quality datamodel. It is also quite easy to point out where a datamodel needs to change if the information of an organisation changes. In practice this means a real improvement of the quality of IT-systems, where we know beforehand where we have to change IT instead of during or after we have done some projects on it.

    • Thomas Frisendal

      Hi Steven, I am of the firm opinion that the days of one size fits all in data modeling are over. I acknowledge the power of fact modeling, and there are clearly use cases, where it is the best alternative. However, for many, “daily bread and butter” applications something more simple is appropriate. And that is what I try to get across. Some data modeling is better than none. And relationship naming gets you a long way in understanding the semantics of the business problem.

      • Steven

        Hi Thomas,

        Thank you for this answer.

        It is not as much as one size fits all, my experience with modeling information puts conceptual information modeling in a quite different crowd. Modeling conceptually means at demand side, so where the people are who need certain data, their information, to be able to do their work. There thing have to be broad because they live in their world.

        The problem is that IT should fit to this world. We have been making and buying applications, nearly, per application. IT is looking afterwards whether these may fit, and have a difficult time to make them interact by creating sometimes crazy interfaces and API’s. Having a clear knowledge of the information at demand side will enable developers etc. to create fitting solution, and for this reason one cannot model per application. I;m sure you are right it is much faster to do modelling per project, but the time following these applications is extremely expensive and places users in the position where they have to coop with applications that do not really fit together.

        I really like your opinion to do data modeling instead of just talking about functionality. The spending on IT is about 20% in projects and 80% in the management of solutions, and this relative spending is becoming worse. By doing the right kind of modeling one can get to a better ratio. Independent of bread and butter or specialized, because all have to fit to the environment they are used in.

        On use cases: in IT they are usually system use cases, where the use of a proposed system is modeled. There are also business use cases. They model the environment of such a system, and show what such an environment needs to be able to work well. The conceptual information modeling belongs in the world where business use cases are made for, while data modeling is in the world of system use case modeling.

        Relationship naming is an integral part of conceptual information modeling, as are the constrainttypes. That is the reason why I made the quality comment: don’t combine (named) relationship (called facttypes). This naming is even two-sided, where noun in sentences are recognisable. Thus you can form sentences and stories based on a conceptual information model.

You might also like...

To Get Value from Data, Organizations Should Also Focus on Data Flow

Read More →