
Names, Naming, and Ontology

By Adam Pease  /  March 25, 2016  /  2 Comments

Last month I looked at representing some simple statements with binary relations. This month, let’s look at what is possibly an even more fundamental issue, that of naming. It comes up often in knowledge representation, and people can have very strong opinions about the “right” name for a concept.

When programmers write software they try to use descriptive names for variables.

For example:

float y,m,x,b
y = m*x + b

Here it appears that a programmer is using the slope-intercept equation of a line in a 2D coordinate system, where m is the slope and b is the y-intercept.

What if the programmer instead wrote:

float BobsLastName,m,x,b
BobsLastName = m*x + b

That would be rather confusing, since we wouldn’t expect a name to be a float, or be the result of arithmetic addition. But the meaning of the program will be exactly the same to the computer.

Compilers replace variable names with their own internal identifiers anyway, so to the machine the program will look something like:

float var1,var2,var3,var4
var1 = var2*var3 + var4

The semantics of the program are independent of the names used. Many problems in software integration arise from poorly chosen variable names, when people reusing software assume a certain semantics based only on a name rather than on what the program actually does. It’s also often the case that a name, no matter how good it may be, cannot fully explain the semantics of a term used in a program.
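As a minimal sketch of this point, here is the same computation written three times in Python with different names. The function names are my own, chosen only for illustration. The last check relies on the fact that CPython addresses local variables by position rather than by name, so it is specific to that implementation.

```python
def line(m, x, b):
    # Conventional names: slope m, input x, intercept b.
    y = m * x + b
    return y


def misleading(m, x, b):
    # Same computation, but the result carries a misleading name.
    bobs_last_name = m * x + b
    return bobs_last_name


def anonymous(var2, var3, var4):
    # Names stripped of any human meaning.
    var1 = var2 * var3 + var4
    return var1


# All three functions compute the same value for the same inputs.
results = [f(2.0, 3.0, 1.0) for f in (line, misleading, anonymous)]
print(results)  # [7.0, 7.0, 7.0]

# Compare the compiled instructions of the three functions (CPython only).
same_bytecode = (
    line.__code__.co_code
    == misleading.__code__.co_code
    == anonymous.__code__.co_code
)
print(same_bytecode)
```

The names survive only as documentation for human readers; nothing in the computation depends on them.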

In taxonomies such as the Dewey Decimal System, or those built with the many current taxonomy management tools, the formal language used is very simple. All that can be said is that one thing is a specialization or generalization of another. That simplicity makes it easy to create taxonomies. But because the formal language is so simple, much of the meaning of a term has to be carried by its label, in addition to the formal statement of generalization or specialization. Without the labels, a taxonomy would carry very little information.

Take the following example, where indentation indicates a specialization relationship:

Thing
    Car
        Corsair
    Boat

Someone who needs to tag an article about a boat called a “corsair” with an element of the taxonomy could add a new Corsair node under Boat. But if a human tagged the article with the existing Thing-Car-Corsair node, there would be no way for a machine to know that was an error.
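To make the limitation concrete, here is a small Python sketch of the taxonomy above, stored as child-to-parent links. The helper names and the article title are hypothetical. The only check a machine can make is whether the chosen node exists, so the mistaken tag passes.

```python
# Child -> parent links: the only formal content of the taxonomy.
PARENT = {
    "Car": "Thing",
    "Corsair": "Car",
    "Boat": "Thing",
}


def path_to_root(node):
    """Return the chain of generalizations from node up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_valid_tag(node):
    """A tag is 'valid' if the node exists; nothing more can be checked."""
    return node == "Thing" or node in PARENT


article = "Restoring a Corsair sailboat"  # hypothetical article about a boat
tag = "Corsair"  # the human picked the existing node under Car

print(is_valid_tag(tag))   # True: the machine sees no problem
print(path_to_root(tag))   # ['Corsair', 'Car', 'Thing']
```

Nothing in the data structure says that a Corsair under Car cannot sail, so the error is invisible to the software.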

To a computer, which doesn’t understand language, the taxonomy would look akin to:

g231
    g486
        g867
    g223

The nodes are just arbitrary symbols, and they require human interpretation to be meaningful. When we are attempting to facilitate human-to-human communication, this may be sufficient, but if we want machines to help us, more is needed.
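A short Python sketch shows how little changes when the labels are swapped for opaque symbols. The mapping between labels and symbols is my assumption, read off the two listings by position. The only question the machine can answer, which nodes generalize a given node, gets the same answer either way.

```python
# The labeled taxonomy as child -> parent links.
LABELED = {"Car": "Thing", "Corsair": "Car", "Boat": "Thing"}

# Assumed correspondence between labels and opaque symbols (by position).
SYMBOL = {"Thing": "g231", "Car": "g486", "Corsair": "g867", "Boat": "g223"}

# The same taxonomy with every label replaced by its symbol.
OPAQUE = {SYMBOL[child]: SYMBOL[parent] for child, parent in LABELED.items()}


def ancestors(parent_of, node):
    """List every generalization of node, nearest first."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


print(ancestors(LABELED, "Corsair"))  # ['Car', 'Thing']
print(ancestors(OPAQUE, "g867"))      # ['g486', 'g231']
```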

We also have the issue of human language to deal with when not in a monolingual environment. Labels that might be clear to a native English speaker may not be clear to a native speaker of Greek or Swahili, for example, and vice versa. One way to deal with this is to separate the formal terms from their labels in different languages. So we might have (g223, English:Boat, French:Bateau). Even then, however, the machine can offer only limited help in creating or applying the taxonomy.
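One way to picture this separation is a label table kept apart from the formal terms, sketched here in Python. I take g223 as the symbol for Boat, assuming the symbols map onto the earlier labels by position. The language codes and the fallback rule are also my own choices for illustration.

```python
# Labels live in their own table, keyed by formal term and language code.
LABELS = {
    "g223": {"en": "Boat", "fr": "Bateau"},
}


def label(term, lang, fallback="en"):
    """Return the label for term in lang, else the fallback language, else the bare symbol."""
    names = LABELS.get(term, {})
    return names.get(lang) or names.get(fallback) or term


print(label("g223", "fr"))  # Bateau
print(label("g223", "sw"))  # Boat (no Swahili label recorded, so English is used)
print(label("g231", "en"))  # g231 (no labels recorded at all)
```

The formal term stays fixed while labels are added or corrected per language, but the table still tells the machine nothing about what a boat is.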

If we use a description logic like OWL, we can specify a few more things that a machine can understand, because that logical language is more expressive than one that can only state that something is more specific than something else. We can state binary relationships and restrictions on their values. For example, we might state that a Human has a father relationship with another Human and that there can be only one value for that slot. If a user of the ontology attempts to add a second father value (say, to include a step-father), an OWL reasoner can find and report the error, provided the two individuals are declared to be different, since OWL does not assume that different names denote different individuals. That should help the user to understand the intended semantics of the father relation, and possibly prompt creation of a stepparent relation that could accommodate multiple entries.
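The kind of check involved can be sketched in plain Python. This is not OWL and not a reasoner: the class and its names are hypothetical, and it simply treats different names as different individuals, which an OWL reasoner would not do by default. It only illustrates how a stated single-value restriction lets software reject a second father while allowing several stepparents.

```python
class FunctionalPropertyError(ValueError):
    """Raised when a second distinct value is asserted for a single-valued relation."""


class KnowledgeBase:
    def __init__(self, functional):
        self.functional = set(functional)  # relations limited to one value
        self.facts = {}                    # (subject, relation) -> set of values

    def add(self, subject, relation, value):
        values = self.facts.setdefault((subject, relation), set())
        # Reject a new, different value for a single-valued relation.
        if relation in self.functional and values and value not in values:
            raise FunctionalPropertyError(
                f"{subject} already has {relation} {sorted(values)[0]}; "
                f"cannot also add {value}"
            )
        values.add(value)


kb = KnowledgeBase(functional={"father"})
kb.add("Alice", "father", "Bob")
kb.add("Alice", "stepparent", "Carl")   # unrestricted relation
kb.add("Alice", "stepparent", "Dana")   # a second value is fine here

try:
    kb.add("Alice", "father", "Carl")   # violates the restriction
    rejected = False
except FunctionalPropertyError as err:
    rejected = True
    print(err)
```

The error message is what nudges the user toward the intended meaning of father; without the declared restriction, the second value would be accepted silently.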

Expressive power is not helpful unless it is actually used. A lazy taxonomist might neglect to include an OWL value restriction, and a user could then employ that taxonomy in a way that is inconsistent with the intent of the author. If the author has built such unstated assumptions into software that uses the ontology, bugs can appear later, when new software uses the ontology in a way that is consistent with its explicit semantics but different from its implicit, original intent.

We’ve seen above that a simple formal language, such as a taxonomy language, can’t express all the semantics that need to be captured, so the author relies on intuitions about a term’s name to constrain its meaning and use. The same thing can happen when an author employs a more complex language like OWL, where, as we’ve seen previously, a relation among three things, such as “between”, can cause problems.

Knowing the purpose to which a taxonomy or ontology is to be put is important for choosing and making use of a knowledge representation language.

About the author

Adam Pease is CEO and Principal Consultant of Articulate Software, which builds and advises on applications using ontology and natural language processing.  He has led research in ontology, linguistics, and formal inference, including development of the Suggested Upper Merged Ontology (SUMO), the Controlled English to Logic Translation (CELT) system, the Core Plan Representation (CPR), and the Sigma knowledge engineering environment. Sharing research under open licenses, in order to achieve the widest possible dissemination and technology transfer, has been a core element of his research program. He is the author of the book “Ontology: A Practical Guide.”

  • Richord1

    Although I am a strong advocate for using ontology and taxonomy practices, the challenges are significant when it comes to non-physical objects and non-scientific data.

    Business data unfortunately is not governed by any natural or scientific laws or principles. There is no “upper level ontology” for business data. There are numerous upper level ontologies even in the same organization.

    Even the basic principles of data design are ignored such as naming. Names of terms are typically not derived from concepts or even a controlled vocabulary.

    Few business organizations are willing to invest in applying the rigorous principles of data literacy, semantics, taxonomy, ontology and pragmatics and few database designers have the skills and training to apply these principles. As a result the community remains data illiterate.

  • 504more

    “Dewy” as a taxonomy does not compile ; )
