Click here to learn more about author Adam Pease.
Last month I looked at representing some simple statements with binary relations. This month, let’s look at what is possibly an even more fundamental issue, that of naming. It comes up often in knowledge representation and people can have very strong opinions about the “right” name for a concept.
When programmers write software they try to use descriptive names for variables.
y = m*x + b
Here it appears that a programmer is using the slope-intercept equation of a line in a 2D coordinate system, where m is the slope and b is the y-intercept.
What if instead he stated:
BobsLastName = m*x + b
That would be rather confusing, since we wouldn’t expect a name to be a float, or be the result of arithmetic addition. But the meaning of the program will be exactly the same to the computer.
Compilers will rename variables anyway, so the program will look something like:
var1 = var2*var3 + var4
The semantics of the program are independent of the names used. Many problems in software integration can arise from poor variable names in which people reusing software assume a certain semantics just based on a name, rather than on what the program actually does. It’s also often the case that a name cannot fully explain the semantics of a term used in a program, no matter how good it may be.
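A minimal Python sketch of this point, assuming nothing beyond the three naming schemes shown above (the function names are made up for illustration): the same arithmetic produces the same result whatever the variables are called.

```python
# Three versions of the same computation; only the names differ.
# The function names are hypothetical and exist only for this illustration.

def line_descriptive(m, x, b):
    y = m * x + b
    return y

def line_misleading(m, x, b):
    BobsLastName = m * x + b  # misleading to a human, irrelevant to the machine
    return BobsLastName

def line_anonymous(var2, var3, var4):
    var1 = var2 * var3 + var4  # roughly what is left once names are discarded
    return var1

results = [f(2.0, 3.0, 1.0)
           for f in (line_descriptive, line_misleading, line_anonymous)]
print(results)  # [7.0, 7.0, 7.0]
```

Any semantics a reader attaches to "BobsLastName" lives in the reader's head, not in the program.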
In writing taxonomies such as the Dewey Decimal System, or with the many current taxonomy management tools, the formal language used is very simple. All that can be said is that one thing is a specialization or generalization of another. That simplicity makes it easy to create taxonomies. But because the formal language is so simple, the meaning of a term has to be carried by its label as well as by the formal statement of generalization or specialization. Otherwise, a taxonomy would contain very little information.
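Take the example of a small taxonomy, where indenting means a specialization relationship, in which “Corsair” sits under “Car”, which in turn sits under “Thing”, and in which there is also a “Boat” node.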
Someone who then comes across an article about a boat called a “corsair”, and who needs to tag it with an element of the taxonomy, will be able to put a new node under “boat”. But if a human tagged the article with the existing Thing-Car-Corsair node, there would be no way for a machine to know that this was an error.
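The tagging problem can be sketched in a few lines of Python. This is only an illustration under assumptions: the child-to-parent table, the placement of Boat directly under Thing, the article title, and the function names are all hypothetical. The point is that a taxonomy can check only that a tag names an existing node, not that the tag is appropriate.

```python
# Hypothetical taxonomy stored as child -> parent (specialization) links.
# Placing "Boat" directly under "Thing" is an assumption made for this sketch.
PARENT = {
    "Car": "Thing",
    "Boat": "Thing",
    "Corsair": "Car",
}

def path_to_root(node):
    """Return the chain of labels from the root down to the given node."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))

def tag(article, node):
    """Accept any tag that names an existing node; nothing else can be checked."""
    if node != "Thing" and node not in PARENT:
        raise KeyError(f"no such node: {node}")
    return (article, "-".join(path_to_root(node)))

# A human mistakenly tags a boat article with the car node.
tagged = tag("Article about a corsair boat", "Corsair")
print(tagged[1])  # Thing-Car-Corsair
```

The call succeeds, because the only formal knowledge available is the specialization links. Nothing in the structure says that a Corsair under Car cannot be a boat.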
To a computer, which doesn’t understand language, the taxonomy would look like the same tree with each label replaced by an opaque identifier.
The nodes are just arbitrary symbols, and they require human interpretation to be meaningful. When we are attempting to facilitate human to human communication, this may be sufficient, but if we want machines to help us, more is needed.
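One way to see this is to replace every label with a generated symbol and observe that the formal content, the specialization links, is unchanged. The sketch below assumes a small hypothetical taxonomy (Car and Boat under Thing, Corsair under Car) and an arbitrary naming scheme of n1, n2, and so on.

```python
from itertools import count

# Hypothetical taxonomy as (child, parent) specialization links.
EDGES = [("Car", "Thing"), ("Boat", "Thing"), ("Corsair", "Car")]

def anonymize(edges):
    """Replace each label with an arbitrary symbol, keeping the structure."""
    counter = count(1)
    symbol = {}

    def sym(label):
        if label not in symbol:
            symbol[label] = f"n{next(counter)}"
        return symbol[label]

    return [(sym(child), sym(parent)) for child, parent in edges], symbol

anon_edges, mapping = anonymize(EDGES)
print(anon_edges)  # [('n1', 'n2'), ('n3', 'n2'), ('n4', 'n1')]
```

Everything a machine can compute from the first list it can compute from the second. What has been lost is only what a human reader brought to the labels.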
We also have the issue of human language to deal with when not in a monolingual environment. Labels that might be clear to a native English speaker may not be clear to someone who is a native speaker of Greek or Swahili, for example, and vice versa. One way to deal with this is to separate the formal terms from their labels in different languages. So we might have (g486, English:Boat, French:Bateau). Even then, however, the machine can offer only limited help in solving problems that arise in creating or applying the taxonomy.
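A rough sketch of that separation, using the g486 example above. The two-letter language codes, the fallback to English, and the function names are assumptions made for illustration.

```python
# Formal term identifiers mapped to per-language labels.
# "g486" and its two labels come from the example above; the language
# codes "en" and "fr" are an assumed convention.
LABELS = {
    "g486": {"en": "Boat", "fr": "Bateau"},
}

def label_for(term_id, lang, fallback="en"):
    """Return the label in the requested language, or the fallback label."""
    labels = LABELS[term_id]
    return labels.get(lang, labels[fallback])

def term_for(label, lang):
    """Find the formal term whose label in the given language matches."""
    for term_id, labels in LABELS.items():
        if labels.get(lang, "").lower() == label.lower():
            return term_id
    return None

print(label_for("g486", "fr"))    # Bateau
print(term_for("bateau", "fr"))   # g486
```

The identifier g486 stays stable while labels are added or corrected, but the machine still knows nothing about boats beyond what the formal statements say.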
If we use a description logic like OWL, we can specify a few more things that a machine can understand, because that logical language is more expressive than just being able to state that something is more specific than something else. We can state binary relationships, and restrictions on their values. For example, we might state that a Human has a father relationship with another Human and that there can be only one value for that slot. If a user of the ontology attempts to add a second father value (say, to include a step-father), an OWL reasoner can find and report the error, which should help the user to understand the intended semantics of the father relation, and possibly prompt creation of a stepparent relation that could accommodate multiple entries.
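The kind of check described here can be imitated with a toy Python function. This is not an OWL reasoner and does not use any OWL library. The individuals, the restriction table, and the function name are hypothetical, and the sketch assumes that different names denote different individuals, which an OWL reasoner would not assume on its own.

```python
from collections import defaultdict

# Hypothetical restriction table: at most one value for "father".
MAX_VALUES = {"father": 1}

def check_cardinality(triples, max_values=MAX_VALUES):
    """Report subjects that have more values for a relation than allowed."""
    values = defaultdict(set)
    for subject, relation, value in triples:
        values[(subject, relation)].add(value)

    errors = []
    for (subject, relation), vals in values.items():
        limit = max_values.get(relation)
        if limit is not None and len(vals) > limit:
            errors.append(
                f"{subject} has {len(vals)} values for '{relation}' "
                f"(max {limit}): {sorted(vals)}"
            )
    return errors

# A user records a step-father as a second father value.
facts = [("Alice", "father", "Bob"), ("Alice", "father", "Carl")]
print(check_cardinality(facts))

# The same information using a separate, unrestricted stepparent relation.
fixed = [("Alice", "father", "Bob"), ("Alice", "stepparent", "Carl")]
print(check_cardinality(fixed))  # []
```

The first call reports a violation for Alice and the second reports none, which is the prompt the user needs to see that father was meant narrowly.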
Expressive power is not helpful unless it is actually used. A lazy taxonomist might neglect to include an OWL value restriction, and a user could then employ that taxonomy in a way that is inconsistent with the intent of the author. If that author builds these implicit assumptions into software that uses the ontology, that can cause bugs later, if new software is created that uses the ontology in a way consistent with its explicit semantics but different from its implicit, original intent.
We’ve seen above that a simple formal language, a taxonomy language, can’t express the semantics that need to be captured, so the author relies on intuitions about the term name to constrain its meaning and use. The same can happen when an author employs a more complex language like OWL, where we’ve seen previously how a relation among three things, such as between, can cause problems.
Knowing the purpose to which a taxonomy or ontology is to be put is important for choosing and making use of a knowledge representation language.