
Knowledge Maps: Structure Versus Meaning

By John Singer  /  October 18, 2017  /  4 Comments


We are at an inflection point in the process of designing and building computerized systems.  Let’s be honest – even with the explosion of new data technologies, we are still building the same unit record processing systems of the 1960s, albeit with a bit more sophistication than an 80-column punch card.  You could take most relational database designs, dump each table to a tape, hang them all on tape drives, and get about the same results.  This seems harsh (and maybe a bit of an exaggeration), but our systems are as dumb as a box of rocks when it comes to the meaning of the data; they focus mostly on reading and writing records and presenting data to the user on a screen.

When we started using relational databases, we adopted the Entity-Relationship Modeling approach to design.  We were going to design databases based on what the data means with tables representing real world objects rather than cramming as many fields into one record as possible.  Chris Date talked about “what, not how” – meaning that SQL should specify what data you want, not how programmatically the database should retrieve it.  These were exciting times.

Database design adopted Entity-Relationship Modeling methodologies and used a normalization design process that stressed reducing data redundancy while increasing the flexibility and ease of data access.  If the data is normalized we can produce any output the user desires!  While this is true and works well, it has focused the design discussions on table “structures” with data meaning pushed to the side.
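As a toy illustration of what normalization buys (the customer and order data here are invented, and plain Python dicts stand in for tables), a flat unit-record file repeats facts on every row, while the normalized form stores each fact once and links records by key:

```python
# Flat, denormalized "unit records" -- the customer's city is repeated
# on every order row.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "city": "Dayton", "item": "widget"},
    {"order_id": 2, "customer": "Acme", "city": "Dayton", "item": "gadget"},
    {"order_id": 3, "customer": "Bolt", "city": "Peoria", "item": "widget"},
]

def normalize(rows):
    """Split flat rows into a customers 'table' and an orders 'table'."""
    customers, orders = {}, []
    for row in rows:
        # Store each customer once, keyed by name (in practice a surrogate key)
        customers.setdefault(row["customer"], {"city": row["city"]})
        orders.append({"order_id": row["order_id"],
                       "customer": row["customer"],
                       "item": row["item"]})
    return customers, orders

customers, orders = normalize(flat_orders)
print(len(customers))  # 2 -- "Acme" is stored once instead of twice
```

Redundancy is gone, and any report the user wants can still be produced by joining orders back to customers through the key.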

Data models are traditionally divided into 3 “layers” – conceptual, logical, and physical:

  • Conceptual: Conceptual models were in theory meant to describe the entities in business terms (i.e. what the data means), but in practice the conceptual model was defined more as a limited-scope logical model, with such directives as “you don’t need to resolve the many-to-many relationships” or “don’t include technical attributes”. Rarely has an organization produced and sustained a useful conceptual model of an application database.  I won’t even mention “enterprise conceptual” models.
  • Logical: The logical model is meant to expand on the conceptual model, filling in needed keys, constraints and attributes while maintaining third normal form and remaining database neutral. This is a useful exercise but typically is used only as a precursor to what everyone really wants – the physical design. Organizations are moderately successful in maintaining logical models.
  • Physical: The physical model is the database dependent implementation of the logical model with some denormalizations introduced as needed. Relational databases have improved so much in terms of performance, that many database designs are straight implementations of the logical.

While I believe that today’s application system database designs are a great improvement over flat file systems (I was just kidding about that tape drive remark), I am still disappointed that this technology hasn’t delivered on the greater promise of our databases being a model of reality.  Object Oriented programming made the exact same claim (design objects to represent real world things) but also failed to deliver.  Complex hierarchies of classes are really about program structure (i.e. Model-View-Controller) with data classes relegated to a persistence framework.  End users can’t make sense of a UML class diagram any more than they can an ERD.

The reality is that ERDs and class diagrams are far more about structure than meaning.

Back in 2006 I delivered a presentation at the DAMA Symposium entitled “Bridging The Gap – Adding Semantic Awareness To Today’s Application Systems”.  You can find the presentation here: http://www.singerlinks.com/presentations/

In this presentation, I noted several technologies that seemed to be merging towards a more semantically aware computing stack:

  1. Enterprise Content Management (ECM) servers with emphasis on taxonomy/thesaurus to manage the vocabularies used to search the content
  2. Semantic Web servers built around the W3C standards based on logic and ontology

What surprised me at the time was how little awareness each of these worlds had of the other within the organization.  Furthermore, traditional N-tier application and database developers were completely unaware of either ECM or Semantics, and I’m not sure any of this has changed in the intervening 11 years.

What we are seeing now is a revolution in AI where some very difficult problems are gaining traction with real world implementable solutions (voice and image recognition, natural language processing, machine learning).

When Google introduced Knowledge Graphs they coined the phrase “Things not Strings”.  This simple three-word phrase captures the essence of both the problem and the solution.

  • Problem: (strings) We store data as bits conforming to some datatype organized into a structure.
  • Solution: (things) We need to store and process data like humans do – as a network of inter-related concepts.

This signifies a shift in computing away from managing structures of data towards managing data based on its meaning. 
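As a minimal sketch of the contrast (the labels are invented, and plain Python stands in for a graph store): a string is an opaque value, while a thing is a typed node linked by relationships that software can traverse.

```python
# "Strings": the database holds an opaque value; the system knows nothing
# about what "Paris" denotes beyond its bytes.
record = {"name": "Paris"}

# "Things": the same token becomes a node with a type and relationships
# to other nodes, so software can follow meaning, not just match bytes.
paris = {"label": "Paris", "type": "City"}
france = {"label": "France", "type": "Country"}
edges = [(paris["label"], "CAPITAL_OF", france["label"])]

def related(label, verb, edges):
    """Return everything `label` is connected to by `verb`."""
    return [o for s, v, o in edges if s == label and v == verb]

print(related("Paris", "CAPITAL_OF", edges))  # ['France']
```

The query above answers a question about the world, not about a table layout; that is the shift from structure to meaning in miniature.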

I would like to expand this thought (things not strings) to “Meaning not Structure” and I believe this transition will be the driving force behind IT for decades to come.  Data modeling will no longer be about how to organize strings into normalized structures, but how to define things in terms of their classifications, categorizations, descriptive properties and most importantly their relationships to other things.  I highly recommend Thomas Frisendal’s book “Graph Data Modeling for NoSQL and SQL” which goes into detail on the history of data modeling and the transition to meaning versus structure modeling.

All this raises the question of what exactly “meaning” is, and how software can “know what something means”.  The answer to these questions is coming from the relatively new field of Cognitive Science.

From Wikipedia:

Cognitive science is the interdisciplinary, scientific study of the mind and its processes. It examines the nature, the tasks, and the functions of cognition. Cognitive scientists study intelligence and behavior, with a focus on how nervous systems represent, process, and transform information.

Cognitive science seeks to determine how humans understand the world around them and communicate this understanding with others.  This is a “renaissance” that is combining scientific research from a number of fields:

  • a behavioral understanding of how we think and communicate
  • a biological understanding of how our brain and body functions
  • a technical understanding of how language encodes knowledge

Our goal, and the IT transition that is just now beginning, is to organize data in the computer the way humans organize data in their minds, and to create software that mimics a person’s ability to reason about, communicate about, and act upon that understanding.  After all, millions of years of evolution can’t be wrong. Why shouldn’t we build systems that mimic the way we work?

Knowledge Maps (or Knowledge Graphs) represent a starting point for practitioners interested in this new approach.  The knowledge graph uses a simple linguistic model (subject – verb – object) to represent propositions about the world.  We can easily model the world in terms of the “things” we are interested in and how they inter-relate.  Using a basic ETL approach, we can load data from source systems into a graph database like Neo4j and build a simple linguistic model of the facts that interest us.  These graphs can act as a “map” to the underlying detail used to create them, providing faceted search and guided queries.  Stay tuned to this blog as we get into some “how-to” specifics on building Knowledge Maps.
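As a driver-free sketch of the subject – verb – object idea (the systems and relationships below are invented; in practice these propositions would be loaded into a graph database like Neo4j as nodes and relationships), each fact extracted from a source system becomes a triple, and simple traversals give the “guided query” behavior:

```python
# Each proposition is a (subject, verb, object) triple -- a tiny linguistic
# model of facts pulled from source systems via a basic ETL step.
triples = [
    ("OrderSystem", "RUNS_ON",    "ServerA"),
    ("ServerA",     "LOCATED_IN", "Datacenter1"),
    ("OrderSystem", "OWNED_BY",   "SalesDept"),
]

def facts_about(thing):
    """Faceted-search helper: every proposition mentioning `thing`."""
    return [t for t in triples if thing in (t[0], t[2])]

def follow(subject, verb):
    """Guided query: traverse one relationship from a subject."""
    return [o for s, v, o in triples if s == subject and v == verb]

print(follow("OrderSystem", "RUNS_ON"))   # ['ServerA']
print(len(facts_about("OrderSystem")))    # 2
```

The graph acts as a map: each triple can carry a pointer back to the underlying detail record that produced it.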


The other articles in this series on Knowledge Maps are:

Knowledge Maps – How to Ask a Good Data Question

Knowledge Maps – A New Model for Metadata

Knowledge Maps – Connecting the Dots

Knowledge Maps – What is the Problem We are Trying to Solve?

About the author

John Singer, Principal Consultant, SingerLinks Consulting. John is a 36-year IT veteran who has worked in a number of data-oriented roles (DBA, Data Modeler, ETL, MDM, ITIL CMDB, Metadata Management, BI Reporting and Information Center). His most recent experience is in building Knowledge Maps with the Neo4j graph database to solve complex problems in the ITIL CMDB, Network Management, Java and Oracle Metadata Management space. You can follow John and SingerLinks Consulting on LinkedIn.

  • Daniel Garigan

    “The reality is that ERD’s and class diagrams are far more about structure than meaning”. This may be a reality in practice but not in the relational model. The ERD does not exist unto itself; it is produced by the analyst in the rendition of a method. Rather than being focused on the technology, the meaning is the focus, agenda and test of skill of an analyst. The simple challenge is to use the tools to represent the meaningful model; it does not happen all by itself. An ERD should be a work of art produced by the analyst in their ability to capture clarity and precision in the definition of a relation and the scope of relations to each other. Relational technology is not a product but a methodical approach to set theory. Yes, the databases I see are very poorly conceived… very complicated, overloaded entities, obscure and convoluted, or of so little relationship that they cannot reflect their meaning… but that is not a technology issue. The challenge is to carry out the analysis adequately and with precision; then the technology works… very precisely.

  • Daniel Garigan

    The mentioned division of the conceptual, logical, and physical is a non sequitur. I would invert your (1) Conceptual and (2) Logical, to the effect that…
    FIRST we need the LOGICAL to define our words (logical data dictionary). The CONCEPTUAL model is what is produced using (and depending on) precise and unambiguous language. (This is an essential and NOT a trivial exercise)….. THEN we can proceed to define the entities and make our statements (CONCEPTUAL). For the analyst, the CONCEPTUAL is not their own, not a first-person rendition of meaning… it is an agenda for a crystal-clear understanding, and agreement on that understanding, of the “other”; those would be all the stakeholders of an information system agenda. The issues of PHYSICAL models reflect utility and abridgement of meanings… it is strictly a technical performance issue. So… the inversion would be:
    1. LOGICAL data definition is the capture of the unambiguous meanings to be rendered into entity/tuples or relations.
    2. CONCEPTUAL (product of the analyst) is a capture and rendering of the conceptual (business function into comprehensible MODELS in stakeholder semantics) such that it flows easily in the mind of those who are involved in the data / operations agenda. It is a rendition of the concepts of stakeholders in a cute and precise way.
    3. PHYSICAL is an implementation issue to optimize performance. In the technology agenda the Physical is an optimization of the Conceptual and Performance.

  • Gordon Everest

    Subject – verb – object… subjects and objects are both represented by nouns. Together they make a simple sentence, an assertion. That is exactly what fact (oriented) modeling is all about — elementary fact sentences are the bedrock of schemes such as ORM (Object Role Modeling, see Terry Halpin). Elementary fact sentences have one predicate (verb phrase) and one or more objects (nouns). Each sentence establishes a relationship — two objects = a binary relationship, three objects = a ternary relationship, and so on. Note that the relational model for data is limited to at most 1:Many (due to the limitation of first normal form), and binary relationships. Those limitations do not exist in ORM. These elementary sentences form a data structure (logical?… but ALL data structures are “logical”). Then we add modifiers to these sentences, which become like integrity constraints, e.g., must (dependent/required) or may (optional), at most one or more than one (many), etc. Of course all this depends on the definition of all the objects — that is where the “meaning” comes in. Note that the label for an object could represent a population of object instances, like “Employee” or an individual instance, like “John Doe.” If an individual instance, we need to generalize to the type of thing the individual is an instance of.
    I recommend an examination of fact oriented modeling such as ORM with its rigor and completeness in modeling (the things in) some user domain, and the relationships and constraints/business rules on and among those (populations of) things. This is perhaps where fact modeling differs from “knowledge maps.” In ORM all nouns designate object types, ie., populations of instances which are (somewhat arbitrarily but with purpose) clustered together in developing a model. That is not always true in knowledge maps which more generally model ideas, concepts, activities, processes,… rather than (populations of) things. In addition, knowledge maps model where this information is stored and the flow of information among those stores. As in fact modeling, knowledge maps are still dependent upon the definition of things, concepts, stores, activities, etc. That is the basis for meaning in either scheme.

  • Milovan Banicevic

    Integrating intelligence into the information process requires another dimension to be added to structured data storage and retrieval engines.

    The basic ingredient of any intelligent process is the ability to recognise and classify object properties and the existing dependencies between objects.

    Recognition and subsequent classification are major elements of any intelligent data processing system. The current ER design process does not include data classification; Data Classification is a missing dimension. The best-known approaches to data classification are taxonomy methods.

    Once we develop a design method that associates data classification (the missing layer in ER design) and introduces the datum as an information element, we will be able to elevate data and information processes to another level.

    Such an extended design requires corresponding changes in the implementation of PK/FK for the unit of information (record/row). The complexity of the one-to-many relationship in existing ER designs is placed at the information level. We should consider shifting one-to-many relationships to a more appropriate role: instead of describing a relation between two data sets (tables), the more logical position would be to describe the relationship between a datum (taxonomy element) and its position within a single unit of information (table/row).
