Data Modeling is “the act” of creating a data model (physical, logical, conceptual, etc.), and includes defining and determining the data needs of an organization and its goals. The act of Data Modeling defines not just data elements, but also the structures they form and the relationships between them. Developing a data model requires that the architects (Data Modelers) work closely with the rest of the enterprise to establish goals, and with the end users of the information systems to establish processes.
A data model contains “data elements” (for example, a customer’s name, or an address, or the picture of an airplane) which are standardized and organized into patterns, allowing them to relate to each other. The programming language used has an influence on the shape of the model, as does the database being used. The model defines how data is connected, and how data is processed and stored inside the computer system. For example, a data element representing a house can be associated with other elements, which in turn, represent the color of the house, its size, address, and the owner’s name. How the information is organized varies from one model to the next.
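The house example above can be sketched in code. This is only an illustration, assuming Python dataclasses as the modeling notation; the names `House`, `Owner`, and `size_sq_ft`, and all sample values, are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    """A data element describing the person who owns a house."""
    name: str


@dataclass
class House:
    """A data element associated with other elements (color, size, address, owner)."""
    color: str
    size_sq_ft: int
    address: str
    owner: Owner  # relationship to another data element


# Hypothetical sample values, used only to show how the elements relate.
house = House(
    color="blue",
    size_sq_ft=1800,
    address="12 Elm Street",
    owner=Owner(name="Ada Example"),
)

# Following the relationship from the house to its owner.
print(house.owner.name)
```

A different model could organize the same information differently, for example by making the owner the central element with a list of houses.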
Data Modeling, databases, and programming languages are interdependent, and have evolved together. Databases have evolved in basically four phases, and these phases tend to overlap:
- Phase I took place from roughly the 1960s to 1999, and included the development of Database Management Systems (DBMS) known as hierarchical, inverted list, network, and during the 1990s, object-oriented Database Management Systems.
- Phase II is described as relational, and introduced SQL and SQL products (plus a few non-SQL products). The first commercial products appeared around 1980, and the relational approach was dominant by about 1990.
- Phase III supported Online Analytical Processing (OLAP), which was developed around 1990 (along with specialized DBMSs) and continues to be used today.
- Phase IV introduced NoSQL in 2008, supporting the use of Big Data, nonrelational data, graphs, and more.
In his book, Data and Reality (©1978), Bill Kent compared data models to road maps, emphasizing the differences between the real world and the world of symbols. He wrote, “Highways are not painted red, rivers don’t have county lines running down the middle, and you can’t see contour lines on a mountain.” This observation contrasts with the approach of many researchers, who attempted to create clean, mathematically sterile models. Kent preferred to emphasize the basic messiness of reality, and suggested Data Modeling architects focus on creating order out of the chaos without distorting the basic truth. (With the popularity of NoSQL and non-relational data, Kent’s suggestions from 1978 have proven to be a good idea, but for technical reasons, it took us a while to get here.)
Data Modeling in the 1960s
The concept of Data Modeling started becoming important in the 1960s, as management information systems (MIS) became popular. (Prior to 1960, there was very little data or data storage. Computers of this time were essentially giant calculators.) A variety of theoretical data models were proposed during the 1960s, including three that became a reality. The first two are “the hierarchical data model” and “the network data model.” The third, the relational model, was proposed by Edgar F. Codd in the late 1960s.
The first true commercial database system became available in 1964. Called the Integrated Data Store (IDS), it was developed by Charles Bachman, with General Electric supporting his research. IDS used the network model, described as a flexible way of representing objects and their relationships in a graph form. IBM chose to focus on hierarchical models, designed for their Information Management System (IMS). In this model, the relationships between records take a treelike shape. While the structure is simple, it is also inflexible because it is confined to a “one-to-many” relationship format, in which each child record has exactly one parent.
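The difference between the two shapes can be sketched as follows. This is a minimal illustration, not how IDS or IMS actually stored records; the record types (`Department`, `Supplier`, `Part`, and so on) and the helper functions are hypothetical.

```python
# Hierarchical model: a tree, so every record type has at most one parent.
hierarchy = {
    "Department": ["Employee"],
    "Employee": ["Paycheck", "Dependent"],
}

# Network model: a graph, so a record type may belong to several owners.
network = {
    "Supplier": ["Part"],
    "Project": ["Part"],
    "Part": [],
}


def parents_of(record, children_by_parent):
    """Return every parent that lists `record` as one of its children."""
    return [
        parent
        for parent, children in children_by_parent.items()
        if record in children
    ]


def is_hierarchical(children_by_parent):
    """True when no record has more than one parent (strict one-to-many)."""
    records = {c for children in children_by_parent.values() for c in children}
    return all(len(parents_of(r, children_by_parent)) <= 1 for r in records)


print(parents_of("Paycheck", hierarchy))  # one parent only
print(parents_of("Part", network))        # two owners
print(is_hierarchical(hierarchy), is_hierarchical(network))
```

In the hierarchical sketch, a part supplied by one company and used by another project would have to be duplicated under each parent, which is the inflexibility described above.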
As Data Modeling and DBMSs evolved, so too did programming languages. Simula was developed in 1967, and was the first object-oriented programming language. (Other languages evolved from Simula, such as Java, Eiffel, C++, and Smalltalk.) The evolution of programming languages was a strong influence in shaping the models that used these languages.
Data Modeling in the 1970s
In 1970, Edgar F. Codd’s ideas were published. His ideas offered a significantly different way of handling data, suggesting all data within a database could be displayed as tables using columns and rows, which would be called “relations.” These “relations” would be accessible using a non-procedural, or declarative, language. (Remember, languages influence the shape of the model, and vice versa.) Rather than writing an algorithm to navigate to the data, the user only had to state what information was wanted, for example by naming a table and a condition, and the system worked out how to retrieve it. This clever idea led to much higher productivity, and prompted IBM to create SQL (originally called SEQUEL, or Structured English Query Language).
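The contrast between procedural and declarative access can be illustrated with a short sketch. It uses Python's built-in `sqlite3` module as a stand-in for a relational system; the `customer` table and its sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (name, city) pairs.
rows = [("Ada", "London"), ("Grace", "New York"), ("Edgar", "London")]

# Procedural access: the program spells out HOW to find the data,
# step by step, by looping, testing, and sorting.
procedural = []
for name, city in rows:
    if city == "London":
        procedural.append(name)
procedural.sort()

# Declarative access: the query states WHAT is wanted; the database
# engine decides how to retrieve it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer (name, city) VALUES (?, ?)", rows)
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE city = ? ORDER BY name",
        ("London",),
    )
]
conn.close()

print(procedural == declarative)
```

Both approaches return the same names, but only the first one has to change if the storage layout changes.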
Also, during this decade, G.M. Nijssen created “Natural Language Information Analysis Method” (NIAM).
Data Modeling in the 1980s
NIAM was developed further in the 1980s, with the help of Terry Halpin. Its name was changed to Object Role Modeling (ORM). ORM brought about a dramatic change in the way data is perceived and how it is processed. The traditional mindset required that data and procedures be stored separately. (It should be noted that a number of techs dislike ORM because it breaks all the rules.)
By the end of the 1980s, the hierarchical model was becoming outdated, with Codd’s relational model becoming the popular replacement. Query optimizers had become inexpensive enough, and sophisticated enough, for the relational model to be incorporated into the database systems of most industries. (Banks and similar institutions still prefer hierarchical data models for processing monetary and statistical information.)
1998 and NoSQL
The original version of NoSQL is a database developed by Carlo Strozzi in 1998. He created an open-source relational database that “did not expose” the standard SQL interface, “but was still relational.” Later systems that took on the NoSQL name dropped the relational model completely.
2008 to Present – the Growth of Non-Relational Models
One of NoSQL’s advantages is its ability to store data using a schema-less, or non-relational, format. Another is its huge data storage capacity, which comes from its horizontal scalability. This makes it particularly well-suited for handling unstructured data, and in turn, well-suited for processing Big Data. Rick van der Lans, an independent analyst and consultant, stated:
“The Data Modeling process is always there. You can look at that role in a simple way, by thinking of it as a process that leads to a diagram. In the process of creating the diagram, you are trying to understand what the data means and how the data elements relate together. Thus, understanding is a key aspect of Data Modeling.”
Because the data is structureless, a variety of data models can be used, after the fact, to translate and map out the data, giving it structure. It is generally understood that different data models, and the different languages associated with them, provide different paradigms, or different ways of looking at problems and solutions. With NoSQL, it is common to spread data across many servers (horizontal scalability) and across several kinds of data stores, each offering its own potential data model translation. Using multiple storage technologies in this way is called polyglot persistence. The question then becomes, “What is the best data model to use?” According to van der Lans:
“That’s why some call the data multi-structured, meaning that you can look at the same data from different angles. It’s as if you are using different filters when looking at the same object.”
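The idea of applying different models to the same stored data after the fact can be sketched as follows. This is an illustration only; the JSON documents, field names, and the two helper functions `as_table` and `as_edges` are hypothetical.

```python
import json

# Schema-less storage: each document carries its own fields.
raw = [
    '{"name": "Ada", "city": "London", "follows": ["Grace"]}',
    '{"name": "Grace", "email": "grace@example.com"}',
    '{"name": "Edgar", "city": "London", "follows": ["Ada", "Grace"]}',
]
documents = [json.loads(line) for line in raw]


def as_table(docs, columns):
    """Relational-style view: one row per document, missing fields become None."""
    return [tuple(doc.get(col) for col in columns) for doc in docs]


def as_edges(docs):
    """Graph-style view: one (source, target) edge per 'follows' entry."""
    return [
        (doc["name"], target)
        for doc in docs
        for target in doc.get("follows", [])
    ]


table = as_table(documents, ["name", "city"])
edges = as_edges(documents)

print(table)
print(edges)
```

Neither view is stored; each one is a filter applied to the same documents when they are read.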
Because of their flexibility and large data storage capacity, NoSQL-style data stores have become popular. However, NoSQL databases still have a long way to go in terms of evolution. According to the research report Insights in Modeling NoSQL, many organizations have not incorporated a data model into their NoSQL systems, since Data Modeling with such data stores exists primarily within the application code.
Not too surprisingly, the report also found these same organizations wanted to build and use a data model, and to increase the number of staff with Data Modeling skills. The discrepancy stems from a lack of modelers experienced with NoSQL databases, combined with a near absence of tools for NoSQL Data Modeling. The need for experienced NoSQL Data Modelers, and for the appropriate tools, remains.
Hackolade is focused on solving these problems. They have developed a downloadable, user-friendly data modeling tool that provides powerful visual tools for NoSQL. Their software combines the simplicity of graphic data models with NoSQL document databases. This combination reduces development time, increases application quality, and lowers execution risks. The software is currently compatible with Couchbase, DynamoDB, and MongoDB schemas, and the company plans to introduce software for several other NoSQL databases.
To be sure, the desire for Data Modeling in new database models will continue to move the industry forward as more and more organizations seek to capitalize on the diversity of non-relational designs while still utilizing their time-honored and well-known Data Modeling practices.