A Brief History of Database Management

A database management system (DBMS) allows a person to organize, store, and retrieve data from a computer. It is a way of communicating with a computer’s “stored memory.” In the very early years of computers, “punch cards” were used for input, output, and data storage. Punch cards offered a fast way to enter data and retrieve it. Herman Hollerith is given credit for adapting the punch cards used for weaving looms to act as the memory for a mechanical tabulating machine, in 1890. Much later, databases came along.

Databases (or DBs) have played a very important part in the recent evolution of computers. The first computer programs were developed in the early 1950s and focused almost completely on coding languages and algorithms. At the time, computers were basically giant calculators and data (names, phone numbers) was considered the leftovers of processing information. Computers were just starting to become commercially available, and when business people started using them for real-world purposes, this leftover data suddenly became important.

Enter the DBMS. A database, as a collection of information, can be organized so a database management system can access and pull specific information. In 1960, Charles W. Bachman designed the integrated database system, the “first” DBMS. IBM, not wanting to be left out, created a database system of its own, known as IMS. Both database systems are described as the forerunners of navigational databases.

By the mid-1960s, as computers developed speed and flexibility, and started becoming popular, many kinds of general-use database systems became available. As a result, customers demanded a standard be developed, in turn leading to Bachman forming the Database Task Group. This group took responsibility for the design and standardization of a language called Common Business Oriented Language (COBOL). The Database Task Group presented this standard in 1971, which also came to be known as the “CODASYL approach.”

The CODASYL approach was a very complicated system and required substantial training. It depended on a “manual” navigation technique using a linked data set, which formed a large network. Searching for records could be accomplished by one of three techniques:

Using the primary key (also known as the CALC key)
Moving relationships (also called sets) from one record to another
Scanning all records in sequential order

Eventually, the CODASYL approach lost its popularity as simpler, easier-to-work-with systems came on the market.

Edgar Codd worked for IBM in the development of hard disk systems, and he was not happy with the lack of a search engine in the CODASYL approach, and the IMS model. He wrote a series of papers, in 1970, outlining novel ways to construct databases. His ideas eventually evolved into a paper titled A Relational Model of Data for Large Shared Data Banks, which described a new method for storing data and processing large databases. Records would not be stored in a free-form list of linked records, as in CODASYL navigational model, but instead used a “table with fixed-length records.”

IBM had invested heavily in the IMS model and wasn’t terribly interested in Codd’s ideas. Fortunately, some people who didn’t work for IBM were interested. In 1973, Michael Stonebraker and Eugene Wong (both then at UC Berkeley) made the decision to research relational database systems. The project was called INGRES (Interactive Graphics and Retrieval System) and successfully demonstrated a relational model could be efficient and practical. INGRES worked with a query language known as QUEL, in turn, pressuring IBM to develop SQL in 1974, which was more advanced (SQL became ANSI and OSI standards in 1986 and 1987). SQL quickly replaced QUEL as the more functional query language.

RDBM Systems were an efficient way to store and process structured data. Then, processing speeds got faster, and “unstructured” data (art, photographs, music, etc.) became much more commonplace. Unstructured data is both non-relational and schema-less, and relational database management systems simply were not designed to handle this kind of data.

MySQL

In 1995, David Axmark, Allan Larsson, and Michael Widenius launched MySQL, an open-sourced relational database management system (RDBMS) that is based on structured query language. According to a Stack Overflow survey in 2020, MySQL is being used by 55.6% of the survey participants, making it the most popular database in the world.

MySQL has evolved into an extremely scalable database system with the ability to operate on multiple platforms. Some key features of MySQL follow:

MySQL is very easy to deploy and use.
MySQL supports ACID (atomicity, consistency, isolation, durability), making it extremely reliable.
It is a relational database management system that has fast-loading utilities.
It can be configured using any programming language (PHP is the most popular).
It offers good security using solid data security layers and allows only authorized users to access the database with encrypted passwords.

Column Stores

A DBMS using columns is quite different from traditional relational database systems. It stores data as portions of columns, instead of as rows. The change in focus, from row to a column, lets column databases maximize their performance when large amounts of data are stored in a single column. This strength can be extended to data warehouses and CRM applications. Examples of column-style databases include Cloudera, Cassandra, and HBase (Hadoop-based).

Key-Value Stores

Key-value stores are based on Ken Thompson’s DBM (database management) research in 1979. A key-value pair database is useful for shopping cart data or storing user profiles. All access to the database is done using a primary key. Typically, there is no fixed schema or data model. The key can be identified by using a random lump of data. Key-value stores are not useful when there are complex relationships between data elements or when data needs to be queried by other than the primary key. Examples of key-value stores includes Riak, Berkeley DB, and Aerospike.

An element can be any single “named” unit of stored data that might, or might not, contain other data components.

NoSQL

NoSQL (“Not only” Structured Query Language) came about as a response to the Internet and the need for faster speed and the processing of unstructured data. Generally speaking, NoSQL databases are preferable in certain use cases to relational databases because of their speed and flexibility. The NoSQL model is non-relational and uses a “distributed” database system. This non-relational system is fast, uses an ad-hoc method of organizing data, and processes high volumes of different kinds of data.

“Not only” does it handle structured and unstructured data, it can also process unstructured big data, very quickly. The widespread use of NoSQL can be connected to the services offered by Twitter, LinkedIn, Facebook, and Google. Each of these organizations stores and processes colossal amounts of unstructured data. These are the advantages NoSQL has over SQL and RDBM Systems:

Higher scalability
A distributed computing system
Lower costs
A flexible schema
Can process unstructured and semi-structured data
Has no complex relationship

Unfortunately, NoSQL does come with some problems. Some NoSQL databases can be quite resource-intensive, demanding high RAM and CPU allocations. It can also be difficult to find tech support if your open-source NoSQL system goes down.

NoSQL Data Distribution

Hardware can fail, but NoSQL databases are designed with a distribution architecture that includes redundant backup storage of both data and function. It does this by using multiple nodes (database servers). If one, or more, of the nodes goes down, the other nodes can continue with normal operations and suffer no data loss. When used correctly, NoSQL databases can provide high performance at an extremely large scale, and never shut down. In general, there are four kinds of NoSQL databases, with each having specific qualities and characteristics.

Document Store Databases

Document stores save each record, and associated data, in a single document. Individual documents contain semi-structured data which can be queried using various query tools. A document store (often called a document-oriented database), manages, stores, and retrieves semi-structured data (also known as document-oriented information). It is often used with other non-relational database formats. Documents can be described as independent units that improve performance and make it easier to spread data across a number of servers. Document stores typically come with a powerful query engine and indexing controls that make queries fast and easy. Examples of document stores include Mongo DB and Amazon Dynamo DB.

Document-oriented databases store all information for a given “object” within the database, and each object in storage can be quite different from the others. This makes it easier for mapping objects to the database and makes document storage for web programming applications very attractive. (An “object” is a set of relationships. An article object could be related to a tag [an object], a category [another object], or a comment [another object].)

Graph Data Stores

Graph databases came about in 2006, when Tim Bernes-Lee established the concept of a large database that “linked data.” Location-aware systems, routing and dispatch systems, and social networks are the primary users of graph databases (also called graph data stores). These databases are based on graph theory and work well with data that can be displayed as graphs. They provide a very functional, cohesive picture of big data.

They differ from relational databases, and other NoSQL databases, by storing data relationships as actual relationships. This type of storage for relationship data results in fewer disconnects between an evolving schema and the actual database. It has interconnected elements, using an undetermined number of relationships between them. Examples graph databases include Neo4j, GraphBase, and Titan.

Polyglot Persistence

Polyglot Persistence is a spin-off of “polyglot programming,” a concept developed in 2006 by Neal Ford. The original idea promoted applications be written using a mix of languages, with the understanding that a specific language may solve a certain kind of problem easily, while another language would have difficulties. Different languages are suitable for tackling different problems.

Many NoSQL systems run on nodes and large clusters. This allows for significant scalability and redundant backups of data on each node. Using different technologies at each node supports a philosophy of polyglot persistence. This means “storing” data on multiple technologies with the understanding certain technologies will solve one kind of problem easily, while others will not. An application communicating with different database management technologies uses each for the best fit in achieving the end goal.

Intelligent Databases

An intelligent database manages, not simple data, but information. It presents the information in a natural way that is useful and informative. The concept was introduced in the book “Intelligent Databases” in 1989 by Mark Chignell, Setrag Khoshafian, Kamran Parsaye, and Harry Wong. They applied three levels of intelligence:

High-level tools
The user interface
The database engine

The high-level tools are used to manage Data Quality, while automatically finding relevant patterns within the data (this is, in fact, data mining). This layer often uses artificial intelligence techniques.

Use of intelligent databases is widespread, with hospital databases bringing up patient histories (charts, x-ray images, and text) with just a few mouse clicks. Corporate intelligent databases include decision-making support tools (often using sales patterns analysis).

Image used under license from Shutterstock.com

ATTEND OUR LIVE ONLINE DATA MANAGEMENT FUNDAMENTALS COURSE

Data Topics