First came relational databases, which provide a useful comparison for understanding non-relational databases. Proposed by Edgar F. Codd in 1970, the relational model arranges data into tables of rows and columns, with a specific key identifying each row. Almost all relational database systems use Structured Query Language (SQL), and they can be complex to design and administer. They are traditionally more rigid, tightly controlled systems, with a limited ability to handle unstructured data. That said, SQL systems are still used extensively and are quite useful for maintaining accurate transactional records, legacy data sources, and numerous other use cases within organizations of all sizes.
In the mid-1990s, the internet became widely popular, and relational databases struggled to keep up with the volume of information users demanded, as well as the wider variety of data types that came with it. This led to the development of non-relational databases, often referred to as NoSQL. NoSQL databases can handle unusual or loosely structured data quickly and avoid the rigidity of SQL by trading strictly “organized” storage for more flexibility.
The Evolution of NoSQL
The term NoSQL was first used in 1998 by Carlo Strozzi as the name of his lightweight, open-source “relational” database that did not use SQL. The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to describe non-relational databases. Relational databases are often referred to as SQL systems. The term NoSQL can mean either “No SQL systems” or the more commonly accepted reading of “Not only SQL,” which emphasizes the fact that some systems might support SQL-like query languages.
NoSQL developed, at least in the beginning, as a response to web data, the need for processing unstructured data, and the need for faster processing. The NoSQL model typically uses a distributed database system, meaning a system that runs across multiple computers. The non-relational system is quicker, uses an ad-hoc approach for organizing data, and processes large amounts of differing kinds of data. For large, unstructured data sets, NoSQL databases are often the better choice compared with relational databases because of their speed and flexibility.
Not only can NoSQL systems handle both structured and unstructured data, but they can also process unstructured Big Data quickly. This led to organizations such as Facebook, Twitter, LinkedIn, and Google adopting NoSQL systems. These organizations process tremendous amounts of unstructured data, coordinating it to find patterns and gain business insights. Big Data became an official term in 2005.
The CAP Theorem
The CAP Theorem, also known as Brewer’s theorem (after its developer, Eric Brewer), is an important part of non-relational databases. It states that a distributed data store cannot simultaneously offer more than two of three established guarantees. Brewer, at the University of California, Berkeley, presented the theory in the fall of 1998, and it was published in 1999 as the CAP Principle. The three guarantees that cannot all be met simultaneously are:
- Consistency: The data within the database remains consistent, even after an operation has been executed. For instance, after updating a system, all clients will see the same data.
- Availability: The system is constantly on (always available), with no downtime, so every request receives a response.
- Partition Tolerance: Even if communication among the servers is no longer reliable, the system will continue to function. A network failure can partition the servers into multiple groups which can’t communicate with each other, and the system must keep working anyway.
In 2002, a formal proof of Brewer’s concept was published by Nancy Lynch and Seth Gilbert of MIT, turning it into a “true theorem.”
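The trade-off can be made concrete with a toy model. The sketch below is only an illustration under simplifying assumptions (two replicas, one value, a manually toggled partition); the names `Replica` and `TinyCluster` are hypothetical and do not come from any real database. It shows that once the replicas cannot talk to each other, a write must either be refused (giving up availability) or accepted on one side only (giving up consistency).

```python
class Replica:
    """One server holding a copy of a single value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyCluster:
    """Two replicas. mode is 'CP' (favor consistency) or 'AP' (favor availability)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means a and b cannot communicate

    def write(self, replica, value):
        """Try to write through one replica; return True if the write is accepted."""
        if not self.partitioned:
            # Healthy network: both copies are updated together.
            self.a.value = value
            self.b.value = value
            return True
        if self.mode == "CP":
            # Refuse the write rather than let the copies diverge.
            return False
        # AP: accept the write locally; the replicas may now disagree.
        replica.value = value
        return True


cp = TinyCluster("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
print(cp.write(cp.a, "v2"))      # False: unavailable, but still consistent

ap = TinyCluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
print(ap.write(ap.a, "v2"))      # True: available, but a and b now differ
print(ap.a.value, ap.b.value)    # v2 v1
```

Real systems sit somewhere between these two extremes and often let the operator tune the behavior, but the basic choice during a partition is the one shown here.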
ACID and BASE Provide Consistency
The two most popular consistency models use the acronyms ACID and BASE. Both models have advantages and disadvantages, with neither being a perfect fit for every situation. The acronym ACID stands for Atomicity, Consistency, Isolation, and Durability. It was coined in 1983 by Theo Härder and Andreas Reuter. The strength of ACID is the guarantee it will provide a safe environment for processing data. This means data remains consistent and stable, even when a single transaction touches multiple storage locations. Most NoSQL Graph Databases use ACID constraints to ensure data is safely and consistently stored.
The term BASE (Basically Available, Soft state, Eventual consistency) seems to have become popular in 2008 as an alternative to the ACID model. Availability for scaling purposes is an important feature for BASE data stores. However, BASE doesn’t offer the guarantee of consistency for replicated data at write time. The BASE model, generally speaking, provides less assurance than ACID. BASE is used primarily by aggregate stores, which include column, document, and key-value stores.
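As a rough illustration of the atomicity guarantee, the sketch below uses Python's built-in `sqlite3` module. The `accounts` table, the account names, and the helper functions `make_bank`, `transfer`, and `balances` are all hypothetical, chosen only for this example. A transfer credits one account and then debits the other; if the debit would overdraw the account, the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.execute("INSERT INTO accounts VALUES ('bob', 50)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_bank()
transfer(conn, "alice", "bob", 30)    # succeeds
transfer(conn, "alice", "bob", 500)   # fails and is rolled back as a unit
print(balances(conn))                 # {'alice': 70, 'bob': 80}
```

The failed transfer leaves no trace: bob is not left holding a credit that alice never paid for.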
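The behavior of a BASE-style store can be sketched in a few lines. The class below is a deliberately simplified, hypothetical model (the name `EventuallyConsistentStore` and its methods are assumptions made for this example, not a real product's API): a write is acknowledged as soon as one replica has it, and the other replicas catch up only when `sync` runs. Until then, a read from a lagging replica can return stale data.

```python
class EventuallyConsistentStore:
    """Toy replicated store: writes land on one replica and propagate later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def put(self, key, value):
        # Acknowledge after writing to a single replica (stays available).
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica=0):
        # Reads are served by whichever replica the caller reaches.
        return self.replicas[replica].get(key)

    def sync(self):
        # Apply queued updates in order; afterwards all replicas agree.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("cart:42", ["book"])
print(store.get("cart:42", replica=0))  # ['book']
print(store.get("cart:42", replica=2))  # None (stale until sync)
store.sync()
print(store.get("cart:42", replica=2))  # ['book']
```

The window between `put` and `sync` is the "soft state" in the acronym: the system is up and answering, but not every answer reflects the latest write.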
Non-Relational Data Storage Design
Non-relational data storage is often open source, schema-less, horizontally scalable, and uses BASE for consistency. The term “elasticity” is used for data storage that is scalable, schema-free, and allows for rapid changes and rapid replication. Generally speaking, these features have been accomplished by designing NoSQL data storage from the bottom up and optimizing it for horizontal scaling. These systems often support only low-level, simplistic APIs (such as “get” and “put” operations). As a consequence, modeling with non-relational systems feels completely different from the modeling used in the relational world and follows a different philosophy.
NoSQL uses data stores optimized for specific purposes. Normally, NoSQL stores data in one of four categories:
- Key-Value storage
- Document storage
- Wide Column storage
- Graph database
A Key-Value Store, also called a Key-Value Database, is a data storage system designed for storing, retrieving, and managing “associative arrays.” A Key-Value Store works very differently than a relational database. A relational database pre-defines the data structure, using a series of tables containing fields with well-defined data types. A Key-Value Store, by contrast, treats each value as a single opaque collection, which may have different fields for every record.
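A minimal in-memory sketch helps show how small the interface is. The `KeyValueStore` class below is hypothetical and built on a plain Python dictionary; serializing values to JSON bytes is an assumption made here to emphasize that the store itself never looks inside a value.

```python
import json


class KeyValueStore:
    """Toy key-value store exposing only get, put, and delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is kept as opaque bytes; the store knows nothing of its fields.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
# Two records under the same store with entirely different fields.
store.put("client:1", {"name": "Ada", "city": "London"})
store.put("client:2", {"name": "Lin", "loyalty_points": 120, "tags": ["vip"]})
print(store.get("client:2")["loyalty_points"])  # 120
print(store.get("client:3", default="missing"))  # missing
```

Because lookups go only through the key, there is no way to ask this store for "all clients in London" without reading every value, which is the price paid for the simplicity.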
A Document Store, also called a Document-Oriented Database, is a system designed for storing, retrieving, and managing “document-oriented information,” which is also referred to as semi-structured data. Document Stores have some similarities to Key-Value Stores, but differ in the way the data gets processed. A document-oriented system uses the internal structure of the document for identification and storage. Document Stores save all information for a given item as a single instance in a database (rather than spread out over tables, as with relational systems). This makes it easy to map items into the database.
A Wide Column Store uses tables, rows, and columns, but unlike relational databases, the names and formats of the columns can change from row to row within the same table, which makes these stores more flexible. Wide Column Stores often support column families, which are stored separately. Each column family normally contains several columns used together. Within a specific column family, data is stored row-by-row, with columns for a specific row being stored together instead of each column being stored individually. Wide Column Stores supporting column families are also called column family databases.
A Graph Database is essentially a collection of relationships. Each node symbolizes an entity (a business, person, or object). Nodes are connected to one another. The connection is called an “edge” and represents a relationship between two nodes. Each node within a Graph Database includes a unique identifier, a set of incoming edges and/or outgoing edges, and characteristics that are represented as “key-value pairs.” Each edge also comes with a unique identifier, a starting node and/or an ending node, and a collection of properties.
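The difference from a Key-Value Store is easiest to see in a query. In the hypothetical `DocumentStore` sketch below (the class and its methods are assumptions for illustration, not any particular product's API), each client is one document holding its own orders, and `find` can match on the document's top-level fields because the store understands its structure.

```python
class DocumentStore:
    """Toy document store: each item is saved whole and can be queried by field."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        """Store a shallow copy of the document and return its generated id."""
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, **criteria):
        """Return documents whose top-level fields equal all given criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


db = DocumentStore()
db.insert({"name": "Ada", "city": "London",
           "orders": [{"item": "laptop", "total": 900}]})
db.insert({"name": "Lin", "city": "Lisbon", "orders": []})
db.insert({"name": "Sam", "city": "London"})  # no "orders" field at all

print([doc["name"] for doc in db.find(city="London")])  # ['Ada', 'Sam']
```

Note that the third document simply omits the `orders` field; nothing in the store requires every document to have the same shape.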
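One way to picture this layout is as a nested mapping: column family, then row key, then whatever columns that row happens to have. The `WideColumnTable` class below is a hypothetical sketch of that idea only; real wide column stores add ordering, timestamps, and on-disk formats that are left out here.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide column table: family -> row key -> {column name: value}."""

    def __init__(self, families):
        self.families = set(families)
        # Each column family gets its own separate storage.
        self._store = {family: defaultdict(dict) for family in self.families}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self._store[family][row_key][column] = value

    def get_row(self, row_key, family):
        """Return a copy of the columns stored for one row in one family."""
        return dict(self._store[family].get(row_key, {}))


users = WideColumnTable(["profile", "activity"])
users.put("user1", "profile", "name", "Ada")
users.put("user1", "profile", "email", "ada@example.com")
users.put("user2", "profile", "name", "Lin")
users.put("user2", "profile", "nickname", "L")  # column that user1 lacks
users.put("user1", "activity", "last_login", "2024-01-01")

print(users.get_row("user1", "profile"))  # {'name': 'Ada', 'email': 'ada@example.com'}
print(users.get_row("user2", "profile"))  # {'name': 'Lin', 'nickname': 'L'}
```

The two rows share a table and a column family but not a column list, which is the flexibility the paragraph above describes.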
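The node and edge structure described above maps naturally onto two small record types. The sketch below is a hypothetical in-memory model (the names `Node`, `Edge`, and `Graph` are assumptions made for this example); it stores properties as key-value pairs and derives each node's incoming and outgoing edges by scanning the edge list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str
    start: str   # id of the starting node
    end: str     # id of the ending node
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy graph: nodes and edges keyed by unique identifiers."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, edge_id, start, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge_id] = Edge(edge_id, start, end, properties)

    def outgoing(self, node_id):
        return [e for e in self.edges.values() if e.start == node_id]

    def incoming(self, node_id):
        return [e for e in self.edges.values() if e.end == node_id]


g = Graph()
g.add_node("ada", kind="person")
g.add_node("acme", kind="business")
g.add_edge("e1", "ada", "acme", relation="works_at", since=2020)

print([e.properties["relation"] for e in g.outgoing("ada")])  # ['works_at']
print([e.start for e in g.incoming("acme")])                  # ['ada']
```

A production graph database would index edges per node rather than scanning them all, but the data model is the same.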
Non-Relational Databases vs. Relational Databases
Relational and non-relational databases both have their pros and cons. Relational databases come with the limitation that each field holds only a single value. Using a sales example, each aspect of a client’s relationship is saved as rows in separate tables. The client’s master details use one table, while account details use another table. These tables are all linked by way of relations, such as foreign and primary keys.
Non-relational databases, on the other hand, are quite different, especially regarding key-value pairs, or Key-Value Stores. Key-value pairs allow several related items to be saved in one “row” within the same table.
It should be noted a non-relational “row” is not the same as a row in a relational table. For example, in a non-relational table, each row would have the client’s details, in addition to their account, sales, and payment history. All the data of one client can be saved together as one convenient record.
While the non-relational database has certain strengths when storing data, it also comes with a significant drawback: key-value stores cannot “enforce” the relationships between items. A client’s details (name, address, payment history, etc.) would all be saved as one data record, and nothing in the store checks that the pieces agree with each other. In a relational model, the same data would be stored in several tables linked by keys, and the database itself enforces those links. This means the relational model comes with a built-in way of ensuring business logic and trustworthiness at the database layer. For example, the use of primary and foreign keys ensures a payment is recorded against an existing client account. This is why relational databases continue to be popular.
However, the highest priority for many web-based applications is the ability to service large numbers of user requests, which is the strength of non-relational databases. eBay, for example, allows users to browse and view posted items. Only a small number of these users will actually bid on or reserve an item, but millions, sometimes billions, of pages will be viewed per day. eBay is interested in a quick response time and wants to ensure fast page loading, rather than enforcing strict business rules.
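Enforcement at the database layer can be seen directly with Python's built-in `sqlite3` module. In the sketch below, the `clients` and `payments` tables and the `try_payment` helper are hypothetical names chosen for this example. Note that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE payments ("
    " id INTEGER PRIMARY KEY,"
    " client_id INTEGER NOT NULL REFERENCES clients(id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO clients (id, name) VALUES (1, 'Ada')")


def try_payment(conn, client_id, amount):
    """Insert a payment; return False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO payments (client_id, amount) VALUES (?, ?)",
            (client_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        # No such client: the foreign key constraint blocked the row.
        return False


print(try_payment(conn, 1, 25.0))   # True: client 1 exists
print(try_payment(conn, 99, 25.0))  # False: client 99 does not exist

# The join follows the key relationship back to the client's master details.
rows = conn.execute(
    "SELECT c.name, p.amount FROM payments p JOIN clients c ON c.id = p.client_id"
).fetchall()
print(rows)  # [('Ada', 25.0)]
```

A key-value store would happily accept a payment record naming client 99; any such check would have to live in application code instead.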
The Future of Non-Relational Databases
Non-relational databases have their own strengths and weaknesses, as do relational databases. As the NoSQL revolution continues, it is important to remember “the right tool for the right job” is a useful philosophy. Relational databases support accuracy and enforced integrity, while non-relational databases support fast, flexible work with large, unstructured data sets.
Currently, efforts are being made to merge the two database systems. Hybrid systems are adding SQL-type features like transactional support, joins, and customizable consistency. Additionally, SQL databases (for example, SQL Server) are adding NoSQL features that make horizontal scaling more transparent.
It seems reasonable to predict non-relational and relational databases will continue to borrow from each other, adding strengths and minimizing weaknesses.