The NoSQL Movement: Key-Value Databases

by Paul Williams

This installment of the DATAVERSITY series on the wide-ranging NoSQL movement covers Key-Value databases. The Key-Value pattern remains one of the most common occurrences in computer engineering and its use is vital in the world of non-relational databases.

Accessing memory in assembly-level programming essentially follows the Key-Value pattern: the memory address serves as the key, and the value is whatever is stored at that address. The computer science concept of the hash table serves as another example, with a hash function transforming the key into an index used to find its associated value.
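
The hash table idea can be sketched in a few lines of Python. This is only an illustration, not how any particular database implements it: the class name TinyHashTable, the bucket count, and the sample keys are all made up for the example, and collisions are handled with simple chaining.

```python
class TinyHashTable:
    """Illustrative fixed-size hash table using separate chaining."""

    def __init__(self, bucket_count=8):
        # Each bucket holds a list of (key, value) pairs.
        self.buckets = [[] for _ in range(bucket_count)]

    def _index(self, key):
        # The hash function turns the key into a bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = TinyHashTable()
table.put("user:42", "Ada")
table.put("user:42", "Grace")  # same key, so the value is replaced
table.put("user:7", "Linus")
```

A lookup only has to hash the key and scan one small bucket, which is where the speed of the pattern comes from.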

This pattern gets used again and again in programming and computer engineering for one reason: speed. It can be argued that computers at their lowest (and fastest) level depend on the Key-Value pattern, so it makes sense that the Key-Value database is often more suitable than the relational model for the requirements of massive Big Data. In short, for simple lookups by key, Key-Value data stores tend to be much faster.

The ACID (atomic, consistent, isolated, durable) guarantees normally provided by relational databases are difficult to maintain at the massive throughput typical of many systems using a Key-Value database. Because of this, many Key-Value systems use an “eventually consistent” model, widely used in parallel processing and distributed systems: replicas may briefly disagree after a write, but they converge on the same value over time.
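
A rough sketch of eventual consistency, assuming a last-write-wins rule based on timestamps, might look like the following. The Replica class, the explicit timestamps, and the merge step are assumptions made for illustration; real systems differ in how they detect and resolve conflicts.

```python
class Replica:
    """Hypothetical replica that keeps the newest write per key."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Last-write-wins: only accept the write if it is newer.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Background synchronization: replay the other replica's entries.
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("status", "online", timestamp=1)
b.write("status", "away", timestamp=2)

stale_read = a.read("status")  # replica a has not seen the newer write yet

a.merge_from(b)
b.merge_from(a)  # after synchronization both replicas agree
```

Between the second write and the merge, a reader of replica a sees an older value. That temporary disagreement is the trade-off accepted in exchange for throughput and availability.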

Facebook Likes Apache Cassandra

Apache Cassandra, which uses eventual consistency, saw its genesis as the internally developed back-end for the ubiquitous Facebook social network.

Cassandra is designed for distributed environments, and it is well suited for replication across multiple Cloud-based data centers. Low latency ensures end-users gain access to their data with the high velocity needed in today’s web-centered environment. The decentralized, distributed nature of Cassandra enhances the database’s fault tolerance.

While based on the Key-Value model, Cassandra also supports the concept of columns and super-columns – essentially nested Key-Value pairs – that facilitate the modeling of more complex data structures. They also allow the reading and updating of a column without retrieving the entire record.
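
The nesting described above can be pictured with plain Python dictionaries. This is a conceptual sketch only, not Cassandra's storage format or API; the row key, super-column names, and helper functions are hypothetical.

```python
# row key -> super-column -> column -> value
users = {
    "user:42": {
        "profile": {           # super-column
            "name": "Ada",     # column
            "city": "London",  # column
        },
        "settings": {          # super-column
            "theme": "dark",
        },
    },
}


def read_column(store, row_key, super_column, column):
    # Reads a single column without touching the rest of the record.
    return store[row_key][super_column][column]


def update_column(store, row_key, super_column, column, value):
    # Updates a single column, creating intermediate levels if needed.
    store.setdefault(row_key, {}).setdefault(super_column, {})[column] = value


update_column(users, "user:42", "profile", "city", "Paris")
```

Each level is itself a Key-Value mapping, which is why a single column can be addressed directly by its path of keys.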

In addition to Facebook, Cassandra is in use at Netflix, Twitter, Constant Contact, and eBay, among many other companies across different business sectors. As is typical of open source software, there is an active support community centered on the database, and many professional third-party consulting options are available.

Key-Value as a Service

Given the rise of the Data as a Service concept, it makes sense that many of the companies in this growing sector leverage Key-Value databases on the backend. Amazon’s DynamoDB is one example: its pricing plan charges by throughput, and its data model is based on collections of Key-Value pairs.

Another relevant player in the “Key-Value as a Service” space is Cloudant. The company has garnered acclaim in the gaming industry; Hothead Games, for example, uses Cloudant for its “Big Win” series of sports games. Cloudant’s technical stack features the Apache CouchDB database, which offers quick updating and retrieval of semi-structured data stored as JSON documents containing one or more Key-Value pairs.
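
As a loose illustration of semi-structured JSON stored under a key, the sketch below uses Python's standard json module and an in-memory dictionary. It does not use the CouchDB or Cloudant APIs; the document ID, field names, and helper functions are invented for the example.

```python
import json

store = {}  # document id -> JSON text


def put_document(doc_id, document):
    # Serialize the document to JSON text before storing it.
    store[doc_id] = json.dumps(document)


def get_document(doc_id):
    # Parse the stored JSON text back into Python objects.
    return json.loads(store[doc_id])


put_document("player:1001", {"name": "Sam", "wins": 12, "teams": ["Reds"]})

doc = get_document("player:1001")
doc["wins"] += 1
put_document("player:1001", doc)  # write the updated document back
```

Because each document is just a collection of Key-Value pairs, two documents in the same store do not need to share the same fields.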

Google’s App Engine is a Platform as a Service product. For data storage requirements, App Engine features Datastore, a NoSQL database which combines features of Key-Value, Multi-Value, and tabular databases. The similarity to tabular databases exists because Datastore is built on top of Bigtable, Google’s signature tabular database product.

In-memory Cache

Considering that the basic structure of computer memory fits nicely within the Key-Value pattern, in-memory cache databases using this pattern continue to grow in popularity. One of the industry’s top memory cached database systems is the aptly named memcached. It was originally developed in 2003 by a developer working for the seminal social network LiveJournal.

Using a hash table combined with Key-Value storage, memcached intelligently distributes memory to areas of the database with the highest need. Raw data as well as serialized objects can be stored within the cache. Some implementations use web server RAM, but companies with larger databases occasionally dedicate an entire server’s RAM to the memcached instance.
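
The general idea of a bounded in-memory cache holding serialized objects can be sketched as follows. This is not memcached's implementation or client API: the SmallCache class, its item limit, and the use of pickle for serialization are assumptions for illustration. The sketch evicts the least recently used entry when the cache is full.

```python
import pickle
from collections import OrderedDict


class SmallCache:
    """Hypothetical size-limited cache with least-recently-used eviction."""

    def __init__(self, max_items=2):
        self.max_items = max_items
        self.items = OrderedDict()  # key -> serialized bytes

    def set(self, key, obj):
        # Store a serialized copy and mark the key as most recently used.
        self.items[key] = pickle.dumps(obj)
        self.items.move_to_end(key)
        if len(self.items) > self.max_items:
            self.items.popitem(last=False)  # drop the least recently used

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)
        return pickle.loads(self.items[key])


cache = SmallCache(max_items=2)
cache.set("a", {"id": 1})
cache.set("b", [1, 2, 3])
cache.get("a")              # touching "a" makes "b" the oldest entry
cache.set("c", "raw text")  # the cache is full, so "b" is evicted
```

On a miss, an application would typically fall back to the slower backing database and then store the result in the cache for next time.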

In addition to LiveJournal, memcached is used by Twitter, Wikipedia, Flickr, Craigslist, and Digg. The software is freely available under the Berkeley Software Distribution (BSD) license.

On the proprietary side of the in-memory cache landscape is Oracle’s Coherence. At its core is a cached data grid optimized for application servers. The product also supports a highly-scalable replication protocol suitable for large distributed systems.

Coherence features a distributed processing model that allocates system resources to the areas that need them, similar to memcached. The tool also sports a robust API, with versions available for Java, C++, and Microsoft’s .NET language family. Coherence was originally developed by a smaller database company, Tangosol, which Oracle acquired in 2007.

Other in-memory data caches also use the Key-Value pattern. NCache, developed by Alachisoft, promises linear scalability in contrast to relational databases. NCache works natively with .NET. It contains ASP.NET session state caching functionality, and the product also includes a fully-featured API for Java.

Microsoft’s AppFabric used to be a separate product, and is now embedded as part of Redmond’s Azure cloud computing offering. It provides in-memory caching functionality to help with the scalability of ASP.NET applications.

Redis sports a unique take on the standard Key-Value pattern. This open source in-memory cache allows lists, sets, and sorted sets to be stored as values, as opposed to the usual string-only value storage. This functionality has earned the product the label “data structure server.”
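
To make the “data structure server” idea concrete, here is a pure-Python sketch in which each key maps to a list, a set, or a sorted set. The function names borrow from the Redis commands RPUSH, SADD, ZADD, and ZRANGE, but this is a simplified imitation rather than the Redis client API, and the key names are invented.

```python
store = {}  # key -> list, set, or dict of member -> score


def rpush(key, *values):
    # Append values to the list stored at key.
    store.setdefault(key, []).extend(values)


def sadd(key, *members):
    # Add members to the set stored at key; duplicates are ignored.
    store.setdefault(key, set()).update(members)


def zadd(key, score, member):
    # Sorted set: remember each member together with its score.
    store.setdefault(key, {})[member] = score


def zrange(key):
    # Return members ordered by score, lowest first.
    scores = store.get(key, {})
    return sorted(scores, key=lambda member: (scores[member], member))


rpush("recent:logins", "ada", "grace")
sadd("tags:post:1", "nosql", "cache", "nosql")
zadd("leaderboard", 300, "grace")
zadd("leaderboard", 150, "ada")
zadd("leaderboard", 220, "linus")
```

The benefit is that operations such as appending to a list or ranking by score happen on the server side, without the client fetching, modifying, and rewriting a whole string value.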

Finally, Hazelcast remains a popular open source in-memory data grid with dynamic scaling and WAN replication functionality. The company offers a free Community Edition as well as a commercially licensed Enterprise Edition that features a native C# client and data management software.

Key-Value Databases Going Solid State

One trend closely related to in-memory cache databases is the movement to solid state memory for persistent storage. Because this kind of storage, unlike the hard disk, contains no mechanical parts prone to eventual failure, solid state memory continues to grow in usage. Solid state storage also tends to be faster than traditional hard drives.

This movement to solid state persistence parallels other trends in the database industry driving the growing use of NoSQL – namely the desire for increased performance as well as faster access to data through better distribution. Because of this, modern Key-Value data stores tend to be the ones offering solid state storage options as well as RAID disk arrays.

Couchbase, created by the merger of the folks behind membase and CouchDB, is a NoSQL solution suited for both kinds of storage. Couchbase essentially serves as a wrapper for the Document-based Apache CouchDB, providing it with clustering and state-of-the-art storage options, as well as the Key-Value caching functionality derived from both membase and memcached.

The persistently stored younger brother of memcached, MemcacheDB offers similar Key-Value database functionality, being primarily derived from the codebase of its older sibling. MemcacheDB works with both solid state and rotating disk storage, giving its users flexibility. The database also leverages transaction and replication functionality from Berkeley DB.

Berkeley DB for Embedded Key-Value Persistence

Speaking of Berkeley DB, this embedded Key-Value store is considered by many to be the most widely used database in the world. First developed at the University of California, the product was then managed and enhanced by Sleepycat Software before that company was acquired by Oracle, which continues to support the database.

Unlike most NoSQL databases, which target the world of Big Data, Berkeley DB is optimized for the smaller data stores usually found on embedded and mobile devices. Its runtime memory requirements are only a few kilobytes. Yes, that says kilobytes. Even so, the product also scales to terabyte-sized databases distributed across multiple parallel servers.

Berkeley DB now offers SQL, XML, and Java object storage options in addition to its traditional Key-Value data store. The SQL functionality is essentially similar to SQLite. A Java version of the product exists for users not comfortable developing in C.

Even with Oracle’s ownership of Berkeley DB, the tool is still available under the open source Sleepycat Public License. A dual license option is also available from Oracle for companies wanting something other than the open source model.

Key-Value databases make up a large part of the growing world of NoSQL. That is not surprising, considering the ubiquitous nature of the Key-Value pattern throughout computer engineering. When dealing with Big Data, fast is good – and the Key-Value pattern is very fast.

From Berkeley DB all the way to Apache Cassandra, Key-Value data stores also remain a vital segment of the open source movement. They continue to drive the innovations in database technology on a daily basis.

