Graph databases and key-value databases have very different features and are used for different tasks. Key-value databases are streamlined and fast, but their query capabilities are limited and they are less flexible. Graph databases, on the other hand, are very flexible and well suited to research, but generally slower. Both are typically built on a non-relational (NoSQL) foundation.
The two key strengths of graph databases are their flexibility and their focus on relationships. They highlight the connections between relevant pieces of data and generate insights from existing data by giving those relationships priority. A graph model communicates how the data is related, and can also help in forming research questions. Graph databases are well suited to organizing data by relationship and to answering complex queries.
“Whenever customers ask me about graph databases, I keep it very simple. When you hear the word ‘graph,’ graph is equal to ‘relationship.’ So, any time you are trying to do analysis of relationships, that’s where you should use the graph database. And given that all of us are increasingly more connected to each other — both as people and as organizations, as entities — it just makes sense that graph databases would become more prominent and more important as time goes by.”
The key-value database came about as the simplest and, consequently, the fastest NoSQL architecture. It is basically a simple two-column hash table, with each row having a unique “key” (or ID) and a “value” that is associated with the key. The keys can connect with the appropriate single data values extremely quickly (much more quickly than a relational database).
Scaling is also a strength of key-value databases, and many of these database designs are open source and free. Key-value databases perform read and write operations very quickly. For example, a computer using a key-value database would link a typed-in name (Mike) with a file containing a stored phone number and other pertinent information. Lookups by key are normally very fast, but querying is very limited.
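The "Mike" lookup above can be sketched with an ordinary Python dictionary, which is itself a hash table. This is only an illustration of the idea, not how any particular product is implemented; the `put` and `get` helpers and the sample record are hypothetical.

```python
# A key-value store reduced to its essence: one hash table, two operations.
store = {}

def put(key, value):
    """Write (or overwrite) the value stored under key."""
    store[key] = value

def get(key, default=None):
    """Read the value stored under key; no query engine is involved."""
    return store.get(key, default)

# Hypothetical record: the key is the typed-in name, the value is everything else.
put("Mike", {"phone": "555-0100", "city": "Denver"})

mike = get("Mike")       # direct lookup by key
missing = get("Sarah")   # unknown key, so the default (None) comes back

# The limitation: asking a question about the *values* means scanning every entry.
in_denver = [key for key, value in store.items() if value.get("city") == "Denver"]
```

The last line shows why querying is described as limited: without the key, the only option in this sketch is to walk the whole table.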
Casey Rosenthal, the General Manager of Professional Services at Basho, said:
“…[M]ost K-V databases don’t really have a query engine at all, since the lookup path can be traced as a straight line from the request to the object in memory or on disk somewhere. As a result, most K-V databases are much easier to scale than relational databases. This is particularly true of distributed databases that are designed to exist on multiple servers.”
A graph database is useful for research, while a key-value database is beneficial for day-to-day business activities.
A graph database is deliberately designed to show all of the relationships within the data. Rather than using tables, a graph uses nodes, edges, and properties when defining and storing data. A graph database is typically used for processing complex data when a query needs to go more than a couple of levels deep. Its queries can be extremely powerful, though its increased complexity causes it to run more slowly. This design makes modern graph databases an excellent choice for analyzing any relationships within the data.
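To make the nodes, edges, and properties vocabulary concrete, here is a minimal in-memory sketch in Python. It is not how a real graph database stores or indexes data, and the people, companies, and relationship names are invented for illustration. The `within_hops` function shows the kind of "several levels deep" question a graph query answers.

```python
from collections import deque

# Nodes carry properties; edges carry a relationship type. All data is hypothetical.
nodes = {
    "alice": {"kind": "person"},
    "acme": {"kind": "company"},
    "shellco": {"kind": "company"},
    "bob": {"kind": "person"},
}
edges = [
    ("alice", "DIRECTOR_OF", "acme"),
    ("acme", "OWNS", "shellco"),
    ("bob", "SHAREHOLDER_OF", "shellco"),
]

def neighbors(node):
    """Yield every node connected to `node`, ignoring edge direction."""
    for src, _rel, dst in edges:
        if src == node:
            yield dst
        elif dst == node:
            yield src

def within_hops(start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors(current):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    del seen[start]
    return seen

reachable = within_hops("alice", 3)
people = [name for name in reachable if nodes[name]["kind"] == "person"]
```

Here `people` contains the individuals linked to Alice through up to three relationships, a connection that would take several self-joins to express against tables.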
The International Consortium of Investigative Journalists (ICIJ) used a graph database (Neo4j, combined with Linkurious for visuals) to research the Panama Papers and the Swiss Leaks. Mar Cabra, ICIJ’s Director of Data and Research, stated that the Panama Papers were the largest leak in journalism history (2.6 TB of data and 11.5 million documents). She went on to say:
“The way our journalists would deal with this before, was to print the paper and draw graphs on paper. We missed connections. So when we first went into business — for the Swiss Leaks — the first reaction of my reporters was ‘I didn’t know about this connection.’ This was very difficult to find in documents, because our brains are not wired like that, visually, and it requires a lot of work if you have to do this by hand. In the Panama Papers, that was even more interesting, because it allowed us to see patterns we couldn’t find before.”
Key-value databases provide easy access to records of a customer’s behavior and preferences, allowing the website or salesperson to customize the customer’s experience. A key-value database can store data from multi-channel marketing sources and portfolio management applications, and can provide that information in real time.
Session management works well with key-value databases. An application starts a session whenever a customer logs in, and records activity until the customer logs out. During this time, the application stores all the session-related data, which may include the customer’s messages, profile information, recommendations, targeted promotions, and personalized data and themes. Each customer session has a unique identifier, and session data is queried only by that primary key.
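The session pattern described above maps naturally onto key-based access. The following Python sketch uses a plain dictionary in place of a real key-value database; the function names and the sample events are assumptions made for the example.

```python
import uuid

sessions = {}  # session ID -> session data

def log_in(customer):
    """Start a session and return its unique identifier."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = {"customer": customer, "activity": []}
    return session_id

def record(session_id, event):
    """Append an event to the session, looked up by primary key only."""
    sessions[session_id]["activity"].append(event)

def log_out(session_id):
    """End the session and return its final data."""
    return sessions.pop(session_id)

sid = log_in("mike")
record(sid, "viewed: tent")
record(sid, "added to cart: tent")
final = log_out(sid)
```

Every operation touches exactly one key, which is why this workload needs no query engine.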
Website “shopping carts” also work well with key-value databases. During the holiday shopping season, an e-commerce website may receive an enormous number of orders within seconds. Key-value databases can handle large amounts of data and extremely high volumes of state changes while serving millions of simultaneous users through distributed processing and storage. Distributed key-value databases also typically have built-in redundancy, which lets them tolerate the loss of storage nodes.
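One way to picture distributed storage with redundancy is to hash each key to a storage node and write a copy to the next node as well. The Python sketch below uses simple modulo placement for brevity; production systems generally use more elaborate schemes such as consistent hashing. The node names, the replica count, and the `down` parameter are all hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical storage nodes
REPLICAS = 2  # each key is written to this many nodes

def owners(key, nodes=NODES, replicas=REPLICAS):
    """Pick `replicas` consecutive nodes, starting at a hash-derived index."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

storage = {name: {} for name in NODES}

def put(key, value):
    """Write the value to every node that owns the key."""
    for name in owners(key):
        storage[name][key] = value

def get(key, down=()):
    """Read from the first owner that is not in the `down` collection."""
    for name in owners(key):
        if name not in down:
            return storage[name].get(key)
    raise RuntimeError("all replicas unavailable")

put("cart:1001", ["tent", "stove"])
primary = owners("cart:1001")[0]
cart = get("cart:1001", down={primary})  # still readable from the second copy
```

Because the owning nodes are computed from the key alone, any server can route a request without consulting a central index, and losing one node does not lose the cart.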
Airbnb is a service advertising short-term room rentals, typically in people’s homes, and has handled over 80 million guest arrivals. The company needed to monitor trends and raise alerts whenever there was an increase in calls about a certain issue. It combined Redis (a key-value database) with Elasticsearch (a search engine built on the Lucene library) and Node.js (a runtime for building fast, scalable network applications) to compute trends and visually display the top ones.
Elasticsearch was used to store and query the data for data crunching, which considered the attributes of a ticket, such as browser version, user country, issue type, subject line, and more. All tickets are streamed, in real-time, into an Elasticsearch cluster. The time series for each attribute, across all tickets, is analyzed and ranked by an algorithm to find spikes and trends. Airbnb runs two different algorithms simultaneously to make improvements, while using a previous baseline for comparison. The Node.js app processes incoming tickets. Elasticsearch queries for ticket data stored on Redis (their key-value database). Redis, in turn, maintains records of ticket trend results. Airbnb built its entire front end with React, providing a rich UI for displaying the data.
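The article does not say which algorithms Airbnb uses, so the following is only a generic illustration of spike detection on per-attribute time series: it scores the latest count by how many standard deviations it sits above a historical baseline. The function names, the threshold, and the ticket counts are all invented for the example.

```python
from statistics import mean, pstdev

def spike_score(history, current):
    """Standard deviations by which `current` exceeds the baseline mean."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return 0.0 if current == baseline else float("inf")
    return (current - baseline) / spread

def rank_spikes(series, threshold=3.0):
    """Return (attribute, score) pairs at or above the threshold, highest first."""
    scored = []
    for attribute, counts in series.items():
        *history, current = counts  # the last entry is the newest interval
        score = spike_score(history, current)
        if score >= threshold:
            scored.append((attribute, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical hourly ticket counts per attribute value.
tickets = {
    "issue_type=listing_not_in_search": [4, 5, 3, 4, 6, 4, 21],
    "browser=Firefox": [10, 12, 11, 9, 10, 11, 12],
    "country=FR": [7, 7, 7, 7, 7, 7, 7],
}
spikes = rank_spikes(tickets)
```

In this made-up data only the first series jumps well above its baseline, so it is the only one ranked; the ranked results are the kind of small, frequently overwritten record that suits a key-value store.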
One example of this system’s success deals with customer complaints. Airbnb noticed a spike in complaints by users who could not find their listing in a search. This is a common problem for new users when they start with the platform, so the customer service agents basically ignored the complaints. Additionally, because it was not a full failure, with no listings being returned, the engineering team also failed to notice. However, because they saw a ticket spike, a tech realized the issue existed and it needed to be fixed.
The Multi-Model Database
A multi-model database combines a graph database with a key-value database and other types of databases (such as reactive, object-oriented, and geospatial models). With a multi-model approach, high-performance applications can be scaled horizontally. Compared with the “layered approach,” multi-model solutions provide flexibility and performance advantages, but their key-value operations are not as fast as those of a pure key-value database.
This kind of flexibility is useful when dealing with large, complex situations, such as a fleet of semi-trucks (or a chain of stores, etc.). The multi-model database is useful for managing significant amounts of hierarchical data (information on the big rigs, their geographic locations as they travel, maintenance histories, and several million repair parts). This data is tracked to provide research information and help to answer questions, such as:
- Which trucks are down for maintenance?
- Why was so much fuel used on one trip?
- Who has the part we need to make this repair?
- Which parts of this 18-wheeler will need maintenance next week?
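A rough sketch of how several models can sit side by side over the same fleet data is shown below in Python. It is a toy, not a representation of any multi-model product: truck records act as documents fetched by key, while parts and depots are linked by graph edges. All identifiers and data are hypothetical.

```python
# Document model: one record per truck, fetched by ID (key-value style lookup).
trucks = {
    "T1": {"status": "in_service", "location": (39.74, -104.99)},
    "T2": {"status": "maintenance", "location": (41.88, -87.63)},
}

# Graph model: (source, relationship, target) edges between entities.
edges = [
    ("T2", "NEEDS_PART", "alternator"),
    ("depot-east", "STOCKS", "alternator"),
    ("depot-west", "STOCKS", "brake-pad"),
]

def truck(truck_id):
    """Key-value access: fetch one truck document by its key."""
    return trucks[truck_id]

def down_for_maintenance():
    """Document query: filter trucks on a field inside the record."""
    return sorted(t for t, doc in trucks.items() if doc["status"] == "maintenance")

def who_stocks_parts_for(truck_id):
    """Graph query: truck -> NEEDS_PART -> part <- STOCKS <- depot."""
    needed = {dst for src, rel, dst in edges
              if src == truck_id and rel == "NEEDS_PART"}
    return sorted(src for src, rel, dst in edges
                  if rel == "STOCKS" and dst in needed)

down = down_for_maintenance()
suppliers = who_stocks_parts_for("T2")
```

The first two questions in the list are answered by a field filter and a two-step traversal respectively, without moving data between separate systems.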
ArangoDB is one example of a multi-model system. It is open source and is described as a “multi-model database” that combines the graph model with key-value and document data models, along with reasonably fast search. It uses AQL (a native query language similar to SQL), and comes with full-text search capabilities and a ranking engine.
Big Consultancy uses OrientDB graph databases to improve and personalize the customer experience, adapting advertising to each individual visitor by highlighting the individual’s interests and suppressing advertising that is irrelevant. OrientDB supports graph, key-value, document, and object models, as well as ACID transactions and SQL queries.