Graph Databases: An Overview

The concept of graph databases traces back to Leonhard Euler, an 18th-century Swiss mathematician who made important contributions to many areas of mathematics, including infinitesimal calculus. In solving the “Seven Bridges of Königsberg” problem in 1736, Euler laid the foundations for graph theory. (He also got a fun shout-out in Hidden Figures when Euler’s Method was discussed.)

When solving the “Königsberg” problem, Euler ignored the choice of route within each land mass, arguing that it was irrelevant. Instead, he focused on the sequence of bridges being crossed. With this shift in focus, he recast the problem in more abstract terms. He replaced the land masses with abstract vertices (called nodes in graph databases). Each bridge became an abstract connection, or “edge,” representing the relationship between the two vertices (or land masses) connected by that bridge. The resulting mathematical structure was called a graph.
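Euler's abstraction can be sketched in a few lines of Python. The labels A through D for the land masses and the function names are assumptions chosen for illustration; the bridge layout follows the historical problem, and the odd-degree rule used here applies to connected graphs.

```python
from collections import Counter

# The four land masses (labelled A-D here for illustration) and the seven
# bridges between them. Parallel bridges appear as repeated pairs.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def has_euler_walk(edges):
    """For a connected graph, a walk that uses every edge exactly once
    exists only if zero or two vertices have an odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

Every land mass touches an odd number of bridges, so the check fails. The test looks only at connections between land masses and never at the route taken within them.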

In the modern world, a graph database is essentially a collection of nodes and edges. Each node describes an entity or object, such as a person, business, or automobile, and each edge describes a relationship between two nodes.

For instance, an internet business must follow GDPR policies when dealing with its customer base. This can be represented using the internet business as one node, the customer as another node, and the GDPR policy as the edge connecting them, that is, their relationship. Relationships between nodes can be directional: “Internet business-GDPR policy-customer” can be expressed on a graph with an edge pointing from the internet business node to the customer node.
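The business, customer, and GDPR example can be sketched as a tiny in-memory structure. This is a minimal illustration and not how any particular graph database stores data; the class names, identifiers such as "biz-1", the relationship name GDPR_POLICY, and the sample property values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str    # node_id where the relationship starts
    target: str    # node_id where the relationship ends
    rel_type: str  # name of the relationship
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: one business, one customer, one directed edge.
nodes = {
    "biz-1": Node("biz-1", "InternetBusiness", {"name": "Example Shop"}),
    "cust-1": Node("cust-1", "Customer", {"name": "Alex"}),
}
edges = [Edge("biz-1", "cust-1", "GDPR_POLICY")]


def outgoing(node_id):
    """Return the edges that start at the given node."""
    return [e for e in edges if e.source == node_id]


for e in outgoing("biz-1"):
    print(nodes[e.source].label, f"-[{e.rel_type}]->", nodes[e.target].label)
```

Because the edge is directed, asking for the outgoing edges of the customer node returns nothing, while the business node returns the GDPR relationship.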

The Uses of Graph Databases

The flexibility of graph databases and their focus on relationships are the two key factors behind their recent surge in popularity. The need to generate insights from existing data favors a database technology that places a high priority on relationship information.

Oddly, traditional relational database management systems (RDBMS) handle highly connected data poorly, because relationships must be reconstructed at query time by joining tables. The rigid schemas they use also make it difficult to change connections or add new business requirements. Graph databases, on the other hand, store data relationships efficiently and are also flexible. They allow for the easy expansion of data models and adjust readily to changing business needs. As a consequence, graph databases are often used for:

Graph databases have become a popular tool for mining data from social media sources. They can also be useful when working with data that involves complex relationships, such as identifying and creating recommendations along the lines of “customers who bought this also looked at…”
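A recommendation of the “customers who bought this also looked at…” kind amounts to a two-step walk: from a product to the customers who bought it, then from those customers to the items they viewed. The sketch below uses plain Python dictionaries in place of a real graph database; the customer names, products, and the function name also_looked_at are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: customer -> products bought / products viewed.
BOUGHT = {
    "ann": {"tent"},
    "bob": {"tent", "stove"},
    "cy": {"tent"},
    "dee": {"kayak"},
}
VIEWED = {
    "ann": {"sleeping bag", "lantern"},
    "bob": {"sleeping bag"},
    "cy": {"lantern", "sleeping bag"},
    "dee": {"paddle"},
}


def also_looked_at(product, top_n=3):
    """Rank items viewed by customers who bought the given product."""
    counts = Counter()
    for customer, items in BOUGHT.items():
        if product in items:                        # hop 1: product -> buyer
            for viewed in VIEWED.get(customer, ()):  # hop 2: buyer -> viewed item
                if viewed != product:
                    counts[viewed] += 1
    return counts.most_common(top_n)


print(also_looked_at("tent"))
```

In a graph database, the same question would typically be expressed as a traversal over “bought” and “viewed” edges rather than as nested loops.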

When the goal is to gain insights about business relationships, graph databases are a good choice. They can provide insights into customer interests and help tailor messages to specific clients. These systems can also help businesses create accurate, well-rounded customer profiles. Graph databases are especially useful when an application’s data model needs to support:

When Not to Use a Graph Database

There are situations where graph databases are not a good fit. Transactional data, where relationships have no importance, is one example. Simple lists (data that is fixed and tabular), such as names and associated phone numbers, do not need to be stored in a graph database; a relational database would be better. Attribute-based requests, such as “List all clients with incomes over $200K between the ages of 20 and 40,” are also generally a poor fit, because such “multi-faceted” queries filter on properties across the whole data set rather than following relationships.
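As a rough illustration of why a tabular store suits this kind of request, the income-and-age query above is a single filter in SQL. The sketch uses Python's built-in sqlite3 module; the table name clients, its columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical clients table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (name TEXT, age INTEGER, income INTEGER)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?, ?)",
    [
        ("Ann", 34, 250_000),
        ("Bob", 45, 300_000),
        ("Cy", 28, 120_000),
        ("Dee", 22, 210_000),
    ],
)


def high_income_clients(conn, min_income=200_000, min_age=20, max_age=40):
    """List clients above an income threshold within an age range."""
    rows = conn.execute(
        "SELECT name FROM clients "
        "WHERE income > ? AND age BETWEEN ? AND ? "
        "ORDER BY name",
        (min_income, min_age, max_age),
    )
    return [name for (name,) in rows]


print(high_income_clients(conn))
```

No relationship is followed anywhere in this query; every row is judged on its own columns, which is the access pattern relational tables are built for.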

Graph databases are not designed for optimal performance when scanning bulk data or working from unknown starting points. If queries scan tables for a match or seek data fitting a general category, graph solutions are not best suited to the task. Graph databases are designed and built for finding relationships from a known starting data point. They are not designed for searching an entire graph without a specific starting point.
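The starting-point idea can be sketched as a breadth-first traversal with a hop limit. The adjacency dictionary, the node names, and the function name within_hops are hypothetical; the point is that only nodes near the start are ever visited, so the rest of the graph is left untouched.

```python
from collections import deque

# Hypothetical directed graph stored as an adjacency dictionary.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
    "zed": ["alice"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start
    in at most max_hops steps, using breadth-first search."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


print(within_hops(GRAPH, "alice", 2))
```

A question such as “which nodes have any outgoing edge?” has no starting point and would instead require looking at every node, which is the bulk-scan case described above.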

Binary large objects (BLOBs) and character large objects (CLOBs) do not work well with graph databases. While graph databases are good at maintaining relationships between different small data entities, they are not designed to store large amounts of data on a single node. While a query can move from entity to entity very quickly, it takes time to pull out the details from each entity.

Graph Storage

Graph data processing systems can use a variety of storage mechanisms. When storage is designed specifically for graph-like data, it is called a native graph database. A graph database using native graph storage is optimized for graphs during every step of the process. This ensures that connected nodes and their relationships are stored efficiently and close together.

Non-native graph databases store data using other systems. For example, some use a relational engine, storing the graph data within tables. Others use wide column storage, key-value storage, or document-oriented databases, all of which fall into the category of NoSQL systems. Because of their design, these databases may save information about relationships in a location far from the associated node. This non-native approach can lead to slower processing, because the storage layer isn’t designed for graph associations.
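The difference can be illustrated with a toy comparison. This is a simplified sketch and not a description of any real engine, which would add indexes, caching, and on-disk layouts; the edge data and function names are assumptions.

```python
# Non-native style: relationships live in one flat table of (source, target)
# rows, separate from the nodes they belong to.
EDGE_TABLE = [("a", "b"), ("c", "d"), ("a", "c"), ("b", "d")]


def neighbors_from_table(node):
    """Find neighbors by inspecting every row (no index in this toy version)."""
    return [dst for src, dst in EDGE_TABLE if src == node]


def build_adjacency(edge_table):
    """Native style: each node keeps direct references to its neighbors."""
    adjacency = {}
    for src, dst in edge_table:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])
    return adjacency


ADJACENCY = build_adjacency(EDGE_TABLE)


def neighbors_from_adjacency(node):
    """Find neighbors by reading the list stored with the node itself."""
    return ADJACENCY.get(node, [])


print(neighbors_from_table("a"))
print(neighbors_from_adjacency("a"))
```

Both functions return the same neighbors. The table version does work proportional to the total number of relationships on every lookup, while the adjacency version only touches the relationships of the node in question.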

Different Types of Database Graphs

Generally speaking, graph solutions offer an efficient way to analyze highly connected data when seeking connections, whether hidden or obvious. Graphs can offer a more natural perspective on some data. They are often used as an easy way to find trends, because the data is presented visually rather than numerically in a table. Graphs allow complicated data to be displayed and interpreted much more easily than a data table does.

Different kinds of graphs include:

Social graph: Focused on the connections between people. This type of graph is intuitive and widely used. The well-known concept of “six degrees of separation” can be mapped with a social graph. Twitter and Facebook use social graphs.
Intent graph: Expresses intent by capturing reasoning and motivation.
Consumption graph: This graph is used heavily in the retail industry and tracks the consumption of each individual customer. Also called the payment graph.
Interest graph: Maps an individual’s interests and is often used with a social graph.
Mobile graph: Uses mobile data to create various graphs and charts.
Property graph: A directed multigraph. Properties can be attached to each node and edge in a property graph. It can use multiple parallel edges that share the same source and destination node. The use of parallel edges can help in expressing multiple relationships.
Knowledge graph: Google is well-known for its Knowledge Graph and uses it to enhance its search engine’s results, using information taken from a variety of sources. The information is shown to users in an info box alongside the search results. Knowledge Graph information is often used when Google Assistant and Google Home answer spoken questions. TigerGraph and Neo4j also offer knowledge graphs.
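The property graph entry in the list above describes a concrete data structure, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the data model of any specific product; the class names, relationship names, and sample properties are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyEdge:
    source: str
    target: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """A directed multigraph whose nodes and edges carry properties."""

    def __init__(self):
        self.nodes = {}   # node_id -> dict of properties
        self.edges = []   # a list, so parallel edges are allowed

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, target, rel_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = PropertyEdge(source, target, rel_type, properties)
        self.edges.append(edge)
        return edge

    def edges_between(self, source, target):
        """Return every edge that runs from source to target."""
        return [e for e in self.edges
                if e.source == source and e.target == target]


g = PropertyGraph()
g.add_node("ann", kind="person")
g.add_node("acme", kind="company")
# Two parallel edges with the same source and destination node.
g.add_edge("ann", "acme", "WORKS_AT", since=2019)
g.add_edge("ann", "acme", "OWNS_SHARES_IN", shares=100)

for e in g.edges_between("ann", "acme"):
    print(e.rel_type, e.properties)
```

Storing edges in a list rather than keying them by node pair is what lets the same two nodes hold several distinct relationships at once.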

There is a wide variety of graphs available, with new ones being developed as needed. Graphs and charts have helped to identify unknown trends and make informed decisions. Modern technology has promoted an explosion of new ways to visualize and present patterns and trends. The variety of ways graphs can be used to express useful information is limited only by imagination. Gaurav Deshpande, from TigerGraph, said in a DATAVERSITY® interview:

“Whenever customers ask me about it, I keep it very simple. When you hear the word ‘graph,’ graph is equal to relationship. So any time you are trying to do analysis of relationships, that’s where you should use the graph database. And given that all of us are increasingly more connected to each other, both as people and as organizations, as entities. It just makes sense that graph databases would become more prominent and more important as time goes by.”

Graph Databases and AI

Because of their design, knowledge graphs capture and store information related to people, things, processes, applications, and data, and the relationships connecting them. They also provide evidence supporting the strengths of the relationships. These relationships provide context, which can be very helpful in training artificial intelligence.

Knowledge graphs differ from data warehouses and data lakes in terms of operational convenience. A data warehouse is useful for static business insight projects, but knowledge graphs can provide powerful insights in real time, such as real-time recommendations, knowledge sharing, and fraud detection. These characteristics make graph databases and knowledge graphs well-suited tools for deep learning techniques when training artificial intelligence. In his article, “Looking Forward to 2019 in Graph Technologies,” Dan McCreary writes:

“I try to tell everyone around me that there is no clear binary division between graph-based rules engines and inference rules generated by deep-learning algorithms. Deep learning rules are just larger and harder to explain. In order to have explainable AI we need to bring both graph-rules engines together with machine-learning systems. Vendors that do this well will have a distinct advantage.”
