Graph databases became recognized as a database design in 2006, when Tim Berners-Lee developed the concept of a huge database called “linked data.” This concept became the basis of graph storage, and could display how organizations, people, and items or entities are associated, or “interconnected,” with one another, along with the nature of those relationships. Graph databases store that data and its connections, making it easier to translate network data into actionable insights. Additionally, graph databases are typically NoSQL systems, and expand or scale quite easily. Because of their design, graph databases are well suited to analyzing interconnections, which explains their recent increase in use for mining data.
Relational (or SQL) databases, which display several rectangular grids of information, often look very much like standard spreadsheets. Each grid shows different numbers of rows and columns, holding different types of information. (Relational databases can use an arrow system, but this becomes overwhelming and confusing fairly quickly.) Non-relational graph databases, on the other hand, will typically display named bubbles (representing an organization, person, or object) with simple arrows symbolizing the connection (in many cases, there is a word above the arrow describing the relationship). Relational databases have maintained their popularity over the years because they are inexpensive, accurate, and consistent. However, the process of establishing relationships (or joins) in a relational database can be time-consuming and expensive.
Prior to 2014, graph databases were generally seen as slower, more difficult to work with, and much more limited than relational databases. Additionally, they were considered to be “academic” databases, designed to build logical analysis systems, and not necessarily useful for business purposes. Though graph databases could provide useful results, in general they were complicated, time-consuming, and not terribly user-friendly.
Around 2014, a number of technological innovations supported the evolution of graph databases. Neo4j (an early open-source graph database, and still quite popular) began gaining popularity for certain types of mathematical graph processing. At the same time, hardware (by way of cloud computing) had gained enough speed to resolve many of the early performance challenges. A year earlier, in 2013, the graph query language SPARQL had come out with a new edition (SPARQL 1.1) that resolved many of graph databases’ earlier problems. Furthermore, the development of JSON data stores (CouchDB and MongoDB) led to a significant improvement in the handling of joins (a core requirement for databases, but an especially important one for graph databases).
Also, around 2014, a number of businesses started experimenting with graph databases as a way to resolve issues that were becoming irritations at the corporate level (Metadata Management, Master Data Management, knowledge navigation, etc.). More recently, machine learning algorithms have been incorporated into building graph databases.
In spite of their strange appearance, graph databases are currently more flexible than traditional relational databases, because the relationships between items are shown with simple arrows, or “edges.” The arrows can show friendships, business relationships, and more. The bubbles can show who likes what, or the goals of the business.
Accomplishing the same thing with a relational model would require creating time-consuming and expensive joins. Additionally, the schema of a SQL database would have to be expanded to include the additional fields. Most scalable graph databases can absorb this kind of change easily, while SQL schemas are harder to extend and scale in the same way.
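To make the contrast concrete, here is a minimal sketch of the bubbles-and-arrows model as an in-memory structure. It is not how any particular graph database is implemented; the `PropertyGraph` class, the node names, and the relationship names (`FRIEND_OF`, `WORKS_FOR`, `LIKES`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory graph: nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(relationship, target id)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, relationship, target):
        self.out_edges[source].append((relationship, target))

    def neighbors(self, node_id, relationship):
        """Follow outgoing edges of one relationship type from a node."""
        return [t for rel, t in self.out_edges[node_id] if rel == relationship]


g = PropertyGraph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("acme", kind="organization")
g.add_edge("alice", "FRIEND_OF", "bob")
g.add_edge("bob", "WORKS_FOR", "acme")

# A new kind of relationship is just another edge; no schema change is needed.
g.add_edge("alice", "LIKES", "acme")

# "Where do Alice's friends work?" is two hops along edges, not a table join.
employers_of_friends = [
    employer
    for friend in g.neighbors("alice", "FRIEND_OF")
    for employer in g.neighbors(friend, "WORKS_FOR")
]
print(employers_of_friends)  # ['acme']
```

In a relational design, the `LIKES` relationship would typically mean a new join table or new columns; here it is one more entry in the edge list.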
Algorithms for Graph Databases
A graph database uses algorithms to make sorting through all of the data easier. An excellent example of this can be shown with the notorious “Panama Papers scandal,” with research and discovery that covered thousands of shell companies. These “shells” allowed movie stars, criminals, and even the former prime minister of Iceland, Sigmundur David Gunnlaugsson, to hide money in offshore bank accounts. The use of graph databases and their algorithms made research into these shell companies possible.
Two very popular traversal algorithms are the depth-first search (DFS) and the breadth-first search (BFS). The depth-first algorithm travels from a starting node along a single path until it reaches an end node, then backtracks to the most recent node that still has an unexplored branch and follows that path instead, repeating until the query is answered. Breadth-first search algorithms explore graphs one layer at a time. They first visit the nodes one level away from the start node, then move on to the nodes in the second layer, then to depth three, and so on, until the target is found or the whole graph has been examined. In an unweighted graph, BFS will find the shortest path, while DFS travels to the base of a subtree, and then backtracks.
As a general rule, depth-first searches are a good idea when seeking discrete pieces of information. In their basic form, both DFS and BFS are uninformed searches: they know nothing about where the target lies, so a depth-first search simply follows a path to its end, backtracks, and tries a different path. Informed searches, on the other hand, attempt to minimize the amount of searching by using a heuristic (an estimate of how close a node is to the goal) as a screening process for selecting the paths and nodes to explore. As a result, an informed search will usually finish more quickly than an uninformed search. (Graph traversals generally perform informed searches.)
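The two traversals can be sketched in a few lines. This is a minimal illustration on a small, made-up adjacency list (the node names `A` through `F` are hypothetical), not the traversal engine of any specific graph database.

```python
from collections import deque

# Hypothetical directed graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}


def bfs_shortest_path(graph, start, goal):
    """Explore layer by layer; the first path reaching goal has the fewest edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is unreachable from start


def dfs_order(graph, start):
    """Follow one path to its end, then backtrack to the last unexplored branch."""
    order, visited = [], set()

    def visit(node):
        visited.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visit(neighbor)

    visit(start)
    return order


print(bfs_shortest_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
print(dfs_order(graph, "A"))               # ['A', 'B', 'D', 'F', 'C', 'E']
```

Note how DFS reaches `F` by diving down the first branch before it ever looks at `C`, while BFS visits `B` and `C` before any node two steps away.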
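The difference between informed and uninformed search can be illustrated with a small experiment. The article does not name a specific informed algorithm; the sketch below assumes A* with a Manhattan-distance heuristic as one well-known example, and compares it with the same search run with no heuristic (which, with unit step costs, behaves like breadth-first search). The 5x5 grid, the start and goal cells, and all function names are hypothetical.

```python
import heapq


def grid_neighbors(cell, size):
    """Yield the up/down/left/right neighbors that lie inside a size x size grid."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)


def best_first_search(start, goal, size, heuristic):
    """Return (path_cost, expanded_count); every step costs 1.

    With a zero heuristic this is uniform-cost search (uninformed).
    With an admissible heuristic such as Manhattan distance it is A*.
    """
    frontier = [(heuristic(start, goal), 0, start)]  # (priority, cost so far, cell)
    best_cost = {start: 0}
    expanded = 0
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was already found
        if cell == goal:
            return cost, expanded
        expanded += 1
        for nxt in grid_neighbors(cell, size):
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt, goal)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    return None, expanded


def no_heuristic(cell, goal):
    return 0


def manhattan(cell, goal):
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])


uninformed = best_first_search((0, 0), (4, 0), 5, no_heuristic)
informed = best_first_search((0, 0), (4, 0), 5, manhattan)
print("uninformed (cost, expanded):", uninformed)
print("informed   (cost, expanded):", informed)
```

Both runs find a path of the same cost, but the informed run expands fewer cells because the heuristic steers it straight toward the goal instead of fanning out in every direction.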
Artificial Intelligence, Machine Learning, and Graph Databases
In terms of training, graphs can provide context for machine learning (ML) and artificial intelligence (AI). Graph technology can connect data and define relationships. The process of enhancing AI by using graph technology provides an effective method of training sophisticated AI and ML applications.
Additionally, graphs support greater transparency in the way AI makes decisions, a property known as AI explainability. These advantages have led to a growing preference for using graph databases when training AI and ML applications.
Jim Webber, the chief scientist for Neo4j, had this to say on the topic: “Machine learning algorithms help data scientists discover meaning in data sets, and these insights can be expressed as relationships between nodes in a graph. Graph databases enable efficient storage and traversal of information about relationships. Therefore, graph data can either be the input or the output of machine learning processing.”
Popular Use Cases
To remain competitive, businesses must stop the practice of merely collecting data points. They must start “connecting” the relationships that exist between the data points. Unfortunately, the popular relational database management systems (RDBMS) handle relationships between data points poorly. The tabular SQL data models use rigid schemas, making it problematic to add new connections. To leverage those data relationships, a graph database is needed.
Graph databases will effectively store the relationships that exist between data points, and they are flexible enough to add new kinds of relationships and adapt a data model to accommodate new business requirements. Many converts see graph databases as the future of online businesses. These are the primary use cases for modern graph database technologies:
- Customer 360: This term is generally associated with Salesforce, but the same idea can be implemented with a graph database for other purposes as well. It provides a broad view of customers, based on the people (nodes) and their relationships (edges).
- Asset Management: This type of system is very useful because graph databases scale so easily. Unlike a relational database management system, it does not come with a rigid structure, allowing nodes, edges, and other properties to be added at will.
- Real-Time Recommendation Engines: When a person interacts with Amazon or Netflix, that person’s viewing and purchasing history is searched, and people designated “similar to you” are scanned. Based on their purchases and interests, movies and other items are recommended to view or purchase next. For example, if an iPhone were purchased, an iPhone cover might be suggested.
- Master Data Management and Identity Management: This is often described as a way of managing “data silos.” It keeps track of data, communicating who has it, and which database has stored it, using metadata. In this situation, the actual data is stored in an SQL database or an LDAP database – not in the graph database.
- Fraud Detection: Graph-based fraud detection, generally speaking, works fairly well. A notice of suspicious activity is sent about an account, and the account’s owner (for example, the holder of a credit card) can decline the charge or activity. This is accomplished by using different algorithms that are designed to crawl along a graph model’s nodes and edges seeking suspicious activity.
Fraud analysis can also be applied to cybersecurity intrusions. This has the added benefit of only alerting analysts to events that are truly a significant concern, while not wasting time and resources on statistical flukes. (It uses intelligence, rather than hard-coded thresholds.)
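The recommendation-engine use case in the list above can be sketched as a two-hop walk: from a customer to the items they bought, to other customers who bought the same items, and then to whatever else those customers bought. This is a simplified illustration, not how Amazon or Netflix actually compute recommendations; the customer names, the purchase data, and the `recommend` function are all hypothetical.

```python
from collections import Counter

# Hypothetical purchase edges: customer -> set of items bought.
purchases = {
    "you": {"iphone"},
    "ana": {"iphone", "iphone cover", "charger"},
    "ben": {"iphone", "iphone cover"},
    "chris": {"laptop", "mouse"},
}


def recommend(user, purchases, top_n=3):
    """Score items bought by 'similar' customers (those sharing a purchase)."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # shared purchases measure similarity
        if overlap == 0:
            continue                  # not "similar to you", so skip
        for item in items - owned:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("you", purchases))  # ['iphone cover', 'charger']
```

The iPhone cover ranks first because two similar customers bought it, which matches the example in the list: buy an iPhone, get a cover suggested.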
The Future of Graph Databases
Graph databases are remarkably efficient at communicating relationships between people or objects. Now that they have evolved to the point of being user-friendly, the full strength of their abilities to predict patterns is being explored. Until this technology is replaced by something better, the strengths of graph databases will continue to be explored and advanced.