The NoSQL Movement – Graph Databases

by Paul Williams

This installment in DATAVERSITY’s NoSQL series covers Graph databases. Graph data stores are the NoSQL type best suited to finding relationships within massive amounts of data quickly. They see wide use in social networking applications as well as in high-end analytics.

In Graph database architecture, the objects known as Nodes and Edges serve the roles of Entities and Relationships from standard SQL architecture. Nodes also contain properties which describe the actual data contained within each object. A diagram of a Graph database looks similar to the object diagrams used in object-oriented programming.
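To make the node, edge, and property vocabulary concrete, here is a minimal in-memory sketch in Python. It is not modeled on any product covered in this article; the class names (Node, Edge, PropertyGraph) and the sample data are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity plus the properties that describe it."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labeled relationship between two nodes."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> Node
        self.edges = []   # list of Edge, in insertion order

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, label, **properties):
        # This sketch requires both endpoints to exist before linking them.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(Edge(source, target, label, properties))

    def neighbors(self, node_id, label=None):
        """Return ids of nodes reachable over one outgoing edge."""
        return [e.target for e in self.edges
                if e.source == node_id and (label is None or e.label == label)]


graph = PropertyGraph()
graph.add_node("alice", name="Alice", city="Oakland")
graph.add_node("bob", name="Bob")
graph.add_node("acme", name="Acme Corp")
graph.add_edge("alice", "bob", "KNOWS", since=2010)
graph.add_edge("alice", "acme", "WORKS_AT")

print(graph.neighbors("alice"))            # ['bob', 'acme']
print(graph.neighbors("alice", "KNOWS"))   # ['bob']
```

The structure maps almost one-to-one onto application objects, which is the point the object-diagram comparison is making.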

Due to the similarity between graph and object diagrams, Graph databases map cleanly onto an object structure within an application, simplifying or even eliminating the need for an ORM framework.

The biggest advantage of Graph databases is their speed for certain types of queries, in particular those involving relationships, since related records are reached by following edges rather than by running processing-intensive joins. Graph databases also depend less on complex schemas than their relational counterparts do, which makes modifications and migrations to in-production systems easier.
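A rough sketch of why relationship queries avoid joins: when each node keeps direct references to its neighbors, a "friends of friends" question becomes a short breadth-first walk. The adjacency data and the function name within_hops below are hypothetical, and a real graph database stores and traverses data far more efficiently than this.

```python
from collections import deque

# Hypothetical "follows" relationships, stored as adjacency lists.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}


def within_hops(adjacency, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(current, []):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    del distance[start]  # the start node is not its own result
    return distance


result = within_hops(follows, "alice", 2)
print(result)  # {'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

The cost of the walk depends on how many edges are actually followed, not on the total size of the dataset, which is where the speed advantage for relationship-heavy queries comes from.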

There are many popular commercial and open source Graph data stores to suit the needs of customers looking to explore this part of the NoSQL world.

InfiniteGraph is a Top Commercial Graph Database

InfiniteGraph is a distributed Graph database system developed by the California shop Objectivity. Version 2.1 of the software saw coverage in DATAVERSITY earlier this year. Like most Graph databases, InfiniteGraph is built for finding relationships within massive datasets, making it suitable for Big Data analytics applications. The product is also designed to scale across distributed deployments.

Written in Java and C++, InfiniteGraph requires a Java compiler as well as some programming chops to get started. Since it leverages a “pay as you go” license model, the initial download and subsequent development and deployment work are all free.

The tool supports Blueprints, the property graph interface used alongside the Gremlin traversal language and popular in the Graph database community. This lets InfiniteGraph users share testing, analysis, and connectivity logic with other database developers. An included plug-in framework also supports modularity and code reuse.

InfiniteGraph also features a visualizer for viewing database models and browsing actual data and metadata. The visualizer supports InfiniteGraph’s plug-in framework and can export models in GraphML and JSON formats.

In addition to InfiniteGraph’s flexible pricing model, Objectivity also offers a host of professional services around the design, development, and deployment of the Graph database system. These include database design reviews, performance tuning, pilot projects, system testing, and others. The company also recently started a professional certification program for InfiniteGraph users.

Neo4j is a Prominent Graph Database

Written in Java, Neo4j is an open source graph database with a variety of available license options suitable for prototyping as well as commercial deployment. Neo Technology, developers of Neo4j, provides support for the paid Neo4j licenses.

Neo4j sports massive scalability. The database handles billions of nodes on only one server, and can easily scale across multiple machines. It also features a disk-based persistence model written in native code.

The database allows for deployment flexibility: either as a full server, or as a slim version contained within a 750K jar file. Neo4j also fully supports ACID transactions, with the added nicety of XA-compliant distributed two-phase commits.
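For readers unfamiliar with two-phase commit, the idea can be sketched in a few lines of Python. This is a toy illustration of the general protocol, not Neo4j's implementation and not the XA interface; the Participant class and its methods are assumptions made up for this example.

```python
class Participant:
    """A hypothetical resource that can vote on and then apply a change."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect votes
    if all(votes):
        for p in participants:                    # phase 2: commit everywhere
            p.commit()
        return True
    for p in participants:                        # phase 2: roll back everywhere
        p.rollback()
    return False


healthy = [Participant("graph store"), Participant("message queue")]
print(two_phase_commit(healthy))                  # True

mixed = [Participant("graph store"), Participant("message queue", can_commit=False)]
print(two_phase_commit(mixed))                    # False
print([p.state for p in mixed])                   # ['aborted', 'aborted']
```

The sketch leaves out what makes the real protocol hard, such as durable logging and recovery when the coordinator or a participant fails between the two phases.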

While Java works best as a client language for Neo4j, the database’s API features support for Ruby, Python, Groovy, Gremlin, and others. Neo4j also plays nicely with the Spring framework used in enterprise Java installations, including the ability to work with POJOs.

AllegroGraph Combines Memory Cache and Disk-based Storage

AllegroGraph is a commercial Graph database developed and sold by Franz, Inc. The product combines memory caching with persistent disk storage to improve overall performance. AllegroGraph persists its data as triples in the Resource Description Framework (RDF) format.
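An RDF triple is simply a (subject, predicate, object) statement, and a graph is a set of such statements. The Python sketch below shows the idea with a handful of made-up triples and a wildcard pattern matcher; the data, the prefixes, and the match function are illustrative assumptions and have nothing to do with AllegroGraph's actual storage or query engine.

```python
# Each triple is (subject, predicate, object); the values are invented examples.
triples = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", '"Bob"'),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o) for (s, p, o) in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


# Who knows whom?
for triple in match(triples, predicate="foaf:knows"):
    print(triple)

# Everything stated about ex:bob as a subject.
print(match(triples, subject="ex:bob"))
```

Query languages such as SPARQL build on the same pattern-matching idea, combining several triple patterns that share variables.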

AllegroGraph fully supports ACID transactions, including commit, rollback, and “checkpointing”. Dynamic auto-indexing is also included, with each committed triple stored in seven different indexes. AllegroGraph achieves 100 percent read concurrency and close to 100 percent for writes.
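Storing every triple in several indexes lets a lookup start from whichever terms the query already knows. The sketch below assumes just three orderings (subject-first, predicate-first, object-first) rather than the seven mentioned above, and the class name IndexedTripleStore and its methods are hypothetical; it shows the general technique, not AllegroGraph's index layout.

```python
from collections import defaultdict


class IndexedTripleStore:
    """Keeps each triple in three orderings so lookups can start from any term."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # subject -> predicate -> objects
        self.pos = defaultdict(lambda: defaultdict(set))  # predicate -> object -> subjects
        self.osp = defaultdict(lambda: defaultdict(set))  # object -> subject -> predicates

    def add(self, s, p, o):
        # Every insert updates all indexes: more work on write, less on read.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        return set(self.spo.get(s, {}).get(p, set()))

    def subjects(self, p, o):
        return set(self.pos.get(p, {}).get(o, set()))

    def predicates(self, s, o):
        return set(self.osp.get(o, {}).get(s, set()))


store = IndexedTripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:carol", "foaf:knows", "ex:bob")
store.add("ex:alice", "ex:worksWith", "ex:bob")

print(store.subjects("foaf:knows", "ex:bob") == {"ex:alice", "ex:carol"})    # True
print(store.predicates("ex:alice", "ex:bob") == {"foaf:knows", "ex:worksWith"})  # True
```

Each lookup is a couple of dictionary accesses instead of a scan over all triples, at the price of writing every triple several times.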

A REST API exists for many popular client languages, including Java, C#, Python, Clojure, and Ruby. AllegroGraph’s server also uses JavaScript for control language purposes. A SaaS version of the database is hosted on Amazon’s EC2 service.

Franz provides a collection of AllegroGraph-related tools and utilities, most notably their Gruff product, which allows the browsing and managing of RDF graphs. Gruff also supports the creation and editing of RDF queries written in the SPARQL or Prolog languages. It also sports a visual graph view of the nodes and links within the database.

OpenLink Virtuoso

Users looking for an open source RDF Graph database should check out OpenLink Virtuoso. A commercial version of the database exists under the name Virtuoso Universal Server. Both products were developed by OpenLink Software.

Virtuoso seems like the Swiss Army Knife of database products. It handles a whole host of database formats, including the previously mentioned RDF, as well as relational and XML. It also functions as a document server and provides object to relational mapping functionality.

The only major functional differences between the open source and commercial versions of Virtuoso are the former’s lack of support for the Virtual Database Engine and Data Replication Functionality. Virtuoso’s SQL functionality is comparable to that of SQL Server or Oracle DB.

Virtuoso also serves as a web application server, providing a REST programming interface for the standard array of languages, including Java as well as those supporting the .NET Framework. In fact, it works with any language that can be hosted within a C/C++ runtime.

Virtuoso’s scope of functionality almost transcends an article on Graph databases. This is a system with the ability to provide value in a large number of areas of database management.

HyperGraphDB for Knowledge Management and Embedded Systems

HyperGraphDB is a versatile open source database system suitable for a wide range of applications. It uses a persistent memory model appropriate for web-based knowledge management, semantics, or analytics applications. The database also serves as a Graph database and can function as an object-oriented database downsized for Java-based embedded systems.

Given its Graph storage architecture and overall flexibility, HyperGraphDB can perform both edge traversal queries and queries that follow a relational style. It also includes a P2P framework for distributed data flow.

HyperGraphDB’s ability to function as a JSON-persisted Graph database in a Java environment led to its use as the backend layer for the eVallaha project, essentially an anonymous blogging site covering failed technology projects at government agencies. The author of the site feels HGDB is a great option for web apps leveraging a REST API for the data layer.

Looking at Distributed Graph Processing

Many newer Graph databases are architected specifically for distributed graph processing, while some existing products already support it as a feature, including the previously mentioned InfiniteGraph and Virtuoso Universal Server.

Apache Hama features a Bulk Synchronous Parallel processing framework built on top of the Hadoop Distributed File System (HDFS), creating an environment well-suited for massive graph processing. The open source Hama uses a message passing paradigm, similar to what is found in object-oriented programming, that provides advantages over MapReduce.
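The Bulk Synchronous Parallel model runs in rounds called supersteps: each vertex processes the messages it received, sends new messages, and then everyone waits at a barrier before the next round begins. The single-process Python sketch below mimics that rhythm to compute hop counts from a source vertex. It does not use Hama's API (Hama is a Java framework), and the function name, graph, and message format are assumptions made for illustration.

```python
def bsp_shortest_hops(adjacency, source):
    """Run supersteps until no vertex has incoming messages."""
    distance = {vertex: None for vertex in adjacency}
    inbox = {source: [0]}      # messages to be processed in the next superstep
    supersteps = 0
    while inbox:
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)
            # A vertex does work only when a message improves its value.
            if distance[vertex] is None or best < distance[vertex]:
                distance[vertex] = best
                for neighbor in adjacency[vertex]:
                    outbox.setdefault(neighbor, []).append(best + 1)
        # Barrier: messages sent in this superstep are delivered together.
        inbox = outbox
        supersteps += 1
    return distance, supersteps


# Hypothetical graph; every neighbor also appears as a key.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
    "e": [],       # unreachable from "a"
}

hops, rounds = bsp_shortest_hops(graph, "a")
print(hops)    # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': None}
print(rounds)  # 3
```

In a real deployment the vertices are partitioned across machines and the inner loop runs in parallel; the barrier between supersteps is what keeps the computation deterministic.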

Another open source option for Graph database tinkerers is FlockDB, created by a technical team at Twitter. That team wanted a distributed, fault-tolerant database that was simple to use. The FlockDB source code is available for download on GitHub.

Google developed its own internal Graph processing system, called Pregel, which is designed for high scalability and very large datasets. Further details on Pregel are available in a white paper Google published for the ACM.

Big Blue Adds Graph Support to DB2

IBM’s DB2 holds a venerable spot in the history of databases. In the nearly 30 years since its original release, Big Blue has continued to make enhancements to the product that reflect the latest advances in database technology.

With the recent version 10 of DB2, IBM added support for Graph databases persisted in the RDF format. With its full range of enterprise features, including high scalability and an API that supports practically every programming language known to humanity, DB2 remains a viable option for any organization’s database needs.

The final article in the NoSQL series moves from Big Blue to Big Table as DATAVERSITY trains its eye on the world of tabular databases, including Google’s BigTable.

