The recent uptick in the popularity of graph databases within the NoSQL movement reflects the Big Data trend, which is driven in part by social data. Enterprises are turning to graph databases to uncover customer purchasing patterns and voting histories, as well as that proverbial “needle in the haystack” bit of valuable information. The latter use case is especially important to those working on law enforcement or homeland security applications.
One prominent example of a graph database is Neo4j. Developed by Neo Technology, a company based in both San Francisco and Sweden, the open source Neo4j is available under a variety of licenses. The free Community edition is easy to download and install, which makes it straightforward to explore the database and its web-based admin interface. Additional modules suited to more specialized or production-level functionality are part of the Advanced edition, and an Enterprise edition is available for larger corporations.
A Robust Graph Database
At its core, Neo4j features a fast and agile graph database where the data gets stored in nodes, with each containing a number of properties. The relationships between nodes are what matters in a graph database – a model which ties in nicely with the applications found in the world of social networking. Both the nodes and the relationships can be indexed, which enhances the database’s overall performance.
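To make the property graph model concrete, here is a minimal sketch in Python. It is a toy in-memory structure, not Neo4j's API: the `PropertyGraph` class, its method names, and the sample people are all hypothetical and exist only to show nodes with properties, typed relationships, and a simple index.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}                     # node id -> dict of properties
        self.relationships = []             # (start id, type, end id, properties)
        self.name_index = defaultdict(set)  # "name" value -> set of node ids

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        # Maintain an index on "name" so lookups avoid scanning every node.
        if "name" in properties:
            self.name_index[properties["name"]].add(node_id)
        return node_id

    def relate(self, start, rel_type, end, **properties):
        # Relationships are directed, typed, and carry their own properties.
        self.relationships.append((start, rel_type, end, properties))

    def find_by_name(self, name):
        return sorted(self.name_index.get(name, set()))

    def neighbours(self, start, rel_type):
        """Follow outgoing relationships of one type from a node."""
        return [end for (s, t, end, _) in self.relationships
                if s == start and t == rel_type]


graph = PropertyGraph()
graph.add_node(1, name="Paul", age=34)
graph.add_node(2, name="Anna", age=29)
graph.add_node(3, name="Tom", age=17)
graph.relate(1, "FRIEND", 2, since=2010)
graph.relate(1, "FRIEND", 3, since=2012)

paul = graph.find_by_name("Paul")[0]
friend_names = [graph.nodes[n]["name"] for n in graph.neighbours(paul, "FRIEND")]
print(friend_names)
```

The index lookup finds the starting node directly, and the traversal then only touches relationships attached to it, which is the access pattern a graph database is built around.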
For all its power, Neo4j remains flexible; the database is suitable for distributed environments, and its small footprint means it can be embedded inside another application written in any language that runs on the JVM, including Scala, Clojure, and of course, Java. The database supports a REST interface for client applications, in addition to providing a Java API for more robust client development. Neo4j itself was primarily developed in Java.
Neo4j is cross-platform, with support for Windows, Mac OS, and Linux. As is typical of open source software, a variety of current, legacy, and still-in-development versions of Neo4j are usually available for download from the company’s website. Because so many older editions of the software remain in use, the company provides documentation to assist with the upgrade and migration process.
Neo4j Is an Easy Installation
Installing the Community edition of Neo4j was a breeze; the company’s website also offers fairly detailed instructions and a video to follow. On Windows, the ZIP package simply unpacks all the necessary files into a directory structure. The user then navigates to the bin directory and runs a batch file to start the database server.
Once the Neo4j server is running, the web administration screen is accessible by entering the local server address (http://localhost:7474 by default) in a web browser. It took only a few minutes to get up and running. The administration interface provides easy navigation to a data browser, a query language console, a screen for index maintenance, and more. Links to documentation and the Neo4j community site are useful as well.
The community site is a great place to find sample databases and other information that helps with the Neo4j learning process. The database has engendered a vibrant user family providing help and fleshing out different aspects of the software. The web page about using Neo4j with Ruby on Rails is a fine example of this kind of extra content generated by the Neo4j user community.
Interacting With a Neo4j Graph Database Using Java or Cypher
As mentioned earlier, Neo4j provides full support for interacting with a graph database using any language compatible with the Java Virtual Machine (JVM). There is also the native Cypher query language, which provides a framework-independent way to access the database.
Cypher is a powerful and descriptive query language that should be easy for anyone familiar with SQL to learn. There is even a live web-based console where users can try out different Cypher queries. The console features a visual graph that reacts to the queries, helping to cement the concepts of graph databases for the novice user.
Cypher uses the START, RETURN, and MATCH clauses to serve more or less as the graph database equivalents of SQL’s FROM, SELECT, and WHERE. START stipulates the node in the graph at which the query begins, MATCH describes the pattern of relationships to follow from there, and RETURN lists the properties, or fields, returned by the query. Cypher also has its own WHERE clause for filtering the matched results. The following finds Paul’s friends who are older than 18, starting from Paul’s node in the people index:
START me=node:people(name="Paul")
MATCH me-[:FRIEND]->friend
WHERE friend.age > 18
RETURN me, friend.name
ORDER BY friend.age asc
SKIP 5 LIMIT 10
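For readers coming from general-purpose languages, the clauses in the query above can be read as a pipeline of ordinary steps: follow relationships, filter, sort, page, and project. The sketch below mirrors that pipeline in plain Python over a hypothetical dictionary of people; the data, the `adult_friends` function, and its parameters are illustrative assumptions and say nothing about how Neo4j actually executes a query.

```python
# Hypothetical sample data: each person has an age and a list of friends.
people = {
    "Paul": {"age": 34, "friends": ["Anna", "Tom", "Lena", "Omar"]},
    "Anna": {"age": 29, "friends": []},
    "Tom": {"age": 17, "friends": []},
    "Lena": {"age": 41, "friends": []},
    "Omar": {"age": 18, "friends": []},
}


def adult_friends(people, name, min_age=18, skip=0, limit=10):
    """Mirror the Cypher clauses step by step for one starting person."""
    me = people[name]                                     # START: pick the start node
    friends = [(f, people[f]["age"]) for f in me["friends"]]  # MATCH: follow FRIEND
    adults = [(f, age) for f, age in friends if age > min_age]  # WHERE friend.age > 18
    adults.sort(key=lambda pair: pair[1])                 # ORDER BY friend.age asc
    page = adults[skip:skip + limit]                      # SKIP ... LIMIT ...
    return [(name, f) for f, _ in page]                   # RETURN me, friend.name


print(adult_friends(people, "Paul"))
print(adult_friends(people, "Paul", skip=1, limit=10))
```

Note that the comparison is strict, so a friend who is exactly 18 is excluded, just as `friend.age > 18` would exclude that friend in the query.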
The ORDER BY, SKIP, and LIMIT clauses should be self-explanatory to anyone familiar with SQL. Cypher uses the CREATE and DELETE clauses to add and remove nodes and relationships. The language also provides a host of aggregate and other types of functions. A measure of transaction support is also available at the console, using the BEGIN, COMMIT, and ROLLBACK commands.
While Cypher provides a robust collection of console-level functionality, Neo4j also shines when controlled using Java or any other language that supports the JVM. In fact, the “4j” in the name more or less means, “for Java.” Remember, the Neo4j server also supports the REST interface, which provides an easy way to interact with the database; additionally, server plug-ins written in Java can be used to extend the basic REST functionality.
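As a rough illustration of talking to the server over REST, the sketch below builds (but does not send) an HTTP request that carries a Cypher query as JSON. Several details are assumptions rather than facts from this article: the server address `http://localhost:7474`, the legacy `/db/data/cypher` endpoint path, the `query`/`params` payload shape, and the helper name `build_cypher_request`. Endpoint paths differ between Neo4j versions, so check the documentation for the installed release.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7474"  # assumption: default local server address
CYPHER_PATH = "/db/data/cypher"     # assumption: legacy Cypher REST endpoint


def build_cypher_request(query, params=None, base_url=BASE_URL):
    """Build a POST request carrying a Cypher query and its parameters as JSON."""
    body = json.dumps({"query": query, "params": params or {}}).encode("utf-8")
    return urllib.request.Request(
        base_url + CYPHER_PATH,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# The query text is treated as an opaque string here; {id} is a query parameter.
request = build_cypher_request(
    "START me=node({id}) MATCH me-[:FRIEND]->friend RETURN friend.name",
    {"id": 1},
)
print(request.get_method(), request.full_url)

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing values as parameters rather than splicing them into the query string keeps the query text constant and avoids quoting mistakes.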
The native API offers another path to client development integrating graph database functionality. Neo4j also works when embedded into a JVM process; its small footprint makes embedding possible. All the nodes, relationships, and paths in a graph database are accessible as programming objects, in addition to the ability to run Cypher queries in code, allowing the development of a wide variety of applications.
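To give a feel for what working with paths as objects involves, here is a small sketch of a breadth-first search that returns the shortest chain of friends between two people. It runs over a plain Python dictionary rather than a Neo4j database; the `shortest_path` function and the sample data are hypothetical and are not part of any Neo4j API.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of nodes from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, []):
            if neighbour in previous:
                continue  # already reached by a path at least as short
            previous[neighbour] = current
            if neighbour == goal:
                # Walk back through the recorded predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical directed FRIEND relationships.
friends = {
    "Paul": ["Anna", "Tom"],
    "Anna": ["Lena"],
    "Tom": ["Lena"],
    "Lena": ["Omar"],
}

print(shortest_path(friends, "Paul", "Omar"))
```

Because the relationships here are directed, searching from Omar back to Paul finds no path and returns `None`.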
There are many examples of projects using Neo4j technology in interesting new ways, including a Neo4j JDBC driver and the Spring Data Neo4j project, which leverages the Spring Framework to offer object-graph mapping in concert with Neo4j. Once again, this speaks to the power of an engaged developer community.
Options for Neo4j Licensing and Support
Neo4j offers a wide array of licensing options depending on the installed edition. The Community edition is all that is needed to learn about graph databases and to experiment with client development using either the REST interface or the Java API. There is enough documentation and help available to get anyone up and running.
The other editions of Neo4j add the functionality that makes the product suitable for enterprise production instances. The Advanced edition enhances the monitoring provided by the database. The Enterprise edition adds online backups and clustering support to the Advanced edition features. Neo4j is available under a dual license that combines the AGPL with a commercial license from Neo Technology; the commercial license allows enterprises to include Neo4j in a closed source system.
Neo Technology also provides support for Neo4j to owners of the Advanced and Enterprise editions. The Enterprise edition offers 24/7 phone support, as opposed to the email-only support provided with the Advanced edition. The list of organizations currently using Neo4j is impressive, including Cisco, Adobe, Accenture, and Lufthansa, among many other startups and enterprises.
Graph databases are growing in importance and popularity, driven by the exponential expansion of social data. Neo4j is helping to lead this revolution, all the while staying close to the innovative nature of its open source roots. Anyone interested in learning more about graph databases would do well to download the Community edition and explore Neo4j.