Scalable Graph Database Technology: Combining Big Data and Real-Time Analytics

In February of 2018, TigerGraph released the 2.0 version of its Graph Database platform, described as “the next evolutionary step for Graph Databases.” TigerGraph was founded by its CEO, Dr. Yu Xu, in 2012. He is an expert in parallel database systems and Big Data. In 2011, while working at Twitter, he said he discovered that existing Graph Databases were seriously inadequate. As a result, he brought together 30 engineers and developed TigerGraph, a native parallel Graph Database system.

TigerGraph is a scalable and fast Graph Database platform. It uses Native Parallel Graph Technology to create a distributed, parallel Graph Computing platform that supports real-time, web-scale data analytics. By combining key technologies (Massively Parallel Processing, fast data compression/decompression, and MapReduce-style distributed computing), TigerGraph offers speed, scalability, and deep querying capabilities.

The company claims graph traversal and query response times are 4 to 100 times faster than previous Graph Database technologies. In a recent interview with DATAVERSITY®, Yu Xu and Todd Blaschka (TigerGraph’s Chief Operating Officer) discussed how the technology has changed. When asked, Yu responded:

“TigerGraph 2.0 delivers a new level of deployment ease, allowing our customers to derive even deeper meanings from their connected data. TigerGraph’s MultiGraph service represents an incredible breakthrough, supporting real-time graph collaboration for the first time ever in a federated environment.”

According to Blaschka, enterprises can now improve the transparency and the availability of their data among teams, which promotes productivity and decision-making for everyone in the organization.

The Driving Forces of TigerGraph

Todd Blaschka, responsible for helping to grow their customer base, described why TigerGraph is such a significant advance:

“There are three key drivers that we address. One is real-time, meaning it’s not just real-time on a report, or a query, but that this data is consistently flowing through a graph engine, making it possible to get real-time updates.”

Such updates are “sub-second query responses” based on the new data coming in. According to Blaschka, business users no longer want to wait three days for results, or even one hour; customers rely on immediate responses. “It’s that spontaneous gratification, they want immediate results to capitalize on a business moment.”

“The second thing is complex analysis,” said Blaschka. “Which is the ability to ask complex questions. Questions that would sound very common to you and I.” He gave the example of a San Francisco Giants fan who wants to buy a new baseball glove. Every time that consumer does a search for such a product they are providing more data to the company that sells the gloves.

“The question becomes, given that customer’s interest in San Francisco Giants and the baseball glove, what else is she or he likely to be interested in at the moment of browsing or purchase. This requires real-time search of other customers who are also fans of San Francisco Giants and have also searched or bought the baseball glove or related products. We are essentially traversing the social network looking for real-time product recommendations.”
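The traversal Blaschka describes can be sketched in a few lines of Python. This is only an illustration of the idea, not TigerGraph's engine or its GSQL language: the toy data, the `interests` mapping, and the `recommend` function are all hypothetical names introduced here. Starting from a shopper's interests, it hops to other customers who share all of them, then hops again to what those customers also searched for or bought.

```python
from collections import Counter

# Hypothetical toy data: customer -> things they follow, searched for, or bought.
interests = {
    "alice": {"SF Giants", "baseball glove", "batting gloves"},
    "bob": {"SF Giants", "baseball glove", "baseball cap", "batting gloves"},
    "carol": {"SF Giants", "baseball cap"},
    "dave": {"baseball glove", "tennis racket"},
}


def recommend(shopper_items, interests, top_n=3):
    """Two-hop traversal: items -> customers sharing all of them -> their other items."""
    seed = set(shopper_items)
    counts = Counter()
    for items in interests.values():
        # Hop 1: keep only customers who share every one of the shopper's interests.
        if seed <= items:
            # Hop 2: count the other items those customers are connected to.
            counts.update(items - seed)
    # Rank candidate items by how many similar customers are linked to them.
    return [item for item, _ in counts.most_common(top_n)]


print(recommend({"SF Giants", "baseball glove"}, interests))
```

In this toy graph, only "alice" and "bob" match both interests, so their remaining items become the candidate recommendations. A production system would run the same kind of traversal in parallel over a far larger graph.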

The third element of the TigerGraph platform deals with scaling. “There’s more and more data, internal data, external data, and bringing those things together requires the ability to scale,” said Blaschka. TigerGraph has addressed the scalability issue by providing both vertical and horizontal scaling. Horizontal scaling allows an organization to add more machines, providing a bigger Graph Database to support the data an organization is bringing in.

“With TigerGraph, we address some of the technical limitations that have challenged previous graph platforms. And that’s part of the technology. This was architected from the ground up, using what we call the Massively Parallel Graphs. It allows companies to scale very, very fast, and handle the performance requirement.”

TigerGraph also supports different graph partitioning algorithms, enabling it to split very large graphs over a distributed architecture. This can be done either automatically, or as specified by people using application-specific partitioning strategies.
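As a rough illustration of the two options, the Python sketch below splits a graph's vertices across partitions either with a default hash rule or with a caller-supplied, application-specific rule. It is a minimal sketch under stated assumptions: the article does not describe TigerGraph's actual partitioning algorithms, and `partition_of`, `partition_graph`, `by_region`, and the region-prefixed vertex IDs are hypothetical names invented for this example.

```python
import zlib
from collections import defaultdict


def partition_of(vertex_id, num_partitions):
    """Default ("automatic") rule: a stable hash of the vertex ID picks the partition."""
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def by_region(vertex_id, num_partitions):
    """Hypothetical application-specific rule: keep each region's vertices together."""
    regions = {"us": 0, "eu": 1}  # assumed ID scheme: "<region>:<name>"
    return regions[vertex_id.split(":")[0]] % num_partitions


def partition_graph(edges, num_partitions, assign=partition_of):
    """Place every vertex in one partition and count edges that cross partitions."""
    partitions = defaultdict(set)
    cut_edges = 0
    for src, dst in edges:
        p_src = assign(src, num_partitions)
        p_dst = assign(dst, num_partitions)
        partitions[p_src].add(src)
        partitions[p_dst].add(dst)
        if p_src != p_dst:
            cut_edges += 1  # this edge needs cross-machine communication
    return dict(partitions), cut_edges


edges = [("us:a", "us:b"), ("us:b", "eu:c"), ("eu:c", "eu:d")]
parts, cut = partition_graph(edges, 2, assign=by_region)
print(parts, cut)
```

The count of cut edges is one simple way to compare strategies: a rule that keeps closely connected vertices on the same machine generally means fewer cross-machine hops during traversal.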

MultiGraph Versus Multi-Tenancy

Multi-tenancy is a form of database architecture designed for multiple organizations/computers, also called tenants, which allows them to deploy and support a single codebase, rather than multiple applications. It allows updating to take place for all tenants simultaneously and makes it easier to support the server’s infrastructure. Generally speaking, multi-tenancy simplifies the development of a Software-as-a-Service application. However, keeping the tenants “separate” is a major security problem and requires redesigning the database layer.

The problem of installing security often causes technical difficulties and slowdowns. On this issue, Blaschka said that “TigerGraph takes away these technical barriers, while providing and maintaining fine grained security, so that what someone shouldn’t see, doesn’t get seen.” The system only provides the necessary information based on specifications set within the query engine. Personally Identifiable Information (PII), for example, remains hidden except to those who have access rights to that type of information.

TigerGraph provides graph partitioning algorithms, which allow it to split very large graphs and spread them out over a distributed architecture. The process can take place automatically, or by using application-specific partitioning tactics. TigerGraph has the capacity to scale Big Data, “in real-time.” Because of these strengths, more people can use the system simultaneously. Blaschka explained:

“Our customers are using TigerGraph to support large number of applications accessing and updating different parts of the graph with hundreds of billions of nodes or entities. TigerGraph provides a secure way for users to view and update the data in parallel.”

MultiGraph also offers a Graph Analytics service focused on collaboration. It provides multiple groups with the ability to share a master database, while maintaining local control and security. As a result, team productivity and efficiency are increased, because everyone is using one database, with real-time updates. MultiGraph improves the transparency and availability of data being used for data-driven decisions. It also adds a layer of security, by assigning an admin, an individual who monitors and controls access to their local graph. There is also a superuser role, providing global access.

As a simple example, an investigator may be allowed to see the SSNs in the graph while a customer service rep is only allowed to see the addresses. Any change to user data happens in a single system and is immediately available to all users who have the right permissions. There are no multiple places to update data and no data silos.
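The investigator and customer service example can be sketched as role-based filtering over a single shared record. This is an assumption-laden illustration, not TigerGraph's MultiGraph API: the role names, the `ROLE_VISIBLE_FIELDS` table, the `view_vertex` function, and the sample record are all hypothetical, and letting the investigator also see the address is an assumption added for the example.

```python
# Hypothetical role table: which vertex attributes each role may read.
ROLE_VISIBLE_FIELDS = {
    "investigator": {"ssn", "address"},  # assumption: investigators also see addresses
    "customer_service": {"address"},
}


def view_vertex(vertex, role):
    """Return only the attributes the given role is allowed to see."""
    allowed = ROLE_VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {key: value for key, value in vertex.items() if key in allowed}


# One shared record: there is no per-team copy to keep in sync.
customer = {"name": "Pat Example", "address": "1 Main St", "ssn": "000-00-0000"}

print(view_vertex(customer, "customer_service"))  # address only
print(view_vertex(customer, "investigator"))      # address and SSN

# An update is made once and is visible to every permitted role right away.
customer["address"] = "2 Oak Ave"
print(view_vertex(customer, "customer_service"))
```

Because every role reads from the same record, the address change shows up in the next view without any copying between systems, which is the "no data silos" point made above.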

The Roadblocks

Dr. Xu described the significant problems in developing TigerGraph:

“There were two roadblocks. The scalability and real-time performance. Graph Analytics is really the most difficult type of Analytics. That’s why the older generation Graph Database cannot provide this kind of performance.”

Storing large amounts of data was not the problem, but scaling the database out over multiple machines was. Users also wanted to use deep-link types of analytics on massive scales. They wanted Graph Analytics to look at the whole graph and to look at the really large subgraphs. But, the older Graph Database systems developed before TigerGraph didn’t reliably provide such scaling.

“So, all of our customers tried other options before choosing us,” said Dr. Xu. “They learned in a painful way. Scalability and real-time performance are very important to the customer, and that’s really not something a previous generation Graph Database could do.”

Filling a Need

TigerGraph is providing the next evolutionary step in Graph Databases. According to the company, it is the first system capable of performing Real-Time Analytics on web-scale data. The Native Parallel Graph (NPG) is designed to focus on both computation and storage, while supporting graph updates in real-time and providing built-in parallel computations. GSQL, a SQL-like graph query language, allows ad-hoc exploration and supports the analysis of Big Data. With expressive capabilities and NPG speeds, users can perform Deep Link Analytics to uncover connections and insights previously inaccessible.
