Healthcare sector startups are ripe for exploiting NoSQL graph databases. With a data model predicated on nodes/vertices and relationships/edges, graph databases provide a sturdy means to probe connections between entities, especially entities that sit many steps removed from each other.
Healthcare organizations can “realize new opportunities and efficiencies by leveraging the connections within their existing data: be it in a connected genome, or a provider network, or patient treatments,” said Emil Eifrem, CEO of Neo Technology, in a recent statement summarizing the graph database company’s traction in the healthcare space. Among the healthcare customers now using its Neo4j technology, Neo listed in its announcement new-market entries HealthUnlocked, which relies on the graph database to relate millions of free-text terms used in its social network for health to an applicable health sphere; GoodStart Genetics, which enables scientists to conduct ad-hoc queries to discover the data they need within research and development information repositories; and U.S. physician network Doximity.
Life sciences and Big Data analytics platform company Zephyr Health is another health-focused startup that’s leveraging graph database technology as one important component of its service offerings. The three-year-old venture-backed company takes data in a variety of forms from some 3,500 sources – including public sources such as ClinicalTrials.gov and PubMed, as well as private data from partners and from customers’ own internal systems – to help pharmaceutical and medical device companies understand and segment their target markets within a hierarchy or ontology of predefined categories, such as who publishes the most research in a certain area and who has formal leadership positions in particular fields.
That includes encompassing “information that makes it the long-tail of data, such as output from consulting engagements or surveying activities” that may see the light of day primarily in PowerPoint presentations, says Brian Roy, Zephyr Health Director. “We quantify the information and ingest it into our system to make it available.”
Zephyr combines all the requisite data – regardless of its structure, or even if it has no structure at all – to deliver to its customers a single, unified profile of the doctors and hospitals that are important to them. It draws these conclusions from the connections its products make and the insights they surface across data of any kind, without ever having to predefine data structures, because those products are built atop Neo4j’s open source graph database.
Zephyr Health’s use cases that take advantage of graph database technology range across four life sciences quadrants: medical affairs, sales and marketing, payers and clinical development. “The graph provides the search layer, that which handles the interconnections between disparate pieces of data and lets business users interact with it in a meaningful way,” Roy says.
Business Applications of Graph Databases in Life Sciences
The vendor’s medical affairs solution, for example, focuses on helping pharma companies find the right thought-leader doctors to talk to about the development or marketing of a drug, based on querying data represented in a graph model to understand overlaps across their patient populations, treatment preferences, influence networks, and so on. Its sales and marketing product aims to help teams in these areas understand which medical accounts to target for the adoption of a medical device or application, with greater insight into relationships such as the rates of treatment or referrals by doctors associated with a particular condition.
Clinical trials, Roy says, typically run months behind schedule and go over budget. That’s largely because of difficulties in recruiting participants, often because the wrong institution or trial investigator was chosen. “With a large variety of data you can create a better selection of clinical trial types, and where and who should run them by juxtaposing [sites and investigators] against demographics of diseases,” he says.
Selection criteria for conducting clinical trials, after all, should potentially factor in whether there is a strong intersection among a patient population dealing with the disease being investigated, a hospital known for focusing on that disease, and a key opinion leader or doctor influential in the treatment of that condition. In a graph database, a complex query to help find the optimal site for a clinical trial – where the result set comes from the connectivity of many different data elements whose relationships to each other are as important as the items themselves – executes as a high-performance traversal of the nodes and relationships that match the request.
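The kind of intersection query described here can be sketched in a few lines of Python. This is a minimal illustration over an in-memory edge list, not Zephyr Health’s implementation or Neo4j’s API; every node name, relationship name (TREATS, AFFILIATED_WITH, SPECIALIZES_IN, LOCATED_IN, HIGH_PREVALENCE_OF) and function name below is a hypothetical stand-in.

```python
# Hypothetical toy graph: all node and relationship names are invented
# for illustration and do not come from Zephyr Health or Neo4j.
EDGES = [
    ("doctor_a", "TREATS", "asthma"),
    ("doctor_a", "AFFILIATED_WITH", "hospital_x"),
    ("hospital_x", "SPECIALIZES_IN", "asthma"),
    ("hospital_x", "LOCATED_IN", "region_1"),
    ("region_1", "HIGH_PREVALENCE_OF", "asthma"),
    ("doctor_b", "TREATS", "asthma"),
    ("doctor_b", "AFFILIATED_WITH", "clinic_y"),
    ("clinic_y", "LOCATED_IN", "region_2"),
    ("doctor_c", "TREATS", "diabetes"),
    ("doctor_c", "AFFILIATED_WITH", "hospital_x"),
]


def build_index(edges):
    """Index edges as node -> relationship -> set of target nodes."""
    index = {}
    for source, relation, target in edges:
        index.setdefault(source, {}).setdefault(relation, set()).add(target)
    return index


def neighbors(index, node, relation):
    """Return the nodes reachable from `node` over one `relation` edge."""
    return index.get(node, {}).get(relation, set())


def candidate_sites(edges, condition):
    """Find (hospital, doctor, region) triples that all connect to `condition`."""
    index = build_index(edges)
    matches = []
    for doctor in sorted(index):
        # Step 1: the doctor must treat the condition.
        if condition not in neighbors(index, doctor, "TREATS"):
            continue
        for hospital in sorted(neighbors(index, doctor, "AFFILIATED_WITH")):
            # Step 2: the doctor's hospital must focus on the condition.
            if condition not in neighbors(index, hospital, "SPECIALIZES_IN"):
                continue
            for region in sorted(neighbors(index, hospital, "LOCATED_IN")):
                # Step 3: the hospital's region must have many affected patients.
                if condition in neighbors(index, region, "HIGH_PREVALENCE_OF"):
                    matches.append((hospital, doctor, region))
    return matches


if __name__ == "__main__":
    print(candidate_sites(EDGES, "asthma"))
```

Each step follows only the edges leaving the node already in hand, which is the essence of a traversal: the cost depends on how connected the starting nodes are rather than on the total size of the data set.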
“As long as you can structure data in a reliable and predictable way – as long as you know what data you’ll get upfront – traditional database solutions work,” Roy comments. “But they don’t work for us because we are interested in variety, and we needed a way to get any data, with no precognition of what it is, and bring it into the system and store it. Where the graph plays is that we can search it in an efficient way without knowing what data it is.”
The Upside and Downside Of Graph Databases
It’s been about two years since Zephyr began using Neo4j in development through production, and Roy says that its solutions are helping customers see significant improvements in key metrics, compared to their previous attempts to ferret out insights from the variety of Big Data. Part of what they’re finding, as Simon Elliston Ball, head of Big Data at Redgate Software, phrased it during a presentation about NoSQL for the Enterprise at the recent Data Summit in New York, is this: “Relationships count… If there’s one thing relational database management systems won’t do, it’s relationships. [For that], graph databases are really worth looking at.”
From Zephyr’s own perspective, being a startup – in the healthcare sector or any other – has its advantages when it comes to being able to adopt new ways of doing things, Roy notes. There isn’t a host of legacy software, such as relational databases, already in place that could hamper the adoption of newer technologies such as graph databases, he says. In more established companies, “just because everything is on Oracle, for example, often there is a momentum to continue to do everything on Oracle.” In a startup, those restrictions are removed, making it easier to drive innovation than it may be for organizations encumbered by what they’ve already put in place.
That doesn’t mean, however, that there are no challenges with adopting the unfamiliar. Graph data models, he notes, have actually been around for a long time, but were used mostly in highly academic contexts. He personally thinks they are underutilized because they are so different from even other forms of NoSQL databases. In fact, as of June 2014, only one graph database, Neo4j, appeared among the top 25 in the DB-Engines Ranking, which ranks database management systems according to their popularity.
“NoSQL document databases, for example, are similar to object databases and especially now that document databases deal in JSON, which are objects in app development, they innately understand that. Learning to think in graphs is a much bigger mental departure from tabular columns and rows or object approaches,” he says.
So there may be a bit of a learning curve among business developers. That can be overcome, though; Roy also says that anyone walking around his company’s offices and conference rooms today will see graphs being used everywhere. “Once you start, it’s amazing to watch how quickly people start thinking in graphs,” he says.
But it is equally important not to become convinced that graphs are the solution to all issues. While graph databases are a huge part of how Zephyr solves problems for its customers, the company talks about “polyglot persistence” as its main operating model: that is, using multiple NoSQL data stores for what each does best.
“We use the graph for the power of traversal and the efficiency of it,” Roy explains, but the company also relies on document databases for storing bulk data in its historical context. “Graph databases are not very good at that because it’s a huge amount of data, it’s all the data we’ve got just to get us to the data we want to query over,” he says.
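As a rough sketch of that division of labor, the Python below keeps bulk records in one store and only identifiers and connections in another. Both stores are plain dictionaries standing in for a document database and a graph database; the record contents, identifiers and function names are assumptions made for illustration, not a description of Zephyr’s architecture.

```python
from collections import deque

# Hypothetical document store: full records, including bulky history,
# keyed by id. A real system would use a document database here.
DOCUMENTS = {
    "id-1": {"name": "doctor_a", "history": ["record 2012", "record 2013"]},
    "id-2": {"name": "hospital_x", "history": ["record 2011"]},
    "id-3": {"name": "trial_1", "history": []},
    "id-4": {"name": "doctor_z", "history": ["record 2010"]},
}

# Hypothetical graph store: only ids and their connections, no bulk data.
GRAPH = {
    "id-1": ["id-2"],
    "id-2": ["id-3"],
    "id-3": [],
    "id-4": [],
}


def reachable(graph, start, max_hops):
    """Breadth-first traversal over ids; returns ids within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


def connected_records(graph, documents, start, max_hops):
    """Traverse the lightweight graph, then fetch full records by id."""
    return [documents[node_id] for node_id in reachable(graph, start, max_hops)]


if __name__ == "__main__":
    for record in connected_records(GRAPH, DOCUMENTS, "id-1", 2):
        print(record["name"], len(record["history"]))
```

The traversal never touches the bulky history fields; they are fetched only for the handful of ids the graph query returns, which is the trade-off Roy describes.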
Other companies, he thinks, should also consider the value that can come from putting multiple database types together in unique combinations to best solve their own specific problems. “Prove out what is the right combination for you, and do it small, cheaply and quickly to see in practice the performance capabilities and pros and cons of each persistent store of data to select the right one,” he says.