A couple of years back, The Semantic Web Blog visited with Vijay Bulusu to gain some insight into how pharma giant Pfizer Inc. was moving forward with semantic technology (see article here). At last week’s Semantic Technology and Business Conference in New York City, Bulusu, director, informatics and innovation at Pfizer, provided additional perspective on the issue – first, during the presentation on Using Linked Semantic Data in Biomedical Research and Pharmaceuticals (see coverage of that here), and then in a follow-up conversation.
A struggle for pharma companies, Bulusu notes, lies in driving standards for data that exists across system silos, so that the data is broadly applicable across groups. A transaction like creating a batch of materials, doing analytical testing on it and enabling clinical trial releases is the work of multiple groups of people in departments like R&D, each entering data into different systems.
The foundational layer needed to support data aggregation in a persistent graph semantic database and visualization with collaborative, semantic knowledge maps “is all about data already in transactional, silo’d systems,” Bulusu says. “We want to make sure that across those systems, key data is entered consistently for entities.” That means limiting users to selecting entries from a drop-down list, drawn from a vocabulary that is consistently managed and published from a single source to all these transaction systems, so the same entity is called by the same name as it traverses systems to support analytics and other requirements. That, he says, “is where we directly impact the day-to-day operational work of users.”
Pfizer is building a web services-based publishing framework for the job. It’s doing a phased implementation of vocabulary layers, with associated synonym trees, for entities, focusing step-by-step on different groups. There’s a cultural issue in getting different groups to accept the same nomenclature, but “still it is a semantic problem to solve,” he says.
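To make the idea of a vocabulary with synonyms concrete, here is a minimal Python sketch of how a term entered in any system could be resolved to one canonical name. It is an illustration only, not Pfizer's framework: the `VOCABULARY` entries and the function names are hypothetical, and the flat synonym lists stand in for the richer synonym trees the article describes.

```python
# Hypothetical controlled vocabulary: canonical name -> known synonyms.
# The entries are illustrative examples, not Pfizer data.
VOCABULARY = {
    "acetylsalicylic acid": ["aspirin", "ASA"],
    "paracetamol": ["acetaminophen", "APAP"],
}


def build_lookup(vocabulary):
    """Map every canonical name and synonym (case-insensitively) to the canonical name."""
    lookup = {}
    for canonical, synonyms in vocabulary.items():
        for term in [canonical, *synonyms]:
            lookup[term.casefold()] = canonical
    return lookup


LOOKUP = build_lookup(VOCABULARY)


def canonical_name(term):
    """Return the canonical name for a term, or None if it is not in the vocabulary."""
    return LOOKUP.get(term.strip().casefold())
```

A drop-down list would offer only the canonical names, while a lookup like `canonical_name("Aspirin")` could help reconcile data that was entered before the vocabulary was in place.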
Having the same entity referred to by the same name supports the visualizations the company now creates using Entagen’s TripleMap, which allows users to create, search and share structured knowledge maps of associations between domain-specific entities. Franz AllegroGraph’s graph database of triples, drawn from those diverse foundational systems’ entities, is behind its data aggregation layer. “A user might say show me all you know about this compound, which connects into the aggregation layer,” he says. “So we aggregate at periodic intervals into AllegroGraph and then at the visualization layer you can see things in TripleMap.”
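The aggregation step can be pictured with a small Python sketch that merges subject-predicate-object triples from two source systems and then answers a "show me all you know about this compound" request. This is a plain in-memory stand-in under stated assumptions: it does not use the AllegroGraph or TripleMap APIs, and the system names, predicates and identifiers are all hypothetical.

```python
# Hypothetical triples exported from two silo'd systems.
# Both systems already use the same name for the same entity.
BATCH_SYSTEM = [("compound-X", "hasBatch", "batch-001")]
TESTING_SYSTEM = [
    ("batch-001", "hasTestResult", "purity-99.2"),
    ("compound-X", "testedBy", "analytical-lab"),
]


def aggregate(*sources):
    """Merge triples from every source into one de-duplicated set."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def about(graph, entity):
    """Return every triple in which the entity is the subject or the object, sorted."""
    return sorted(t for t in graph if entity in (t[0], t[2]))
```

Because both systems call the compound `compound-X`, `about(aggregate(BATCH_SYSTEM, TESTING_SYSTEM), "compound-X")` returns facts from both of them; with inconsistent names the two sets of facts would never connect.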
The Business Value
In the time since the Pfizer project began, cost pressures and requirements to improve productivity in the industry have made semantic web technologies a more viable option even to those who’d been skeptical in the past, he says. And putting the focus on the business case value, rather than the advantages of the technology per se, is how to drive projects like this one forward, Bulusu says.
Some examples: In the cause of precision medicine, pharma companies often need to collect data from various silos, where it is spread across different systems in different ways. That means employees get pulled from their regular work to take on a manual, time-consuming process of finding the data, compiling it and sending a report. Semantic technology could handle a silo-busting and integration job like this far more efficiently than humans.
“With the cost pressures and efficiency gains people are looking for, when you pitch it using that as the driver [freeing up people from this manual work that takes them away from their normal jobs], senior management listens,” he says.
It’s important for those driving semantic technologies in their companies to mingle with the business groups and ask them how they handle these and other scenarios to see how you can help them improve on them, Bulusu says. The use cases are there – you just have to actively look for them.
Another big trend that can drive semantics in pharma is that so much of the data once generated internally now comes courtesy of outsourcing relationships with contract manufacturing organizations and contract research organizations. “The whole point of outsourcing now is that those guys generate and own the data, and just send us the consolidated findings in this format,” he says.
“So the raw data lives outside and the question becomes, in that situation five years from now, how to run a query that needs to look at data in our firewall and data that sits in a CRO in a database in China,” Bulusu says. “There’s a big push coming to distribute querying across heterogeneous databases.”
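As a rough illustration of what distributed querying means here, the Python sketch below looks up a compound's batches in an internal source and then fetches the matching results from an external one. It is a toy model under stated assumptions: both sources are in-memory lists, `query` stands in for a real remote call, and every name and record is hypothetical.

```python
# Hypothetical data sources: one inside the firewall, one held by an external CRO.
INTERNAL = [{"compound": "compound-X", "batch": "batch-001"}]
CRO = [
    {"batch": "batch-001", "assay": "stability", "result": "pass"},
    {"batch": "batch-999", "assay": "stability", "result": "fail"},
]


def query(source, **criteria):
    """Stand-in for a remote query: return the rows that match all criteria."""
    return [
        row for row in source
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def federated_results(compound):
    """Find the compound's batches internally, then join each batch to the CRO's results."""
    merged = []
    for internal_row in query(INTERNAL, compound=compound):
        for cro_row in query(CRO, batch=internal_row["batch"]):
            merged.append({**internal_row, **cro_row})
    return merged
```

The join only works because both sides identify the batch the same way, which ties the distributed-query problem back to the shared vocabulary discussed earlier.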