How to help the bench biologist get value out of the wealth of life sciences Linked Data sets? Startup Metaome Science Informatics proposes to offer some help with its DistilBio semantic search and data integration technology, by streamlining the approach to posing user queries. The Distil in DistilBio stands for Data Integration using Semantic Technologies in the Life Sciences.
Metaome was founded by CEO Kalpana Krishnaswami and CTO Ramkumar Nandakumar as a bioinformatics services provider before transitioning to a product vendor. DistilBio contains a little more than a dozen public life sciences data sets so far. Informaticians in the life sciences space have the expertise to query such data across sets via SPARQL, but the front-line biologist isn’t necessarily an informatician. So Metaome has given DistilBio a query interface that makes it easier for biologists to ask large, complex questions across data sets in a simplified way, building a graph in the process.
“How does a user say what are the drugs used for Alzheimer’s disease and do they have certain protein targets and are those protein targets implicated in other diseases?” says Krishnaswami. “To ask that in one shot right now is hard without working through a SPARQL endpoint using all the SPARQL syntax.”
DistilBio uses its back-end ontology to make the job easier: When a user types in Alzheimer’s, for instance, it knows that the term is a disease and that, as such, it has certain relevant relationships, such as to drugs. Or, in one of its published use cases, users can start with the drug Sitagliptin and add nodes to explore associated disease indications, the drug’s protein targets and their function, and interactions in humans.
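To make the idea concrete, here is a minimal Python sketch of how a chain of concept-to-concept hops might be turned into a single SPARQL query. It is not DistilBio's implementation: the `ex:` namespace, the predicate names (`ex:treats`, `ex:hasTarget`, `ex:implicatedIn`) and the `Hop` and `build_query` helpers are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical prefix and predicate names, for illustration only;
# they are not taken from DistilBio's ontology.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX ex: <http://example.org/bio#>"
)


@dataclass(frozen=True)
class Hop:
    """One step in the chain the user draws from the current node."""
    predicate: str          # relationship to follow, e.g. "ex:hasTarget"
    variable: str           # name of the variable bound to the new node
    inverse: bool = False   # True when the new node is the triple's subject


def build_query(start_label: str, start_var: str, hops: list[Hop]) -> str:
    """Assemble one SPARQL SELECT query from a starting concept and its hops."""
    patterns = [f'?{start_var} rdfs:label "{start_label}" .']
    current = start_var
    for hop in hops:
        if hop.inverse:
            patterns.append(f"?{hop.variable} {hop.predicate} ?{current} .")
        else:
            patterns.append(f"?{current} {hop.predicate} ?{hop.variable} .")
        current = hop.variable
    select = " ".join(f"?{hop.variable}" for hop in hops)
    body = "\n".join(f"  {p}" for p in patterns)
    return f"{PREFIXES}\nSELECT DISTINCT {select}\nWHERE {{\n{body}\n}}"


# Drugs used for Alzheimer's, their protein targets, and the diseases
# in which those targets are implicated.
query = build_query(
    "Alzheimer's disease",
    "disease",
    [
        Hop("ex:treats", "drug", inverse=True),
        Hop("ex:hasTarget", "protein"),
        Hop("ex:implicatedIn", "otherDisease"),
    ],
)
print(query)
```

The user only names a starting concept and picks relationships; the triple patterns, variable names and syntax are generated for them. A real system would also need to know which graph each pattern lives in and to exclude the starting disease from the final column, which this sketch leaves out.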
Typing in a concept and drawing the entity relationships from there can lead to building “a grand query without knowing anything about SPARQL, or in which graph the data sits,” says Krishnaswami, an invited expert to the W3C Semantic Web Health and Life Sciences Interest Group. One of the jobs for DistilBio was to extensively index synonyms, given the disparities in naming conventions in the field, so that results account for them. Users viewing results can use the engine’s faceted interface to keep filtering the data. For instance, in the Sitagliptin example, a user might filter a list of specific proteins down to those involved in glucose level regulation.
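The synonym indexing can be pictured with a small Python sketch. It is an assumption about the general technique, not a description of DistilBio's index: the `SynonymIndex` class, the `normalize` rule and the identifiers such as `drug:sitagliptin` are hypothetical.

```python
def normalize(name: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace("-", " ").split())


class SynonymIndex:
    """Maps every known name of an entity to its canonical identifier."""

    def __init__(self) -> None:
        self._by_name: dict[str, set[str]] = {}

    def add(self, canonical_id: str, names: list[str]) -> None:
        for name in names:
            self._by_name.setdefault(normalize(name), set()).add(canonical_id)

    def lookup(self, term: str) -> list[str]:
        return sorted(self._by_name.get(normalize(term), set()))


index = SynonymIndex()
# Identifiers are made up for this example.
index.add("drug:sitagliptin", ["Sitagliptin", "Januvia"])
index.add("protein:dpp4", ["DPP4", "DPP-4", "dipeptidyl peptidase 4"])

print(index.lookup("januvia"))
print(index.lookup("Dipeptidyl  Peptidase-4"))
```

Whichever name the biologist types, the search resolves to the same entity before any query is built, so results are not split across spelling or naming variants.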
The company continues to add data sets to its application, which is still in beta. This week it is also adding support for SPARQL’s OPTIONAL keyword. Before this support, if no data was available for a particular triple somewhere in a chain, DistilBio might not have returned any results for a very long query. Metaome is also integrating standard bioinformatics tools. BLAST, a tool in the public domain that finds regions of similarity between biological sequences, is one of the first to be integrated.
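Why OPTIONAL matters for long chains can be shown with a toy Python simulation. It mimics the difference between a required and an optional hop over in-memory data; the `follow` helper and the sample records, including the placeholder `drug_x` and `protein_y`, are invented for the example and say nothing about how DistilBio or any SPARQL engine is implemented.

```python
def follow(rows, edges, src, dst, optional=False):
    """Extend each partial result by one hop from column `src` to column `dst`.

    Without `optional`, a row with no matching data is dropped, as with a
    plain triple pattern. With `optional`, the row is kept and `dst` is None,
    which is roughly what wrapping the pattern in OPTIONAL does.
    """
    out = []
    for row in rows:
        targets = edges.get(row.get(src), [])
        if targets:
            out.extend({**row, dst: t} for t in targets)
        elif optional:
            out.append({**row, dst: None})
    return out


# Toy data: no function is recorded for protein_y.
targets = {"sitagliptin": ["DPP4"], "drug_x": ["protein_y"]}
functions = {"DPP4": ["glucose level regulation"]}
rows = [{"drug": "sitagliptin"}, {"drug": "drug_x"}]

with_targets = follow(rows, targets, "drug", "target")
strict = follow(with_targets, functions, "target", "function")
lenient = follow(with_targets, functions, "target", "function", optional=True)

print(len(strict), len(lenient))
```

In the strict run the second drug disappears entirely because one link in its chain is missing; in the lenient run it stays in the results with an empty function column. The longer the chain, the more likely a single gap is to wipe out a row, which is the problem the article describes.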
Metaome is actively looking to get more users to try out DistilBio and provide feedback on the offering so far. On the radar for the future is a new version where users can upload their own data, connect it with public data, and search across the sets. This potentially could be a premium-priced offering, though Krishnaswami expects its basic search capabilities to remain available as a free service. The company also is looking at offering an enterprise version that would live behind the corporate firewall for in-house data integration.
“Right now we are very focused in life sciences, but the next obvious sector would be health care,” Krishnaswami says.