Standard Analytics, which was a participant at the recent TechStars event in New York City, has a big goal in mind: to organize the world’s scientific information by building a complete scientific knowledge graph.
The company’s co-founders, Tiffany Bogich and Sebastien Ballesteros, came to the conclusion that someone had to take on the job as a result of their own experience as researchers. A problem they faced, says Bogich, was the difficulty of accessing all the information behind published results, and of searching and discovering across papers. “Our thesis is that if you can expose the moving parts – the data, code, media – and make science more discoverable, you can really advance and accelerate research,” she says.
The scientific research space is not new to leveraging semantic technologies to improve search, discovery and collaboration. Think of ventures like ResearchGate, VIVO, and the Research Data Alliance, for example. Standard Analytics potentially differentiates itself from some of these and other efforts by what Bogich says is its focus “on working with publishing companies from the get-go, vs. getting individual scientists to change their behavior right away.” It’s engaging in conversations with the handful of publishers that cover some 6,000 scientific journals among them, as well as a big non-profit, none of which she can name yet.
Using JSON-LD with schema.org to expose the pieces of research is a pragmatic way of getting publishers to buy into the idea of publishing research documents that can be served up as Linked Data packages, she notes. “It’s great for working with publishers, and they just need to add one SCRIPT tag,” she says. “RDFa is super-lightweight but this way they don’t have to alter the markup of the rest of the paper.”
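As a rough illustration of the "one SCRIPT tag" approach, the sketch below builds a schema.org description of a paper as JSON-LD and wraps it in a script element that a publisher could add to an article page. The article metadata, the example.org URL and the helper name `jsonld_script_tag` are made up for illustration; this is not Standard Analytics' actual markup.

```python
import json


def jsonld_script_tag(metadata: dict) -> str:
    """Wrap a JSON-LD document in a <script> element for embedding in HTML."""
    payload = json.dumps(metadata, indent=2)
    # Escape "</" so the JSON text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical article; the types and properties are schema.org terms.
article = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "headline": "An example paper",
    "author": {"@type": "Person", "name": "A. Researcher"},
    "isPartOf": {"@type": "Periodical", "name": "Example Journal"},
    "url": "https://example.org/papers/1",
}

tag = jsonld_script_tag(article)
print(tag)
```

Because the description lives in its own element, the rest of the page's markup stays untouched, which is the contrast with RDFa that Bogich draws.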
Standard Analytics provides two ontologies that extend schema.org. One lets it describe statistics and the drawing of conclusions from data. The other lets it describe data and the relationship between data and the algorithms that generate or process it, and serves as the basis for package.jsonld, a manifest file used by the company’s open source package manager for Linked Data. “Scientific papers to us are just packages or structured APIs with concepts, data, code – the moving parts,” Bogich says. “The idea is that you can ‘install’ a paper, treat it as a package.” The company has started with 23 million open access articles, primarily from PubMed.
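To make the "paper as a package" idea concrete, here is a minimal sketch of what a manifest for one paper might look like and how an "install" step could work out what to download. The field layout is an assumption built from ordinary schema.org terms; it is not the real package.jsonld schema, and `resources_to_fetch` and the example.org URLs are hypothetical.

```python
# Hypothetical manifest for one paper; NOT the real package.jsonld schema.
manifest = {
    "@context": "https://schema.org",
    "@type": "ScholarlyArticle",
    "name": "example-paper",
    "version": "0.1.0",
    "hasPart": [
        {
            "@type": "Dataset",
            "name": "measurements",
            "distribution": {
                "@type": "DataDownload",
                "contentUrl": "https://example.org/data/measurements.csv",
            },
        },
        {
            "@type": "SoftwareSourceCode",
            "name": "analysis",
            "codeRepository": "https://example.org/code/analysis",
        },
        {
            "@type": "ImageObject",
            "name": "figure-1",
            "contentUrl": "https://example.org/media/figure-1.png",
        },
    ],
}


def resources_to_fetch(pkg: dict) -> list:
    """List (name, URL) pairs that a hypothetical 'install' step would download."""
    found = []
    for part in pkg.get("hasPart", []):
        # Data, code and media each keep their location in a different property.
        url = (
            part.get("contentUrl")
            or part.get("codeRepository")
            or part.get("distribution", {}).get("contentUrl")
        )
        if url:
            found.append((part["name"], url))
    return found


for name, url in resources_to_fetch(manifest):
    print(name, url)
```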
Standard Analytics will leverage hundreds of web ontologies for existing disciplines to identify concepts within scientific articles; support semantic versioning to track meaning that changes from one update to the next; and let users retrieve the data, code, media and dependencies behind research through its REST API or Linked Data package manager.
With semantic versioning, “the idea is when you keep adding pieces of content or a new ontology comes into play, you can reindex and bump a version for keeping good track of the understanding of scientific knowledge through time,” she says. Programmatic access to science is becoming more important, she says, and Standard Analytics wants to be the API of APIs: building an understanding across all APIs and publishers, and keeping an index of every piece of content that can point users to original sources, while serving as the central point where it all comes together. “A lot of exciting apps are being built on top of APIs, like notification systems or reference managers, so people are building tools on top of the scientific body of knowledge,” she says. “We’re trying to integrate all existing APIs.”
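The "reindex and bump a version" step can be sketched as follows. The arithmetic follows the usual major.minor.patch convention; the mapping from reindexing events to version parts (`EVENT_TO_PART`) and the event names are assumptions made for illustration, not a rule the company has described.

```python
def bump(version: str, part: str) -> str:
    """Return the next major.minor.patch version for the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


# Assumed policy: which kind of reindexing event bumps which part.
EVENT_TO_PART = {
    "content_added": "patch",      # new pieces of content indexed
    "ontology_added": "minor",     # a new ontology comes into play
    "ontology_replaced": "major",  # existing concepts are reinterpreted
}


def reindex(version: str, event: str) -> str:
    """Bump the index version according to the assumed policy above."""
    return bump(version, EVENT_TO_PART[event])


print(reindex("1.4.2", "content_added"))
print(reindex("1.4.2", "ontology_added"))
print(reindex("1.4.2", "ontology_replaced"))
```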
The company will spend the next few months on the search user interface, which is aimed less at developers than at members of the scientific community who simply want to run queries. “We don’t want to have people writing SPARQL queries, but to get as easy as possible a search interface, maybe with dropdown menus,” she says.