Clinical studies aren’t what they used to be. In the past, the process was one-off: You conducted a study, gathered a lot of data, analyzed it, wrote a report, and submitted it to the authorities. But, says long-time Linked Data advocate Kerstin Forsberg, an information architect at AstraZeneca, that’s all changed in the last few years.
“A study is not a study on its own,” says Forsberg. Today, the goal is to do meta-analysis across many studies, which makes parties ranging from pharmaceutical companies to contract research organizations to government authorities all ‘customers’ of clinical data, so to speak. Data from various studies must be shared among all these parties. “It puts a new context around clinical trial data, that it must be easy to link data together, to link across several different studies,” she says.
There is a case for using modern information standards, such as semantic web standards and Linked Data principles, to address this need. That is why Forsberg is one of the individuals spearheading a volunteer effort to create RDF and OWL representations of the standards published by the Clinical Data Interchange Standards Consortium (CDISC), an international, non-profit organization that develops and supports global data standards for medical research.
The problem is that CDISC publishes its standards in huge PDF documents, with Excel matrices and some first-generation XML implementations, she notes. “If I want to know how to exchange lab data with a number of different measurements, I must pick up the PDF document to see how to structure the data sets, what elements are in them,” she says. That’s fine for a human, but it would be better to represent the standards in RDF so that they are machine-readable and a single element in a full set of lab data can be referenced directly.
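To make the idea of machine-readable standards concrete, here is a minimal sketch in Python that models a few statements about a lab data set as subject-predicate-object triples, the basic shape of RDF. The namespace, the `Dataset`, `DataElement` and `inDataset` terms, and the choice of elements are assumptions made for illustration; they are not the URIs or vocabulary actually published by the volunteer effort or by CDISC.

```python
# Hypothetical namespace and terms, for illustration only;
# these are NOT the URIs published by CDISC or the cdisc2rdf effort.
BASE = "http://example.org/cdisc/sdtm#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (BASE + "LB", RDF_TYPE, BASE + "Dataset"),
    (BASE + "LB", RDFS_LABEL, "Laboratory Test Results"),
    (BASE + "LB.LBTESTCD", RDF_TYPE, BASE + "DataElement"),
    (BASE + "LB.LBTESTCD", RDFS_LABEL, "Lab Test or Examination Short Name"),
    (BASE + "LB.LBTESTCD", BASE + "inDataset", BASE + "LB"),
    (BASE + "LB.LBORRES", RDF_TYPE, BASE + "DataElement"),
    (BASE + "LB.LBORRES", RDFS_LABEL, "Result or Finding in Original Units"),
    (BASE + "LB.LBORRES", BASE + "inDataset", BASE + "LB"),
]


def describe(subject, graph):
    """Return every predicate/object pair stated about one subject URI."""
    return {p: o for s, p, o in graph if s == subject}


def elements_of(dataset, graph):
    """Return the URIs of all elements linked to the given data set."""
    return sorted(
        s for s, p, o in graph if p == BASE + "inDataset" and o == dataset
    )


if __name__ == "__main__":
    # Look up a single element by its URI instead of searching a PDF.
    print(describe(BASE + "LB.LBTESTCD", triples))
    print(elements_of(BASE + "LB", triples))
```

Because every element has its own identifier, a program can ask for one element or for everything in a data set without a person reading the specification. A real implementation would more likely use an RDF library and a standard serialization such as Turtle rather than plain tuples.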
The effort has begun with the most commonly known and used part of the CDISC standards portfolio, its submission data set standards (for providing drug research and testing information to authorities like the FDA). These have now been published in RDF format as a demonstration of the usefulness of the approach. Forsberg is currently writing a blog to take industry users through the transformation process (see the cdisc2rdf.com blog). This is important because semantic web and Linked Data still represent new territory for long-standing standards organizations, and it can be challenging for them to get up to speed.
The next step, she says, will be to have this picked up by CDISC and the NCI Enterprise Vocabulary Services (EVS), which has provided terminology content, tools, and services to accurately code, analyze and share cancer and biomedical research, clinical and public health information. “So, when they publish their documents or standards in PDF format and Excel formats, that they also publish in RDF and OWL formats and have a nice Linked Data interface to it,” says Forsberg. “So when I’m using a URI for a specific data element in the clinical data standards, they can provide me a nice Linked Data interface to it as well.”
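As a rough illustration of what a Linked Data interface means for a consumer, the sketch below prepares an HTTP request that asks for an RDF serialization of one data element by sending an `Accept` header, the usual content-negotiation pattern. The element URI is a made-up placeholder, not an address served by CDISC or NCI EVS, and the request is only constructed here, never sent.

```python
from urllib.request import Request


def build_linked_data_request(element_uri, media_type="text/turtle"):
    """Prepare a GET request asking the server for an RDF serialization."""
    # The Accept header tells the server which representation is wanted.
    return Request(element_uri, headers={"Accept": media_type})


# Hypothetical URI for a single data element; an assumption for illustration.
ELEMENT_URI = "http://example.org/cdisc/sdtm/LB.LBTESTCD"

request = build_linked_data_request(ELEMENT_URI)
print(request.full_url, request.get_header("Accept"))
```

If a publisher did expose such an interface, passing the prepared request to `urllib.request.urlopen` would retrieve the description of that element; whether Turtle or another format is offered would depend on the publisher.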
In November, a public meeting took place in Washington, D.C., on studying new standards for data exchange with the FDA. The summary from that meeting was that there was general interest in the semantic web, but that more information was needed to better understand its potential use. Speaking of a mid-January FDA meeting, Charlie Mead, co-chair of the W3C’s Health Care and Life Sciences Interest Group (HCLSIG), says that, based on the follow-up questions from FDA personnel in attendance and reports of subsequent internal inquiries, “the FDA appears to have both continuing interest in, and an initial understanding of, the overarching value proposition and possibilities of SW-based strategies.”
On behalf of the W3C HCLS Interest Group, Mead says that he and Eric Prud'hommeaux, the W3C’s HCLSIG team contact, have proposed providing some additional semantic web training to interested persons within the FDA. The goal, he says, is “developing enough internal interest and ongoing continuing education that a planning group can be formed to examine viable operational strategies around current FDA goals – for example, the "58 'n 5" plan to develop reusable content in 58 Therapeutic Areas, next-generation submission strategies” and so on.