Semantic Interoperability: The Future of HealthCare Data

By Jennifer Zaino  /  September 16, 2014

Good healthcare depends heavily on high-quality information about a patient. The problem is that this data is spread across multiple providers and institutions, and the industry has yet to fully conquer the challenge of exchanging and integrating it, because the players in the chain use multiple vocabularies, formats, and systems.

Last year, at a workshop held at the Semantic Technology & Business Conference in San Francisco, a big step was taken towards tackling this problem. It came in the form of The Yosemite Manifesto, a position statement that debuted at that conference’s panel on RDF as a Universal Healthcare Exchange Language. The Manifesto recommended using the World Wide Web Consortium’s (W3C) RDF (Resource Description Framework) standard model for data interchange as a universal healthcare exchange language, describing RDF – one of the core technologies of the Semantic Web – as the best available candidate for the job.

The statement’s other points included that:

  • Electronic healthcare information should be exchanged in a format that either is an RDF format directly; or has a standard mapping to RDF
  • Existing standard healthcare vocabularies, data models and exchange languages should be leveraged by defining standard mappings to RDF, and any new standards should have RDF representations
  • Government agencies should mandate or incentivize the use of RDF as a universal healthcare exchange language
  • Exchanged healthcare information should be self-describing, using Linked Data principles, so that each concept URI is de-referenceable to its free and open definition.

Since that event, the Yosemite Manifesto has gained more than 100 signatures, not only from the Semantic Web community, but also from some medical doctors, researchers, and other professionals connected to the healthcare sector. There also have been a number of projects undertaken since then that reflect the Manifesto’s goals, according to David Booth, who led the RDF as a Universal Healthcare Exchange Language conference session last year, as well as its second incarnation at this past August’s Semantic Technology & Business Conference.

“People who are knowledgeable about both Semantic Web technology and healthcare believe that this is the best way to go,” says Booth, senior software architect at Hawaii Resource Group, which in one of its projects used Semantic Web technology to make clinical healthcare data interoperable between diverse systems. To that end, at this year’s SemTechBiz Event, Booth unveiled The Yosemite Project: A Roadmap for Healthcare Information Interoperability, to explicitly lay out an overall plan for the industry to follow to achieve semantic interoperability of all structured healthcare information through RDF.

Why the Roadmap Matters

“If we really care about healthcare, we need to have healthcare data interoperability,” Booth says, with the assurance of unambiguous meaning across the shared information. Part of the idea of putting The Yosemite Project roadmap out there, he says, “is to motivate people to look at it, and to learn about Semantic Web technology and how it really is relevant to addressing this problem.”

Here’s one reason: the key thing about RDF “is that it lets you represent many different data models with one model language, and it still lets all the systems retain the expressivity of the native system, yet also express meaning in a common language,” says Rafael Richards, physician informaticist at the U.S. Veterans Health Administration and a participant in the recent SemTechBiz panel. (At the Semantic Web Blog, you can read about his work translating to RDF the VA’s VistA electronic health record system data, as well as data types conforming to the Fast Healthcare Interoperability Resources, or FHIR, standard for data exchange from Health Level Seven International (HL7), and information types supporting the Logical Observation Identifiers Names and Codes, or LOINC, database that facilitates the exchange and pooling of results for clinical care, outcomes management, and research.)

“People want computable health information,” adds Claude Nanjo, senior software architect at clinical decision support system vendor Cognitive Medical Systems, and also a participant in the SemTechBiz panel. “So we need ways to expose, structure, and exchange healthcare information.”

This, he says, is where The Yosemite Project can make a difference, and Cognitive wants to participate by giving RDF an important role in the platform it is developing, which lets healthcare institutions access clinical decision support services within their existing technology infrastructures. Cognitive is also collaborating with the OpenCDS team at the University of Utah to contribute towards an open source clinical decision support platform. It aims to enlist the Socratic Grid Open Source version of its decision support framework in service of the effort, “and make sure there is a strong foundation behind it that is web-based and particularly semantic-technology-based,” Nanjo says.

A Drive Down One Track of the Roadmap…

The Yosemite Project roadmap defines two tracks to take to the semantic interoperability destination. One is the Standards Track, a way to achieve interoperability by having every party speak the same language through the use of the same data models and vocabularies. The other is the Translations Track, so that Data Model A can be translated to Data Model B and Vocabulary C can be translated to Vocabulary D. “It is preferable to use standards whenever possible,” Booth says, but it is not always possible.


When it comes to the starting premise of using RDF as a universal healthcare exchange language, Booth explains that that does not necessarily mean using an RDF serialization for exchanging data. Rather, it just means that “whatever data formats and vocabularies you do use, make sure there is an RDF mapping to be able to interpret them in terms of RDF semantics,” Booth says. A key value of RDF is that it completely separates data formats and syntax from meaning, so meaning can be expressed in a common form.
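To make Booth’s point concrete, here is a minimal Python sketch of what an RDF mapping for a non-RDF format could look like. Everything in it is an assumption made for illustration: the record layout, the field names, and the http://example.org/ URIs are invented, and the sketch uses only the standard library rather than any particular RDF toolkit.

```python
class URI(str):
    """Marks a string as a URI so it is not serialized as a literal."""


# Hypothetical namespace: example.org is a placeholder, not a real vocabulary.
EX = "http://example.org/health/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# A record as a non-RDF system might export it (field names are invented).
record = {"id": "obs1", "patient_id": "p123", "test": "heart_rate", "value": 72, "unit": "bpm"}


def record_to_triples(rec):
    """Interpret one native record as (subject, predicate, object) triples."""
    subject = URI(EX + "observation/" + rec["id"])
    return [
        (subject, URI(EX + "patient"), URI(EX + "patient/" + rec["patient_id"])),
        (subject, URI(EX + "observationType"), URI(EX + "test/" + rec["test"])),
        (subject, URI(EX + "value"), rec["value"]),
        (subject, URI(EX + "unit"), rec["unit"]),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (no literal escaping in this sketch)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, URI):
            obj = f"<{o}>"
        elif isinstance(o, int):
            obj = f'"{o}"^^<{XSD_INTEGER}>'
        else:
            obj = f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


triples = record_to_triples(record)
print(to_ntriples(triples))
```

The native record stays exactly as it was. The mapping is what gives each field a meaning that other systems can interpret in RDF terms.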

Retrofitting existing standards via mappings to and from RDF, and using the W3C’s Web Ontology Language (OWL) – which is a semantic markup language for publishing and sharing ontologies on the web – to create an OWL ontology for each standard, open the door to a consistent way of interpreting healthcare data across formats and vocabularies. “If you don’t have the data mapping, then you don’t have it at all – you don’t know what the data means,” says Booth, adding that he thinks the data models need to be addressed before the vocabularies. Data model mappings will be harder to accomplish than vocabulary mappings, because they involve combinations of pieces of information in certain patterns, so the job goes beyond the one-to-one mapping that vocabularies require.
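The difference between the two kinds of mapping can be sketched in a few lines of Python. The example is hypothetical throughout: the two data models, their field names, and the codes in the two vocabularies are invented to show the shape of the problem, not taken from any real standard.

```python
# Vocabulary translation: a one-to-one lookup between two invented code systems.
VOCAB_C_TO_D = {
    "vocabC:SBP": "vocabD:0001",  # systolic blood pressure (hypothetical codes)
    "vocabC:DBP": "vocabD:0002",  # diastolic blood pressure (hypothetical codes)
}


def translate_code(code):
    """Translate a single code; each source code has exactly one target."""
    return VOCAB_C_TO_D[code]


def model_a_to_model_b(obs):
    """Translate between data models that structure the same facts differently.

    Model A (invented): one blood pressure observation holding both readings.
    Model B (invented): one coded observation per reading.
    """
    shared = {"patient": obs["patient"], "unit": obs["unit"]}
    return [
        {**shared, "code": translate_code("vocabC:SBP"), "value": obs["systolic"]},
        {**shared, "code": translate_code("vocabC:DBP"), "value": obs["diastolic"]},
    ]


model_a_obs = {"patient": "p123", "systolic": 120, "diastolic": 80, "unit": "mmHg"}
model_b_obs = model_a_to_model_b(model_a_obs)
for row in model_b_obs:
    print(row)
```

The vocabulary step is a plain lookup, while the data model step has to recognize a pattern (two readings inside one record) and restructure it, which is why the latter is the harder job.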

Happily, progress is underway for a few of the standards listed in the Unified Medical Language System. For example, SNOMED CT, the comprehensive, multilingual clinical healthcare terminology from the International Health Terminology Standards Development Organisation (IHTSDO), has an existing OWL ontology, “and often if you have the ontology there is a corresponding RDF representation for the instance data,” Booth says. Another favorable development is that the World Health Organization’s International Classification of Diseases 11th Revision, aka the ICD-11 standard, was created using a specialized version of the WebProtégé ontology editor for OWL and RDF.

Expressing the meanings within different standards in a common way using RDF and OWL needs to be paired with creating a collaborative standards ontology repository to manage definitions and facilitate standards connections and convergence, Booth continues. He sees that as a crowd-sourced collaborative platform – potentially “the next generation of Bioportal [a repository of biomedical ontologies],” that would serve as a common resource to ensure healthcare information standards are strongly and semantically linked to each other. It would encourage semantic consistency across different standards and within each single standard, too.

“The net effect is to have one big consistent standard, even though it’s made up of a bunch of smaller ones,” he says.

…And a Drive Down the Other

While in the ideal world healthcare data interoperability could be comprehensively accomplished by relying on all parties speaking the same language, thanks to the use of the same data models and vocabularies that leverage RDF mappings, that world doesn’t exist. “Semantic interoperability cannot just be a standards-based strategy,” Booth says.

For one thing, he points out that standards typically take a long time to come to fruition, and not every organization can wait for the final product. And, he adds, the more you try to standardize, the bigger the effort and the longer it takes. “So if you try to standardize all of healthcare in one big cohesive standard, it’s like trying to boil the ocean,” Booth says. Even once a standard has been published, typically it’s regularly updated to account for things such as changes in medicine and technology – but not every organization adopts the latest version on the same timeline. Additionally, there are different use cases for how data will be employed, and depending on that, the data may need to be represented in different structures and at different levels of granularity.

“You can’t have a one-size fits all standard,” he says. “There are a number of fundamental reasons that we can’t get around the need for translations.”

To accommodate that, The Yosemite Project roadmap outlines a path in which data model and vocabulary translations can be easily shared and cost-effectively maintained through crowd-sourced translation hubs in which participants collaboratively develop agnostic translation rules. They can be written for formats, data models and vocabularies in SPARQL, Java code, and so on, and downloaded for plug-and-play use. RDF is used as the standard base language for defining those translations, in comparison to today where “translation is done in a black-box, proprietary, peer-to-peer basis,” Booth says, an approach that is costly.
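As a rough illustration of a shareable translation rule defined over RDF, the Python sketch below keeps the rules as plain data and applies them to a list of triples. It is only a sketch under stated assumptions: the namespaces, predicates, and sample triples are invented, plain Python stands in for SPARQL or Java, and nothing here reflects how an actual translation hub is built.

```python
# Hypothetical namespaces for two data models; example.org is a placeholder.
A = "http://example.org/modelA/"
B = "http://example.org/modelB/"

# A translation rule set kept as plain data, so it could be published and reused.
# In SPARQL, the first rule might read:
#   CONSTRUCT { ?s <http://example.org/modelB/hasCode> ?o }
#   WHERE     { ?s <http://example.org/modelA/code> ?o }
PREDICATE_RULES = {
    A + "code": B + "hasCode",
    A + "val": B + "hasValue",
}


def apply_rules(triples, rules):
    """Rewrite predicates covered by a rule; pass all other triples through."""
    return [(s, rules.get(p, p), o) for s, p, o in triples]


source = [
    ("http://example.org/obs/1", A + "code", "vocabC:SBP"),
    ("http://example.org/obs/1", A + "val", 120),
    ("http://example.org/obs/1", "http://example.org/shared/unit", "mmHg"),
]
translated = apply_rules(source, PREDICATE_RULES)
for triple in translated:
    print(triple)
```

Because the rules are data rather than logic buried in a proprietary interface, anyone can inspect, correct, and reuse them, which is the contrast with black-box translation that Booth draws.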

“The translations component allows different parties to have different data represented differently, and still be interoperable,” says Booth. “That’s a fundamental reason why the translation component is essential to have.”

Moving Forward

The roadmap also explicitly highlights the need for policy incentives for achieving healthcare data interoperability.

An issue today – in some countries perhaps more so than in others – is that healthcare is run as a business. And in the U.S., for example, “there is no natural business incentive for a healthcare provider to make its data interoperable with others’ systems,” Booth says. There have been some initial attempts to change that, such as the Meaningful Use principle, part of the HITECH Act, which encourages the meaningful use of interoperable electronic healthcare records. The criteria for aligning with this principle don’t include RDF, however, “so there’s a long way to go on that, but it’s sort of a parallel process that we need to address,” Booth says.

Nanjo comments that the various Meaningful Use incentives and initiatives by the Office of the National Coordinator for Health Information Technology (ONC) have helped fuel a movement towards the adoption of electronic representations of clinical data, a significant step forward. “But,” he adds, “unfortunately this effort was not initially accompanied with sufficient incentives to foster better convergence on the representation, structure, and ultimately semantics of the clinical information that was captured and exchanged.”

Thanks to a number of ongoing ONC initiatives, however, a great deal of work is underway to improve and expand standards to better facilitate such convergence, he says. “Policy, by encouraging standards development organizations to develop well-thought out and adoptable standards, can greatly help move us towards better convergence, towards fewer clinical models and terminologies that have been vetted through pilot implementations,” says Nanjo.

Participants in the standards development process, he says, may then consider RDF as an alternate and very viable way to represent clinical knowledge, if it proves to be so. As an example, he suggests that a standard such as FHIR may choose to officially offer an RDF representation of its clinical resources in addition to the representation formats it now supports. If clinical information can be represented as RDF, “the boundary between terminologies and models could be potentially eliminated,” Nanjo says, adding that he believes that the potential of using RDF for semantically interoperable health records “is tremendous.” It encompasses everything from enabling data mining on structured information, to allowing knowledge discovery to happen, to letting care providers start to identify trends to see what treatments work and when, to offering valuable information for clinical studies.

Indeed, there are opportunities for anyone involved in the healthcare data sector to play a role in driving toward such ends. The W3C, Booth points out, has been the epicenter for Semantic Web standards and technology, and for several years has been ushering in the adoption of this technology in the biomedical space through an interest group on Semantic Web for Healthcare and Life Sciences. Now, according to Booth, a new work group on RDF for Semantic Interoperability is being formed at HL7, a healthcare-specific standards organization, to further this adoption process within healthcare.

“The HL7 group will also coordinate and collaborate with the W3C,” he says. “For anyone who wants to be involved, this is a great opportunity.”

  • Andries van Renssen

    Talking about RDF as a ‘language’ suggests that it can be compared to a natural language. In fact that is misleading. What is required is a real multi-lingual formalized natural language with much more semantic expression capability than RDF.

    The roadmap puts ontologies above RDF. But it seems not to be aware that there are two kinds of ontologies: ontologies that define a formalized language and ontologies that model knowledge.
    Nearly all ontologies do not distinguish these two categories either. They provide modeled knowledge about some domain and at the same time define their own dedicated language (in the form of kinds of relations and concept definitions). The problem then still is that those ontologies (knowledge models) are still using their own different and incompatible ‘languages’ and do not use a common multi-lingual formalized natural language.
    My recommendation is to split the ‘ontologies’ stage in the roadmap into two stages for the ontology: first the development of an ontology that defines a formalized language (a ‘language defining ontology’) and secondly developing/converting multiple ontologies (knowledge models) by expressing them in the ‘ontology language’ (‘knowledge modeling ontologies’).
    For a candidate formalized natural language see ‘Gellish Formal English’ (and its variants in other languages, such as ‘Formal Dutch’) on
    (which can be expressed in RDF)

  • williamgoossen

    How would this relate to the about 15 years of developments in semantic content for health care? In particular, what is the relationship with the ISO 13606 / openEHR archetypes? And what about the HL7 v3 templates, messages and Clinical Document Architecture artifacts readily used in worldwide implementations such as epSOS and Trillium Bridge? How can you relate this approach to the ISO TS 13972 Detailed Clinical Models that specify in-depth requirements for semantic artifacts? What is your relationship with the Clinical Information Modeling Initiative, and what about the OMG adaptation of Unified Modeling Language, which now can do a proper code binding (as from LOINC, SNOMED CT, ICD 10 / 11 and more)? And what about the SMART project in the US and SemanticHealthNet in Europe?
