Photo courtesy: Flickr/ bru76
There’s a lot of Linked Data out there. The question is, how practically useful is it?
Perhaps not as much as is needed, and that could be bad news for the whole Semantic Web movement. That’s one of the reasons for a partnership just undertaken by Ontotext (one of the presenters at the upcoming Semantic Web Summit) and Structured Dynamics to close some of the gaps in the Linked Data and Semantic Web ecosystem.
“I think the concern right now is that the Linked Data that is out there is not being used, except in isolated pockets that are curated,” says Mike Bergman, CEO of Structured Dynamics. “With all the hype that is coming down, the quality is poor, context is lacking, and use is not evident. So do we risk a backlash by hyping something that no one is really using? I think we’re getting close to that point.”
It’s a problem of weights and barbells, as Bergman likes to think of the triples idea upon which Linked Data operates: the weights on either end are the subject and the object, and the bar is the relationship between them, all building to an RDF data set in which the connections among the nodes are expressed as a graph. What is unique about the Linked Data idea, Bergman says, is drawing connections among these data sets, each likely adhering to its own internally consistent vocabulary. The problem there is whether a subject in one set has the same meaning as a subject in another. Reference vocabularies that go beyond simple standard ontology vocabularies can be referred to by both data sets; they can enable a coherent, complete definition of what is meant by each node within a specific domain (as with the MeSH vocabulary in life sciences) so that data sets can speak the same language to each other.
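As a rough illustration of the idea, here is a minimal Python sketch in which triples are plain (subject, predicate, object) tuples and two data sets with their own vocabularies are lined up through a shared reference vocabulary. Every identifier in it (the `a:`, `b:` and `ref:` terms, the function names) is hypothetical and chosen only for this example; none of it comes from MeSH, UMBEL, PROTON or any other real vocabulary.

```python
# Illustrative only: all identifiers below are hypothetical.

# A triple is (subject, predicate, object): two weights joined by a bar.
dataset_a = [
    ("a:Aspirin", "a:treats", "a:Headache"),
]
dataset_b = [
    ("b:AcetylsalicylicAcid", "b:indicatedFor", "b:Cephalalgia"),
]

# Each data set maps its local terms onto shared reference concepts.
reference_map = {
    "a:Aspirin": "ref:Aspirin",
    "b:AcetylsalicylicAcid": "ref:Aspirin",
    "a:Headache": "ref:Headache",
    "b:Cephalalgia": "ref:Headache",
}


def to_reference(triples, mapping):
    """Rewrite subjects and objects to reference concepts where a mapping exists."""
    return [(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples]


def shared_subjects(triples_a, triples_b, mapping):
    """Return the subjects that both data sets describe after mapping."""
    subjects_a = {s for s, _, _ in to_reference(triples_a, mapping)}
    subjects_b = {s for s, _, _ in to_reference(triples_b, mapping)}
    return subjects_a & subjects_b
```

With the mapping in place, `shared_subjects(dataset_a, dataset_b, reference_map)` finds one subject in common; with an empty mapping the two data sets have nothing to say to each other, which is the gap a reference vocabulary is meant to close.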
“The reference vocabularies provide fixed reference points, to give you a sense of orientation, of context. They are the fixed points by which you navigate,” Bergman says. Structured Dynamics and Ontotext have approached the Dublin Core Metadata Initiative to take the lead in driving the discussion about what constitutes a good reference vocabulary for linking purposes, and to play an active role in putting up public repositories of such vocabularies. The overture was well received, Bergman says. Vertical industries themselves, though, will need to be the ones who work out vocabularies that do a good job of actually describing their domains.
Equally problematic, however, is the relative dearth of off-the-shelf predicates to use for making connections between the nodes, when the relationships between two different data sets are approximate rather than exact.
“So we have a semantic gap: we need reference vocabularies to keep people oriented in the right direction, and we need linking predicates or verbs or relationships between these data sets that are more approximate vs. exact,” Bergman says. “Exact would be great if that were true. But most real relationships are not exact, and when you say they are and they aren’t, you introduce errors into the whole Linked Data structure.”
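To make the point about exact versus approximate links concrete, here is a small Python sketch. Bergman does not name specific predicates in this interview; as an assumption, the sketch borrows `owl:sameAs` (from OWL) as the exact link and `skos:closeMatch` (from SKOS) as the approximate one. The nodes, the facts attached to them and the helper functions are all hypothetical.

```python
# Stand-in predicates (an assumption; the article names none).
EXACT = "owl:sameAs"
APPROXIMATE = "skos:closeMatch"

# Hypothetical facts held by two different data sets.
facts = {
    "a:Paris": {"kind": "city"},
    "b:GreaterParis": {"kind": "metropolitan area"},
}

# The two nodes are related, but they are not the same thing.
links = [("a:Paris", APPROXIMATE, "b:GreaterParis")]


def merged_properties(node, links, facts, exact_predicates):
    """Collect the node's properties plus those of nodes linked by an exact predicate."""
    merged = {key: {value} for key, value in facts.get(node, {}).items()}
    for s, p, o in links:
        if p in exact_predicates and node in (s, o):
            other = o if s == node else s
            for key, value in facts.get(other, {}).items():
                merged.setdefault(key, set()).add(value)
    return merged


def conflicts(merged):
    """Properties that ended up with more than one value after merging."""
    return {key for key, values in merged.items() if len(values) > 1}


# Respecting the approximate link keeps the two descriptions separate.
careful = merged_properties("a:Paris", links, facts, {EXACT})

# Treating the approximate link as exact merges facts that do not belong together.
sloppy = merged_properties("a:Paris", links, facts, {EXACT, APPROXIMATE})
```

In the careful case the node keeps a single, consistent description. In the sloppy case `conflicts(sloppy)` reports that `kind` now has two contradictory values, a toy version of the errors Bergman says creep in when inexact relationships are asserted as exact.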
Ontotext’s PROTON, a basic upper-level, domain-independent ontology of about 300 classes and 100 properties, and Structured Dynamics’ UMBEL, an ontology of about 21,000 across-the-board reference concepts for tagging information, will have their views of the world combined to enable an integrated, consistent mapping between them. That combination will then be used to better organize the instance data of all the articles in Wikipedia. “The combination of those then would be a very powerful open source reference point for people that want to relate things together and describe them coherently,” says Bergman.
Structured Dynamics will release its next version of UMBEL in about two weeks. Then, by year’s end, there should be a new version of UMBEL (1.00) that includes the Wikipedia work, as well as a new FactForge (“the largest and most coherent interoperable linked data sandbox in existence,” as Bergman calls it) that will allow easy access to the most central LOD data sets through the vocabularies of UMBEL and PROTON.
Will Everyone Take the Step?
“From our standpoint we view it as an essential step to make Linked Data worth the hype,” Bergman says. When it comes to Linked Data, he says, “we’ve got the Christmas tree. Now, how do we hang the ornaments on it? We’re trying to provide an idea of that scaffolding so that people can do that.”
The two vendors say they are looking for more partnerships with players serious about interoperable, high-quality data on the Semantic Web. But Bergman acknowledges some challenges ahead, including those that may come from within the Linked Data community itself. There shouldn’t have to be a conflict between open data and useful data, but Bergman thinks that, in some quarters, there is. “I think there is a camp or philosophy within Linked Data that probably doesn’t want any strictures, guidance, reference vocabularies, or any of that.”
Maybe there’s a place for that, he says, but it’s not to be found among enterprises and other organizations that need to rely on information to make decisions. “There is incredibly valuable information out there, for that community, for people who actually want to use published data on the web for learning and making decisions,” he says. “I think we will find acceptance there. This has to be part of the answer.”