What’s the path from an XML-based e-government metadata application to a linked data version? At the upcoming Semantic Tech & Business Conference in Berlin, the road taken by the Dutch government will be described by Paul Hermans, lead architect of Belgian project Erfgoedplus.be, which uses RDF/XML, OWL and SKOS to describe relationships to heritage types, concepts, objects, people, places and time.
Some 1,000 individual organizations compose the Dutch government, each with its own website. An effort a few years ago to have a search engine spider those separate websites and provide a single point of access didn’t work as anticipated. The next step to bring some order was to assign all the documents published on those sites a common kernel of metadata fields, which led to building an XML application to enable a structured approach. Linked Data entered the picture about a year and a half ago.
“The directional for moving to real Linked Data was that it is always better to have unique identifiers, because the way the XML application worked it all did the resolving of values using string lookups and text lookups, and that didn’t always work 100 percent,” Hermans says. “So the solution to that problem was to assign to every instance, every concept they had, a unique identifier, and then the step to using URIs as identifiers is very easily taken with the advantage that a URI is dereferenceable. So the rationale was for unique identifiers that can be resolvable.”
The Linked Data project last week entered its next phase. Currently, he explains, a static publishing process runs against the RDF triple store and generates up front all the static HTML pages, the RDF pages, the notation pages, and also the old XML pages. That process takes too much time and is too burdensome, so this phase is designed to make all publishing dynamic: a Drupal portal site will talk to a SPARQL endpoint for the HTML rendition, and a Linked Data API will serve all the other renditions.
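To make the dynamic setup a little more concrete, here is a minimal Python sketch of how a request might be routed to one of the two backends based on its HTTP Accept header. It is an illustration only: the media-type table, the backend labels and the function names are assumptions made for this example, not details of the Dutch project.

```python
# Hypothetical routing table: which backend serves which rendition.
# The media types and backend labels are assumptions for illustration.
BACKENDS = {
    "text/html": "drupal-portal",            # HTML rendered via the SPARQL endpoint
    "application/rdf+xml": "linked-data-api",
    "application/xml": "linked-data-api",
}


def parse_accept(header):
    """Return acceptable media types, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            entries.append((-q, position, media_type))
    # Sort by descending q, then by original position in the header.
    return [media_type for _, _, media_type in sorted(entries)]


def choose_backend(accept_header, default="drupal-portal"):
    """Pick the backend for the first media type we know how to serve."""
    for media_type in parse_accept(accept_header):
        if media_type in BACKENDS:
            return BACKENDS[media_type]
    return default


if __name__ == "__main__":
    print(choose_backend("application/rdf+xml;q=0.9, text/html;q=0.5"))
    print(choose_backend("text/html, application/xml;q=0.8"))
```

The point of the sketch is only that one URI can lead to several renditions, with the choice made per request rather than by generating every page in advance.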
During his presentation at the conference, Hermans expects to discuss the issues other organizations undertaking Linked Data migration processes may confront. As one example, he advises being prepared to spend a long time just designing the URIs themselves. “The discussions for that issue took months and months,” he says. And even now, the URI design isn’t as up to snuff as he would like. The URIs were completely modeled according to the Dublin Core Metadata standard, but Hermans says that some subtle issues have presented themselves.
For instance, when a city and a province share the same name, the team decided to disambiguate between the two by placing part of the URI between round brackets. While that’s allowed, he says, “to our big surprise some software chokes on that. … The main issues with URIs, to my surprise, is that process-wise it took a lot of time and secondly there’s not enough information in the URI itself to do more intelligent things up front.”
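The round-bracket issue can be illustrated with a short Python sketch. Parentheses are legal in a URI path, but many tools percent-encode them, so the same identifier can end up with two spellings. The base URI, the local-name pattern and the `mint_uri` helper below are hypothetical; the article does not give the project's actual URI scheme.

```python
from urllib.parse import quote

# Hypothetical base URI, used here only for illustration.
BASE = "http://example.org/id/"


def mint_uri(name, qualifier=None, keep_parens=True):
    """Build a URI whose local name carries an optional qualifier in round brackets.

    With keep_parens=False the brackets are percent-encoded, which is what
    some libraries do by default.
    """
    local = name if qualifier is None else f"{name}_({qualifier})"
    safe = "()" if keep_parens else ""
    return BASE + quote(local, safe=safe)


if __name__ == "__main__":
    literal = mint_uri("Utrecht", "provincie")
    encoded = mint_uri("Utrecht", "provincie", keep_parens=False)
    print(literal)
    print(encoded)
    # A naive string comparison treats the two spellings as different identifiers.
    print(literal == encoded)
```

Software that compares URIs as plain strings will see these two forms as different resources, which is one plausible way for bracketed URIs to cause trouble downstream.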
Another issue to gird yourself for: “Something everyone needs to learn in a Linked Open Data project is that modeling in the semantic world is done mainly to infer new statements and that’s a whole learning process from people coming from a traditional background where you do modeling for other reasons,” he says. “It’s not for applying constraints but inferring new statements. And it’s hard to make that distinction and to apply modeling as a means to get new information.”
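The distinction Hermans draws can be seen in a toy Python example of the standard RDFS domain rule: if a property has a declared domain, anything that uses the property is inferred to belong to that class. The `ex:` names and the sample triples are invented for illustration and are not taken from the Dutch vocabularies.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types(triples):
    """Apply the RDFS domain rule and return only the newly inferred triples.

    Rule: (p rdfs:domain C) and (s p o)  =>  (s rdf:type C)
    """
    triples = set(triples)
    domains = [(s, o) for s, p, o in triples if p == RDFS_DOMAIN]
    inferred = set()
    for prop, cls in domains:
        for s, p, _ in triples:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    return inferred - triples


if __name__ == "__main__":
    # Hypothetical data: the ex: names are assumptions for this sketch.
    data = {
        ("ex:publishedBy", RDFS_DOMAIN, "ex:Document"),
        ("ex:amsterdam", "ex:publishedBy", "ex:ministry"),
    }
    # Nothing is rejected; instead ex:amsterdam is inferred to be a document.
    print(infer_types(data))
```

Someone with a schema-validation mindset might expect the second triple to be rejected because `ex:amsterdam` was never declared a document. RDFS does not reject it; it concludes that `ex:amsterdam` is an `ex:Document`, which is exactly the kind of unwanted inference that can make an existing vocabulary a poor fit.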
At the time the project began, he also found that existing vocabularies for e-government purposes weren’t as robust as required. “We looked at existing vocabularies but we didn’t always like the inferences brought about by those existing vocabularies, meaning we ended up writing a lot by ourselves,” he says. Other domains may have better luck, but in this case he estimates it took another two months for additional modeling that had to take all stakeholders from different departments into account.
None of this is meant to discourage organizations from embracing Linked Data, of course. “The main or big wins with a Linked Data approach and the modeling we did are found precisely in that area where the whole thing started – the knowledge and relationships of the Linked Data approach now are used within the context of the search engine. The relations that are in the RDF triple store are exploited by the search engine to build better navigation, to answer with better and more relevant search results, to offer related things and so on. So it really helps in fine-tuning and adding functionalities in the search engine.”
For an example of the Linked Data itself, go here. The guide to government services and information that government users and citizens can leverage is here. In the future, Hermans would like to extend the project to have outgoing links with the whole LOD space and map what exists today to the MetaLex Document Server that hosts almost all Dutch national regulations in CEN MetaLex XML and as RDF Linked Data, so that every entity from the Dutch government has a link to its governing regulation or law.
To register to attend SemTech Berlin, go here.