The upcoming Semantic Web Summit will kick off with a conversation about how people are using the Semantic Web today, hosted by Amiad Solomon of Peer39 and Lee Feigenbaum of Cambridge Semantics. Perhaps one example you’ll hear about then is the work underway at Semicolon (Semantic and Organisational Interoperability in Communicating and Collaborating Organisations), a research and development project partly funded by the Norwegian Research Council, to make semantic and organizational interoperability faster and cheaper to achieve, both inside and outside the public sector.
Cambridge Semantics’ Anzo Semantic Web solution is being used to help Statistics Norway make it easier for those within or outside the government to benefit from interoperability among the data sets the department produces. That data, statistics on important aspects of Norwegian society, typically gets stored in thousands of individual Excel spreadsheets or made available in HTML on the web, and neither format makes it particularly easy to combine with other data.
But, says Per Myrseth, chief specialist, Information Risk Management at Det Norske Veritas AS, one of the organizations working on the pilot, “At the macro level that data is potentially Linked Open Data.”
And realizing that potential is important to the government and its e-government efforts, as it believes that enabling greater collaboration among public organizations, citizens and businesses can make the public sector more effective and efficient. “So, at a very high level we look at Linked Open Data as a new way of collaborating in the public sector, where you have very disconnected information systems but they can interoperate by linking to each other’s data,” Myrseth says.
The Statistics Norway part of the project involved having students enter data from some of the department’s spreadsheets into Anzo, then topping it off with some small pieces of ontologies, Myrseth says. “What we gain is if we make one geographical ontology, for example, and one very simple time ontology, we can take an arbitrary Excel spreadsheet with geographical or time dimension, or both of them, and link them together,” he says. “So we have a general tool for linking any kind of data that could be linked to a simple ontology. So, we are kind of mapping data from the macro instance level up to the ontology, telling that this data are about this municipality in this period and storing it. And, when you enter several such kinds of data sets, you can use Anzo on the Web to choose how you will drill on the sum of all these data sets.”
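The linking idea Myrseth describes can be sketched in a few lines of Python. This is only an illustration of the principle, not Anzo's API or the project's actual ontologies: the `geo:` and `time:` identifiers, the function names, the column names and the sample rows are all made up for the example. Each spreadsheet row is tagged with a municipality term and a period term, and any two data sets tagged with the same terms can then be combined.

```python
from collections import defaultdict

# Hypothetical ontology prefixes; the real Semicolon ontologies are not
# described in enough detail to reproduce here.
GEO = "geo:municipality/"
TIME = "time:year/"


def rows_to_observations(dataset, rows, geo_col, time_col):
    """Tag each spreadsheet row with a (municipality, period) ontology key."""
    observations = []
    for row in rows:
        key = (GEO + str(row[geo_col]), TIME + str(row[time_col]))
        # Every remaining column is treated as a measure of this data set.
        measures = {
            f"{dataset}:{name}": value
            for name, value in row.items()
            if name not in (geo_col, time_col)
        }
        observations.append((key, measures))
    return observations


def link(*observation_sets):
    """Combine data sets that point at the same ontology terms."""
    merged = defaultdict(dict)
    for observations in observation_sets:
        for key, measures in observations:
            merged[key].update(measures)
    return dict(merged)


# Made-up rows standing in for two unrelated spreadsheets. Note that the
# spreadsheets use different column names for the same dimensions.
population = [
    {"kommune": "A", "aar": 2009, "population": 1000},
    {"kommune": "B", "aar": 2009, "population": 2500},
]
employment = [
    {"municipality": "A", "year": 2009, "employed": 600},
]

linked = link(
    rows_to_observations("pop", population, "kommune", "aar"),
    rows_to_observations("emp", employment, "municipality", "year"),
)
```

Here `linked[("geo:municipality/A", "time:year/2009")]` holds both the population and the employment figures, even though the two source tables never referred to each other. Only the mapping of each table's columns to the shared terms had to be supplied.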
Fewer Resources Needed For Linking Data Sets
As a result, average users – whether a reporter for a local newspaper, a student doing analysis for a project, or researchers within the government itself – can link data sets without needing the resources of a big IT staff or big IT budget. “Here we have a tool that can do that work for them,” says Myrseth.
But that’s not all that’s required, he says. It’s also important for Statistics Norway to ensure that data linking follows orderly rules, so that users don’t make mistakes that lead to erroneous conclusions. If people don’t understand the meaning of the data in a data set, they can wind up with totally wrong conclusions, he says.
Linked open data is not necessarily under a single information governance regime, so there is no single set of quality metrics for the data and no single methodology for describing its meaning and provenance for quality assurance. That adds complexity to the overall concept.
“Our suggestion is Statistics Norway should do the linking of their statistics data into some simple ontologies, and afterwards people can link any kind of data to those same ontologies and do drilldowns from there. But if linking is done improperly [to begin with], the results of merging and drilling will always be wrong. That’s a big challenge with big open data concepts,” Myrseth says. “There are so many topics to drill into before you can be sure drilling on linked open data is giving you the conclusions you would like to trust on.” Statistics Norway agrees about the focus on metadata, he says, and in fact was the first department in Norway to establish a metadata quality strategy in 2005.
There’s no obligation for Statistics Norway to use the solutions that contributed to the pilot, but Myrseth says the results will be taken into account by the department as it works on future strategies. Another Semicolon effort, now live, makes the Norwegian Business Register available as Linked Open Data.
“By encapsulating the registers [as web services] so they can be accessed by Linked Open Data protocols, you have a very new way of using those registers without changing them at all,” Myrseth says. “It’s the first national business register on the Linked Open Data cloud ever.”
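To make the encapsulation idea concrete, here is a minimal Python sketch of a wrapper that leaves a register untouched and only changes how a record is presented, in this case as N-Triples under a URI per organization. It is an assumption-laden illustration, not the Semicolon implementation: the `example.org` URIs, the vocabulary, the `describe` function and the sample record are all hypothetical.

```python
# Hypothetical base URIs; the real service's addresses are not given here.
BASE = "http://example.org/register/"
VOCAB = "http://example.org/vocab#"

# Stand-in for the existing register, which the wrapper only reads.
REGISTER = {
    "123456789": {"name": "Example AS", "municipality": "A"},
}


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def describe(org_id, register=REGISTER):
    """Render one register record as N-Triples, or None if it is unknown."""
    record = register.get(org_id)
    if record is None:
        return None
    subject = f"<{BASE}{org_id}>"
    lines = [
        f'{subject} <{VOCAB}{field}> "{escape(str(value))}" .'
        for field, value in sorted(record.items())
    ]
    return "\n".join(lines)


triples = describe("123456789")
```

A real deployment would sit behind an HTTP endpoint and return this text when a client requests the organization's URI. The point of the sketch is only that the register's own storage and schema stay as they are, with the Linked Open Data view generated at the edge.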