A whole lot of open data made its way onto the web this week, thanks to work undertaken at the University of Southampton, where Semantic Web thought leader Nigel Shadbolt is Professor of Artificial Intelligence and Deputy Head of the School of Electronics and Computer Science.
The project was launched with a wholly practical approach to publishing data – about vending machines, catering halls, and other points of service – in RDF format by that school’s web manager, Christopher Gutteridge. And he’s hopeful that practical will become pioneering – so much so that he’s collecting basic recipes for good practices for organisations wanting to create Open and Linked Data here.
“This isn’t for modelling the human genome, but rather for everything that any organization would want to describe about itself on the web – products, people and places, roughly speaking,” he says. “We’re not publishing an ontology but how to use existing predicates in a way to construct something without worrying about the semantic values underneath. We do mint predicates and classes in the openorg namespace when we need something which we can’t find in an existing scheme.”
His point of view as someone whose job in central IT is, as he says, to make things work and be useful, means he’s put the emphasis on collecting and using data to save time and money and to enable a better, more effortless working environment – not so much on how many triples you can rack up. So, the idea is that all the parties now maintaining that data in spreadsheets don’t have to go outside their comfort zones or disrupt their established maintenance routines, while techies still get the advantage of using the data with semantic web tools.
A Proud Moment
All that was required of decidedly non-techie data contributors – campus coffee shop managers or facilities staff – was to agree to let Gutteridge clean up their spreadsheets so that they’re semantically useful, then send the latest data his way. The public web site directory hosts the final RDF documents, provenance information about the event that created them, and the original spreadsheets too – all the data providers have to do is keep emailing Gutteridge the (now semantically formatted) spreadsheet they’ve always known and loved.
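Under some assumptions about what a cleaned-up sheet looks like, that spreadsheet-to-RDF step can be sketched as follows; the base URI, predicates, and column names here are placeholders for illustration, not the vocabularies the project actually uses.

```python
import csv
import io

# Placeholder namespace and predicates for illustration only; the real
# project reuses existing vocabularies and its own base URI.
PREFIXES = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
BASE = "http://example.org/id/point-of-service/"

def rows_to_turtle(csv_text: str) -> str:
    """Turn rows of a cleaned-up spreadsheet into simple Turtle triples."""
    lines = [PREFIXES]
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        lines.append(f'<{BASE}{i}> rdfs:label "{row["name"]}" ;')
        lines.append(f'    rdfs:comment "{row["location"]}" .')
    return "\n".join(lines)

# A hypothetical point-of-service sheet.
sheet = "name,location\nCampus Coffee Shop,Building 32\n"
print(rows_to_turtle(sheet))
```

The point of the sketch is that the contributors’ file format never changes; only the conversion script needs to know any RDF.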
Next on tap is inviting them to upload their data to Google Docs; he’ll then import it daily and report on it automatically via a SPARQL endpoint – but only if the checksum of the downloaded file has changed. Otherwise, no import is required.
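That change-detection step is simple to picture. This is a minimal sketch, assuming SHA-256 checksums and hypothetical function names rather than the project’s actual code:

```python
import hashlib

def file_checksum(data: bytes) -> str:
    """Hex digest used to detect whether the downloaded sheet has changed."""
    return hashlib.sha256(data).hexdigest()

def needs_import(new_data: bytes, last_checksum: str) -> bool:
    """Re-import only when the file differs from the previous download."""
    return file_checksum(new_data) != last_checksum

# An unchanged sheet is skipped; an edited one triggers a fresh import.
old = b"name,location\nCampus Coffee Shop,Building 32\n"
new = b"name,location\nCampus Coffee Shop,Building 2\n"
print(needs_import(old, file_checksum(old)))  # False
print(needs_import(new, file_checksum(old)))  # True
```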
Gutteridge mentions that he was in the transport office at the university demoing how this all worked to the receptionist when her boss stopped by and asked if Gutteridge could take over responsibility for maintaining the data. “He looks a bit unsure and the receptionist says, ‘No, I can do that, it’s easy,’” Gutteridge reports. “I’m dead proud of that. It’s what I want to hear about people creating data for the semantic web!”
Real Semantics Get Real Useful
A really exciting part is building a web site on top of the data “so people who can’t use angle brackets can get use of it,” he says. In keeping with the big-wins focus on point-of-service data, applications so far include an amenity map that, among other things, plots where to get a main meal at the university’s sites, drawing on its catering points of service (from multiple eatery providers), buildings and places data, and local council data on bus routes and stops. In another instance, there’s a mobile web app for finding bus stops that combines the council’s bus stop data with Ordnance Survey postcode locations to point would-be riders to the nearest stop.
4store serves as the data store and SPARQL endpoint. And Gutteridge’s SPARQL server manager, Dave Challis, has added a CSV exporter so that data resulting from queries – say, the route a particular bus stop is on – can be received as a spreadsheet that loads straight into a program like Excel. “I want something people can click on and get a spreadsheet, as that’s how normal people think and I want to protect them from the things outside their comfort zone. You want people to see what this technology can do, without forcing them to understand the details,” Gutteridge says. “You don’t need to learn to build an engine to see that a car is useful, but hand someone an engine without a chassis or seats and don’t expect thanks.”
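The exporter’s job can be sketched against the standard SPARQL JSON results format; the query and bindings below are hypothetical stand-ins, not Challis’s actual implementation.

```python
import csv
import io

def sparql_json_to_csv(results: dict) -> str:
    """Flatten a SPARQL SELECT result set (JSON results format) into CSV text."""
    variables = results["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(variables)  # header row comes from the SELECT variables
    for binding in results["results"]["bindings"]:
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()

# A made-up result for a "which route serves this stop?" query.
results = {
    "head": {"vars": ["stop", "route"]},
    "results": {"bindings": [
        {"stop": {"value": "Highfield Interchange"}, "route": {"value": "U1"}},
    ]},
}
print(sparql_json_to_csv(results))
```

The design choice is the same one Gutteridge describes: the SPARQL machinery stays behind the scenes, and what reaches the user is a file Excel opens without ceremony.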
Gutteridge sees tons of ways that making the university’s data open facilitates the exchange of information between parties that have data and those that want it. As an example, the school has data about which professors will talk to the press: It wants the press to have this data and the press wants it, too.
“Let’s get it to them in the best way we can (well-structured semantic data), plus provide them with tools to search it easily. Five years from now it’ll be like an RSS feed — any large organization will have a ‘press contacts’ feed URL where news organizations can aggregate their rolodexes from each morning,” he says.
The first draft of enabling this at the university is done. The next step, he reports, “is to get it into a Google spreadsheet with a row for each person/subject mapping and two or three columns for subject… narrow-, mid- and broad-concept. Each column can be free text (which we’ll mint a URI for as a skos:Concept with a URI taken from the text). However, if the column is a Wikipedia URL, we can munge that into a dbpedia URI for fun and profit (and get the title back out of dbpedia). That’ll give us a semantic list of concepts to experts.”
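The two cases he describes – minting a URI from free text and munging a Wikipedia URL into a DBpedia one – might look roughly like this. The concept namespace and function names are placeholders, though the title mapping itself reflects how DBpedia resource URIs actually mirror Wikipedia article titles.

```python
import re

# Placeholder namespace; the university would mint URIs in its own space.
CONCEPT_BASE = "http://example.org/id/concept/"

def mint_concept_uri(text: str) -> str:
    """Mint a skos:Concept URI from free text by slugifying it."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.strip().lower()).strip("-")
    return CONCEPT_BASE + slug

def wikipedia_to_dbpedia(url: str) -> str:
    """Rewrite an English Wikipedia article URL as a DBpedia resource URI."""
    title = url.rsplit("/wiki/", 1)[1]
    return "http://dbpedia.org/resource/" + title

print(mint_concept_uri("Machine Learning"))
# http://example.org/id/concept/machine-learning
print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Semantic_Web"))
# http://dbpedia.org/resource/Semantic_Web
```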
Gutteridge says that a few days into having the whole open data program in a ready-to-go state, he was excited. But a day after launch, he had a realization: “This is just what we should always have. It’s not so much that it’s good as it’s reasonable,” he says. “We’re taking information that isn’t secret and making it as useful as possible.”