There is a fierce debate going on in the world of the Semantic Web and Linked Data. The question is whether it is of fundamental importance to realising the benefits of the technology, or whether those involved are just dancing on the head of a pin. The core debate revolves around something with the stunningly opaque title of the httpRange-14 issue.
The debate has been rumbling on for years but was reignited over the last few days by proposals being submitted to the W3C to clarify and hopefully simplify things. I use the word ignited as that is what I was beginning to think my iPhone was about to do – it has been buzzing away like a bumblebee on speed over the last few days, announcing the arrival of yet another passionately held opinion from a member of the respected Semantic Web/Linked Data community, from Sir Tim Berners-Lee downwards. Fortunately for those of you who do not follow the W3C’s Technical Architecture Group (TAG) and Linked Open Data (public-lod) mailing lists, it may have gone unnoticed.
Let me try to explain, in as simple terms as possible, what the fuss is all about and why it may be important. From my point of view, and there are many surrounding this, the issue is a combination of two problems.
Firstly the difference between a thing and a description of that thing.
In Linked Data and the Semantic Web anything can be given a URI as an identifier, that’s what the ‘I’ in URI represents. The URI is in the form of an http web address that we are all familiar with from our browser address bar. When I say anything can be given an identifier in the form of a URI, I mean anything – a web page, an image, a person, a location, an organisation, a story, a book containing the story, the film of the book, the Eiffel Tower, an animal, a galaxy, or the concept of happiness – absolutely anything! If I access the URI for a thing such as a web page, I get back the thing – the html that constitutes that page. However, if I access the URI for a physical thing, I cannot get the thing back, so I would expect some information about it instead - in the case of the Eiffel Tower I would expect its height, location, when it was constructed, and links to an image and a human readable page about it.
Some people assume that the URI for a description of a thing may be the same as the URI for the thing itself. If the thing is a webpage that would be correct, but if it is not, it would not. Why is this important? Well, there is information about a web page that would conflict with the thing it is describing – the obvious candidates here would be creation dates and licensing. Take this Wikipedia page about the Mona Lisa. It was last modified a few days ago, and is available under a Creative Commons Attribution-ShareAlike License. Obviously these two facts are not attributes of the Da Vinci work the page describes. However if you used that page’s URI as an identifier for the Mona Lisa, software could easily but mistakenly infer that they are.
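To make the Mona Lisa problem concrete, here is a minimal sketch in Python using plain tuples as a toy triple store. The URIs and the property names (creator, license, lastModified, describedBy) are made up for illustration and are not taken from any real vocabulary or from Wikipedia itself.

```python
# Toy triple store: (subject, predicate, object) tuples.
# All URIs and property names are hypothetical, for illustration only.
PAGE = "http://example.org/wiki/Mona_Lisa"   # the web page about the painting
THING = "http://example.org/id/Mona_Lisa"    # the painting itself


def describe(subject, triples):
    """Collect every property asserted about `subject` into a dict."""
    return {p: o for s, p, o in triples if s == subject}


# Conflated: the page's URI is also used as the identifier for the painting,
# so facts about the page and facts about the painting share one subject.
conflated = [
    (PAGE, "creator", "Leonardo da Vinci"),
    (PAGE, "license", "CC BY-SA"),
    (PAGE, "lastModified", "a few days ago"),
]

# Separated: the painting has its own URI and points at its description.
separated = [
    (THING, "creator", "Leonardo da Vinci"),
    (THING, "describedBy", PAGE),
    (PAGE, "license", "CC BY-SA"),
    (PAGE, "lastModified", "a few days ago"),
]

# Software asking "what do we know about the Mona Lisa?"
mistaken = describe(PAGE, conflated)    # includes the page's licence and date
correct = describe(THING, separated)    # only the painting's own facts
```

In the conflated case the licence and modification date come back as if they were attributes of the painting. With two identifiers they stay attached to the page, where they belong.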
The second problem, once you get the difference between an identifier for a thing and the potential for a different identifier for the description of that thing, is how you represent it in data and software terms.
Simply put, there are two options for this. You can state in the descriptive data that the thing you are accessing is describedBy data found at an alternative URI, or, in reverse, that what you are accessing is the description of something with a different URI. Alternatively, the web server responding to a request for a physical thing’s URI can redirect that request to the URI of the description. This is what DBpedia does. If you enter the following URI into your web browser “http://dbpedia.org/resource/Eiffel_Tower”, you will see that you end up at a page describing the attributes of the Eiffel Tower, but notice that on the way the address in your browser has been changed to “http://dbpedia.org/page/Eiffel_Tower”. So, as you would expect as a human, you are looking at a description of the thing, not the thing itself. This leads to an often repeated error of copying the URI in the web browser address bar and using it as the identifier for the thing – easily done.
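The redirect option can be sketched in a few lines of Python. This is my own reading of the convention, not an official algorithm. I am assuming that the redirect is a 303 See Other response and that a plain 2xx response means the URI identifies a document. To keep the example runnable without a network connection, the server is simulated with a small lookup table holding the two DBpedia URIs from above; the function names are my own.

```python
def classify(status, location=None):
    """Interpret the response to a GET on a URI (an assumed reading of the
    convention): 2xx means the URI identifies a document; 303 with a
    Location header points at a description of the thing."""
    if 200 <= status < 300:
        return ("information-resource", None)
    if status == 303 and location:
        return ("see-description", location)
    return ("unknown", None)


# Simulated responses standing in for real HTTP requests.
# The 303 status is an assumption about how the redirect is served.
SIMULATED = {
    "http://dbpedia.org/resource/Eiffel_Tower":
        (303, "http://dbpedia.org/page/Eiffel_Tower"),
    "http://dbpedia.org/page/Eiffel_Tower": (200, None),
}


def resolve(uri, responses=SIMULATED):
    """Return (kind, description_uri) for a URI in the simulated table."""
    status, location = responses[uri]
    kind, description = classify(status, location)
    if kind == "information-resource":
        description = uri  # the URI identifies the document itself
    return kind, description


thing = resolve("http://dbpedia.org/resource/Eiffel_Tower")
page = resolve("http://dbpedia.org/page/Eiffel_Tower")
```

The point of the sketch is that the identifier to keep is the URI you asked for, not the one you ended up at: the resource URI names the tower, while the page URI names only the description.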
OK enough [simple] explanation, before I confuse you too much. Why is this important - or not? Many may consider issues like this as Semantic Webery not relevant to them as they look to the likes of Schema.org to improve the structure of the web - and their SEO ratings. However, I would suggest caution in such a dismissal.
It is important that this issue is put to bed within the Linked Data and Semantic Web communities so that they can get on with considering other issues and provide clear best practice advice to the rest of the world on how to describe and publish their data. Equally important is the best practice guidance on how to interpret and consume data published by others.
It is important to the broader adoption of Linked Data that it is put to bed. The issue already has a reputation as the topic whose name should not be mentioned, and it is a great barrier to entry for those new to Linked Data.
It is important that this is put to bed soon as, driven by initiatives such as Schema.org, the world of the structured data web is moving on beyond the confines of the strict application of Semantic Web techniques. The Web is a messy place where best, and worst, practices are passed around by example. The backers of Schema.org recognise and support RDFa as a way of describing things, so expect a rapid growth of description in a Linked Data form. It is unlikely that the de facto way forward for creating and linking globally unique identifiers (URIs) will evolve into something that will totally satisfy the Semantic Web purists. However, the sooner we [in the Linked Data community] start promoting easily understood best practice to the increasing numbers who want to publish structured data in an effective way, but don’t care that it may help bring forward a semantic web, the better the end result will be.
It is not important to that vast majority how these issues should be solved. They will only care about the most effective and efficient way to consume what is already out there on the web of data, and to publish their own data.
So back to my original question – is this debate fundamental? With the approaching wave of data on the web, yes I believe it is - or at least ending it in a satisfactory way is. Those dancing on the head of the W3C pin need to settle this and move on. The rest of the world will benefit from clear advice and examples that they can copy without necessarily having to understand the semantic and sometimes philosophical debate and angst that produced it.
Update: For a more in-depth description of the W3C proposals that I reference, I recommend this excellent post from Jeni Tennison.
Richard Wallis is Founder of Data Liberate.
Eiffel Tower picture from Wikimedia Commons.