I often read blogs and watch conference presentations extolling the virtues and benefits of adopting Semantic Web and Linked Data techniques and technologies. It makes me wonder how those new to the field ever get through the blizzard of acronyms and techno-speak to understand what is being promoted and how it might be relevant to them and their business. In this post I will attempt to demystify Linked Data and identify its core benefits without burying you in LD-speak!
These elements together – HTML-encoded URLs on a page, linking to any other page, which in turn link on to others – are the simple power behind the Web. We all soon found that following our interests click by click, from page to page and site to site, was the natural way to navigate. In the early days this was all there was; today, however, none of what we do on the Web would be possible without this basic foundation of simple links. Just as Henry Ford would recognise the basic elements of his Model T underpinning today's complex automobiles, I am sure that Sir Tim Berners-Lee recognises his basic link in everything on the Web.
Once the Web became established, next on the scene was the promise of the Semantic Web. As described in a 2001 issue of Scientific American, it envisages a web of intelligent connections between data elements and information. At the time the concept seemed very futuristic, as we were nowhere near having a web of catalogued data that could deliver it. Over the next few years the Semantic Web became more the province of academic research associated with artificial intelligence (AI). This contributed to its reputation as a great idea that would never come to fruition. Over this time, however, the W3C worked on and released the standards that would be needed to enable such a Semantic Web.
Away from the AI community, these standards started to be used pragmatically to link data across the Web. By 2006 this had evolved into a more formally recognised approach, which Sir Tim named Linked Data. Research Linked Data and again you are bombarded by acronyms and concepts such as OWL, SPARQL, URI, inference, Turtle, triple stores, RDF/XML, RDF, RDFa, content negotiation, and dereferencing.
Like the Web that underpins it, Linked Data has at its core some blindingly simple concepts upon which all this initially incomprehensible stuff builds.
The first of these is the ability to identify things in a globally unique way. Imagine that you were in charge of keeping track of some large pieces of hardware, spacecraft for instance. You would give each one an identifier, e.g. 1969-059A. You probably have lots of things to keep track of, so you would probably categorise them – spacecraft/1969-059A. You would then like to make sure that it is ‘your’ identifier, not to be confused with anyone else's, so you prefix it with something globally unique that you own, such as a domain name – nasa.dataincubator.org/spacecraft/1969-059A. As it now looks like a web address, why not make it one and, as a good internet citizen, provide some information about your thing if people follow it – http://nasa.dataincubator.org/spacecraft/1969-059A. What we have created is a Uniform Resource Identifier (URI). Yes, it is a type of web address, but its main feature is that it is a unique identifier.
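The steps above can be sketched in a few lines of Python. The function name make_uri is hypothetical; it simply joins a domain you own, a category and a local identifier into an http URI, percent-encoding each part so it stays a single path segment.

```python
from urllib.parse import quote, urlparse


def make_uri(domain: str, category: str, local_id: str) -> str:
    """Prefix a category and local identifier with a domain you own.

    quote(..., safe="") percent-encodes anything unsafe in a URL path
    (including "/"), so each part stays a single path segment.
    """
    return f"http://{domain}/{quote(category, safe='')}/{quote(local_id, safe='')}"


uri = make_uri("nasa.dataincubator.org", "spacecraft", "1969-059A")
print(uri)  # http://nasa.dataincubator.org/spacecraft/1969-059A

# Because it is a web address, standard tools can take it apart again.
parts = urlparse(uri)
print(parts.netloc, parts.path)  # nasa.dataincubator.org /spacecraft/1969-059A
```

The globally unique part comes entirely from the domain: as long as you control nasa.dataincubator.org, nobody else should mint URIs under it.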
So what sort of information would you return to those calling that address? Probably attributes such as mass, name, and launch. In RDF (Resource Description Framework), the de facto Linked Data format, it might look like this:
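A minimal sketch of such data, written out as N-Triples (one plain-text RDF serialisation) from Python. The vocabulary namespace http://example.org/space# and the launch URI are hypothetical placeholders, foaf:name is the real FOAF name property, and the name and launch date are given for illustration only, not copied from the NASA data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """A plain string value, as opposed to a URI naming a thing."""
    value: str


SPACECRAFT = "http://nasa.dataincubator.org/spacecraft/1969-059A"
LAUNCH = "http://nasa.dataincubator.org/launch/1969-059"  # hypothetical launch URI
SPACE = "http://example.org/space#"  # hypothetical vocabulary namespace
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (SPACECRAFT, FOAF_NAME, Literal("Apollo 11 Command and Service Module")),
    (SPACECRAFT, SPACE + "launch", LAUNCH),  # a link to another thing
    (LAUNCH, SPACE + "launched", Literal("1969-07-16")),  # the launch's own attribute
]


def term(t):
    """Write one N-Triples term: URIs in <...>, literals in escaped quotes."""
    if isinstance(t, Literal):
        escaped = t.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return f"<{t}>"


def to_ntriples(triples):
    """One line per triple, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(triples))
```

Keeping URIs and literals as distinct types matters: an object that is a URI is a link to another thing, while a literal is just a value.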
Note that the launch is a thing in its own right, so it is represented by its own URI, which in turn has its own attributes. Check out this simple RDF data model to picture this.
What you are seeing is information about things expressed as simple three-element statements, or triples. The triples that relate your thing to another URI are the links in Linked Data. You are not restricted to linking to URIs in your own data set: because URIs are globally addressable, they can exist in any domain, thus:
<http://nasa.dataincubator.org/site/capecanaveral> sameAs <http://dbpedia.org/resource/Cape_Canaveral> .
By following the links you, or an application someone writes, can navigate between the concepts and things described using Linked Data.
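Following links can be sketched as a breadth-first walk over triples. This is an in-memory toy: the predicate URIs under http://example.org/space# and the launch and site links are hypothetical, and a real Linked Data application would dereference each URI over HTTP to fetch further triples rather than look them up in a local list.

```python
from collections import deque

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SPACE = "http://example.org/space#"  # hypothetical vocabulary namespace

# A toy graph of (subject, predicate, object) triples whose objects are all URIs.
triples = [
    ("http://nasa.dataincubator.org/spacecraft/1969-059A", SPACE + "launch",
     "http://nasa.dataincubator.org/launch/1969-059"),
    ("http://nasa.dataincubator.org/launch/1969-059", SPACE + "launchsite",
     "http://nasa.dataincubator.org/site/capecanaveral"),
    ("http://nasa.dataincubator.org/site/capecanaveral", OWL_SAME_AS,
     "http://dbpedia.org/resource/Cape_Canaveral"),
]


def follow_links(triples, start):
    """Breadth-first walk from start, following every outgoing link.

    Returns URIs in the order they are first reached.
    """
    outgoing = {}
    for s, _p, o in triples:
        outgoing.setdefault(s, []).append(o)
    visited = {start}
    order = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in outgoing.get(current, []):
            if target not in visited:
                visited.add(target)
                order.append(target)
                queue.append(target)
    return order


for uri in follow_links(triples, "http://nasa.dataincubator.org/spacecraft/1969-059A"):
    print(uri)
```

The walk crosses from nasa.dataincubator.org into dbpedia.org without any special handling, because a URI in another domain is just another node.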
There is a simple convention that makes this even more useful: wherever a URI is used, it is assumed to reference the same thing. So if, as above, I say that my thing is sameAs http://dbpedia.org/resource/Cape_Canaveral, and you define your thing in your data as being the same as http://dbpedia.org/resource/Cape_Canaveral, then anyone can infer that our two things are also the same, without having to consult either of us, because sameAs is both symmetric and transitive.
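Because owl:sameAs is symmetric and transitive, every URI connected to another by a chain of sameAs statements, in either direction, names the same thing. A minimal sketch of that inference uses a union-find structure; the URI http://example.org/places/cape-canaveral standing for "your thing" and the helper names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Group URIs that owl:sameAs statements declare to be identical."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)  # direction is irrelevant: sameAs is symmetric

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


def are_same(triples, a, b):
    """True if a and b end up in the same sameAs group."""
    return any(a in g and b in g for g in same_as_groups(triples))


MINE = "http://nasa.dataincubator.org/site/capecanaveral"
YOURS = "http://example.org/places/cape-canaveral"  # hypothetical
DBPEDIA = "http://dbpedia.org/resource/Cape_Canaveral"

triples = [
    (MINE, OWL_SAME_AS, DBPEDIA),   # my statement
    (YOURS, OWL_SAME_AS, DBPEDIA),  # your statement, made independently
]

print(are_same(triples, MINE, YOURS))  # True
```

Neither data set mentions the other, yet the shared DBpedia URI is enough to connect them.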
Just as for any other concept, there are URIs defined for the relationships between things and for their attributes. These are openly published in ontologies so that their meaning can be globally understood. Where above I use sameAs, it is actually shorthand for http://www.w3.org/2002/07/owl#sameAs, often abbreviated to owl:sameAs.
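The abbreviation works by binding a short prefix to a published namespace URI and appending the local name. A minimal sketch of that expansion follows; the owl, rdf and foaf namespaces are the real published ones, while the function name expand is hypothetical.

```python
# Prefix bindings: each short prefix stands for a published namespace URI.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, prefixes=PREFIXES):
    """Turn 'prefix:local' into the full URI it abbreviates."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand("owl:sameAs"))  # http://www.w3.org/2002/07/owl#sameAs
```

Since the result is compared as a URI, case matters: owl:sameAS would expand to a different URI that no ontology defines.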
Those are the basic, and most valuable, elements of Linked Data. They do not require complex infrastructure to support them; the RDF to be returned could even be held in simple files. RDF can also be encoded in other ways, such as XML, to make it more system- or human-friendly.
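To show how little infrastructure is needed, here is a sketch that stores RDF as a plain N-Triples text file and reads it back with a deliberately minimal parser. It handles only a subset of N-Triples (URI subjects and predicates, URI or simple quoted-string objects); blank nodes, language tags, datatypes and escapes are not supported, and the function name load_ntriples and the file name are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# One N-Triples line: <subject> <predicate> followed by <uri> or "literal", then " ."
LINE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def load_ntriples(path):
    """Read (subject, predicate, object) tuples from a simple N-Triples file.

    Objects are returned as plain strings, so the URI/literal distinction
    is not preserved in this sketch.
    """
    triples = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported line: {line!r}")
        s, p, uri_obj, lit_obj = m.groups()
        triples.append((s, p, uri_obj if uri_obj is not None else lit_obj))
    return triples


data = (
    "# a file any web server could return as-is\n"
    "<http://nasa.dataincubator.org/site/capecanaveral> "
    "<http://www.w3.org/2002/07/owl#sameAs> "
    "<http://dbpedia.org/resource/Cape_Canaveral> .\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "capecanaveral.nt"
    path.write_text(data, encoding="utf-8")
    result = load_ntriples(path)

print(result)
```

For anything beyond this subset, an established RDF library is the safer choice.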
I could go on to describe triple stores, storage and query tools tuned to work with triples, or SPARQL, the query language they use. However, I would be drifting beyond the Simple Power of the Link, and away from the simple benefits that can accrue from building upon Linked Data, as wonderfully demonstrated by organisations such as the British Library and the BBC with its Wildlife Finder. The BBC site is a great demonstration of the follow-your-nose navigation that can flow from building on a Linked Data platform.