Introduction to: RDF
RDF stands for Resource Description Framework, and it is a flexible, schema-less data model. Do not confuse or compare it with XML (more about this later)! It is one of the core technologies of the Semantic Web and the current W3C standard for representing data on the web. But what is RDF exactly?
As I mentioned, it is a data model. It can be compared to the relational model which is the way you organize data in a relational database: group related things in tables with attributes, create links between tables, etc. RDF is just another way of organizing your data. In which way? As a graph.
RDF is a graph
A graph is a representation of objects that are connected by links. In other words, you can have two things which are related in some way through a link that connects them. Take for example the following sentence: Austin is the capital of Texas. The two things in this sentence are Austin and Texas. These two things are related by the link “is the capital of.” And there we go: that is RDF! If you think about it, almost everything you say in an English sentence can be represented in a graph form.
You may have heard that RDF consists of triples: subject, predicate and object. Well, the subject is one of the things: Austin; the predicate is the link: is the capital of; and the object is the other thing: Texas.
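To make the triple idea concrete, here is a minimal Python sketch. It is not a real RDF library, just plain tuples; the `Triple` class, the `describe` helper and the second fact about Texas are my own illustrative assumptions.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: two things connected by a link (hypothetical helper type)."""
    subject: str
    predicate: str
    object: str


# The sentence from the text, split into its three parts.
austin_fact = Triple("Austin", "is the capital of", "Texas")

# A graph is simply a collection of such statements.
# The second statement is an extra example, not from the text.
graph = {
    austin_fact,
    Triple("Texas", "is a state of", "United States"),
}


def describe(triple: Triple) -> str:
    """Turn a triple back into an English-like sentence."""
    return f"{triple.subject} {triple.predicate} {triple.object}."


print(describe(austin_fact))
```

Reading the triple back as a sentence shows that nothing was lost by splitting it into subject, predicate and object.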
If we consider the relational model, you could have a table of cities in which each row holds information about one city. The table’s attributes would include name and capitalOf, and one row would contain Austin under the name attribute and Texas under the capitalOf attribute. Essentially, RDF is just another way of representing the same data. But why do we need this?
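The correspondence between a table row and triples can be sketched in a few lines of Python. This assumes the row is a plain dictionary and that the name column identifies the city; the `row_to_triples` function is a hypothetical helper, not part of any standard.

```python
def row_to_triples(row, key):
    """Turn one table row into (subject, predicate, object) triples.

    The value in the `key` column becomes the subject; every other
    non-empty column becomes one triple, with the column name as predicate.
    """
    subject = row[key]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != key and value is not None
    ]


# The row from the cities table described above.
city_row = {"name": "Austin", "capitalOf": "Texas"}

triples = row_to_triples(city_row, key="name")
print(triples)
```

Each cell of the row becomes one statement about the city, so a wider table would simply produce more triples with the same subject.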
Why do we need RDF?
The web is a gigantic network full of information. You use a browser, go to your favorite search engine, search for Thai food in Austin, back comes a bunch of links, you click on a link which takes you to some website, you read the website and so forth. This website is a webpage written in HTML. The browser is able to interpret HTML and present you with a nice clean page. Furthermore, you can take any other browser, go to the same page, and you should see the same thing. Different browsers, same HTML page, same output. Talk about standards, eh? Well, HTML is the standard way of publishing pages on the web. Imagine if we had different ways of publishing pages on the web. Then each browser would have to be programmed to know which language each page was written in. This would be such a pain.
However, HTML is not enough. When we search on the web, we are not interested in a page about Thai food in Austin; we are interested in finding information about restaurants that serve Thai food in Austin (restaurant names, location, price range, etc.). The issue is that the information we are looking for is usually embedded in a page written for people to read. What if we could have this information as structured raw data on the web, so that other applications could consume it easily instead of, for example, scraping an HTML page?
The idea of having raw data on the web isn’t new. You can access raw data through Excel or CSV files, or through different types of APIs. However, if you want to create a mashup from different sources, you have to learn each API, get the data back in different formats and then programmatically integrate it. If you later want to add a new data source, you will most probably need to make further changes. What does RDF have to do with all of this?
RDF is to data as HTML is to pages. HTML is the standard way of publishing documents on the web. RDF is the standard way of publishing data on the web. RDF works as a common data model. Just imagine accessing different data sources and not having to worry about converting the data into your own data model. You will always get the same data model: a graph. No need to alter your database schema for new incoming data or to write custom code. As long as you can store a graph in your database, any data from any data source is ready to be inserted! And the cool thing is that because we are working with graphs, you can merge nodes or create links between different graphs (Linked Data!). Talk about data integration!
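Here is a rough sketch of why a common graph model makes integration easy. It assumes each source hands us a set of plain (subject, predicate, object) tuples; the restaurant name, the predicates and the `objects` helper are all made up for illustration.

```python
# Two hypothetical sources, both already shaped as triples.
restaurants = {
    ("ThaiKitchen", "serves", "ThaiFood"),
    ("ThaiKitchen", "locatedIn", "Austin"),
}
geography = {
    ("Austin", "capitalOf", "Texas"),
}

# Merging two graphs is just a set union; no schema change is needed.
merged = restaurants | geography


def objects(graph, subject, predicate):
    """Return every object linked from `subject` through `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# Follow a link that crosses both sources: restaurant -> city -> state.
cities = objects(merged, "ThaiKitchen", "locatedIn")
states = {state for city in cities for state in objects(merged, city, "capitalOf")}
print(states)
```

Because both sources use the same node name for Austin, the union connects them automatically. In real RDF that shared name would be a URI, which is what makes linking across independent sources workable.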
Therefore, in the same way that any browser can access any HTML page, once data is published as RDF, any data-consuming application can access it on the fly without huge overhead.
RDF and its Syntax
One of the things I hear a lot is people comparing RDF to XML. Remember, RDF is a data model, and XML is just one of the syntaxes it can be written in (RDF/XML). A lot of people also wonder whether what they can do with RDF they can also do with XML. The answer is: it depends. To keep it short, RDF is a graph data model that makes use of URIs, while XML is a tree data model and doesn’t care about URIs. This topic deserves a blog post of its own.
There are other syntaxes for RDF: N-Triples, Turtle, RDFa (RDF in HTML) and JSON. You can even represent RDF in a CSV file. The new RDF Working Group is planning to present a standard JSON serialization of RDF.
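To give a feel for one of these syntaxes, here is a small sketch that writes a triple as a line of N-Triples, where each statement sits on its own line, URIs go inside angle brackets and the line ends with a period. The `http://example.org/` namespace and the `to_ntriples_line` function are assumptions for illustration, and the sketch only handles URIs, not literal values such as strings or numbers.

```python
# Hypothetical namespace; real data would use URIs you control or reuse.
BASE = "http://example.org/"


def to_ntriples_line(subject, predicate, obj):
    """Format one triple of URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


line = to_ntriples_line(BASE + "Austin", BASE + "capitalOf", BASE + "Texas")
print(line)
```

The same triple could be written in Turtle or RDF/XML instead; the graph stays the same and only the way it is written down changes.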
So what’s next?
If you are coming from the relational database world, you may be asking yourself: What is the schema? How do I query RDF? RDF is schema-less, meaning that RDF data is not tied to a specific schema. However, schemas for RDF can exist in the form of ontologies, which are written in languages such as RDF Schema and OWL (Web Ontology Language). Additionally, SPARQL is the standard, SQL-like query language for RDF. I will cover ontologies and SPARQL in my next blog post.