Today, the World Wide Web Consortium announced that R2RML has achieved Recommendation status. As stated on the W3C website, R2RML is "a language for expressing customized mappings from relational databases to RDF datasets. Such mappings provide the ability to view existing relational data in the RDF data model, expressed in a structure and target vocabulary of the mapping author's choice." In the life cycle of W3C standards creation, today's announcement means that the specifications have gone through extensive community review and revision and that R2RML is now considered stable enough for widespread distribution in commodity software.
Richard Cyganiak, one of the Recommendation's editors, explained why R2RML is so important. "In the early days of the Semantic Web effort, we've tried to convert the whole world to RDF and OWL. This clearly hasn't worked. Most data lives in entrenched non-RDF systems, and that's not likely to change."
"That's why technologies that map existing data formats to RDF are so important," he continued. "R2RML builds a bridge between the vast amounts of existing data that lives in SQL databases and the SPARQL world. Having a standard for this makes SPARQL even more useful than it already is, because it can more easily access lots of valuable existing data. It also means that database-to-RDF middleware implementations can be more easily compared, which will create pressure on both open-source and commercial vendors, and will increase the level of play in the entire field."
According to Cyganiak, SPARQL is actually one of the most compelling reasons to use R2RML. "To me, SPARQL is the most practical and useful technology that has emerged from the Semantic Web effort so far. As a query language, it is simple, powerful, and with SPARQL 1.1 it now has most of the features that you'd expect from a business analytics language. But most of all, it shines when data from multiple source systems need to be combined to answer a single query."
Asked about the significance of R2RML, Eric Prud'hommeaux, a W3C staff member and a significant contributor to the effort, added, "Relational databases have been entrusted with most of the data driving business and scientific processes. R2RML and the Direct Mapping offer a standard way, with simple tooling, to make that information available on the Semantic Web. R2RML is now a valuable piece of Semantic Web infrastructure, importing relational data into the Semantic Web."
R2RML has a companion specification, called "Direct Mapping," that comes from the same RDB2RDF Working Group. Cyganiak contrasted the two: "I see the Direct Mapping more as a long-term bet, while R2RML is immediately useful. With R2RML, most of the complexity in mapping from the original database schema to the RDF vocabularies or OWL ontologies is expressed in SQL, and thus it should be easy to pick up for many. The Direct Mapping is a bet that we will write those mappings as RDF-to-RDF or OWL-to-OWL mappings in the future. There are lots of good approaches for writing RDF-to-RDF mappings, such as SPARQL CONSTRUCT, and rules languages like SWRL and W3C's RIF, but compared to SQL, very few know how to use them, and implementations that evaluate those mappings efficiently over relational databases are still scarce. In the short term, R2RML is easier to learn and more easily implemented."
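To make the contrast Cyganiak draws a little more concrete, here is a rough Python sketch of the two styles. It is not an implementation of either specification and does not use R2RML syntax. The `EMP` table, the `http://example.com/` IRIs, and the function names `direct_style` and `custom_style` are all assumptions made up for illustration. The first function derives IRIs mechanically from table and column names, loosely in the spirit of the Direct Mapping; the second lets the mapping author supply a SQL query, a subject template, and a target vocabulary, loosely in the spirit of R2RML.

```python
import sqlite3

BASE = "http://example.com/base/"      # hypothetical base IRI
FOAF = "http://xmlns.com/foaf/0.1/"    # FOAF vocabulary namespace


def make_db():
    """Build a tiny in-memory database (the EMP table is a made-up example)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (EMPNO INTEGER PRIMARY KEY, ENAME TEXT)")
    conn.execute("INSERT INTO EMP VALUES (7369, 'SMITH')")
    return conn


def direct_style(conn, table, key):
    """Derive subject and predicate IRIs from table and column names.

    Simplified: the real Direct Mapping also emits type triples and
    handles foreign keys, which this sketch ignores.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        rec = dict(zip(cols, row))
        subject = f"{BASE}{table}/{key}={rec[key]}"
        for col, val in rec.items():
            if val is not None:
                triples.append((subject, f"{BASE}{table}#{col}", val))
    return triples


def custom_style(conn, sql, subject_template, predicate_by_column):
    """Let the author choose the query, the subject IRIs, and the vocabulary."""
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        rec = dict(zip(cols, row))
        subject = subject_template.format(**rec)
        for col, pred in predicate_by_column.items():
            if rec[col] is not None:
                triples.append((subject, pred, rec[col]))
    return triples


if __name__ == "__main__":
    conn = make_db()
    for triple in direct_style(conn, "EMP", "EMPNO"):
        print(triple)
    for triple in custom_style(
        conn,
        "SELECT EMPNO AS id, ENAME AS name FROM EMP",
        "http://example.com/employee/{id}",
        {"name": FOAF + "name"},
    ):
        print(triple)
```

In the second call the reshaping happens in the SQL query and the template, which is the part Cyganiak expects most people to find familiar. Output in the first style would still need an RDF-to-RDF step to reach a vocabulary such as FOAF.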
For a good introduction to R2RML and Direct Mapping as well as a bit of history about these efforts, see Juan Sequeda's post from earlier this year.