Underway at the W3C are some major Semantic Web efforts that can have a big impact on the enterprise.
One of these is about mapping relational databases to RDF. The RDB2RDF Working Group late last month issued a last call regarding the publication of its Direct Mapping and R2RML documents. As has been noted by Tim Berners-Lee, a big driving force for the Semantic Web has been the expression on the Web of the vast amounts of relational database information in a way that can be processed by machines. “We know that something like 80 or 90 percent of data published on the web comes from relational databases,” says Ivan Herman, W3C Semantic Web activity lead. “So it is important to make smooth the bridge between those two worlds, and this is what the working group was set up to do.”
Reaching a final check with the community on the working drafts first required concluding that two approaches needed to be standardized. At a very high level, you can take a relational database, transform it all into RDF and put it into a triple store. In some cases, though, “this is not necessary or possible, and you want a view of the data and maybe you have a SPARQL query that is sort of converted on the fly to some sort of SQL query,” Herman says. “The idea is that both strategies are possible. [Which to employ] depends on efficiencies and data sizes and so on.”
The devil, though, is in the details, Herman explains. With the first approach, the mapping yields an RDF graph that closely reflects the schema of the database, which also means it may not contain the kinds of triples the application requires. For example, you may want to add data types to a number in a cell, “so you will have to run generated RDF graphs through some sort of transformation using SPARQL, rules, or whatever you love to use.” On the other hand, if you already have a whole mechanism for managing and transforming RDF graphs, this might be the easy answer. The other approach provides finer control from the start over what the final RDF graph is. “Essentially you want a separate processor which does all the transformation for you, and you describe what those transformations are,” Herman says.
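To make the first approach concrete, here is a minimal sketch in Python of what a schema-mirroring mapping might look like. It is not the working group's algorithm. It only imitates the general pattern of one subject per row, one predicate per column and one type triple per table. The base IRI, the `direct_map` function, the `People` table and its contents are all hypothetical, and foreign keys, datatypes and tables without a primary key are ignored.

```python
import sqlite3

BASE = "http://example.com/db/"  # hypothetical base IRI, chosen for illustration


def direct_map(conn, table, pk):
    """Yield (subject, predicate, object) triples that mirror one table.

    Simplified sketch: one subject per row, one predicate per column.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # The row's subject is built from the table name and primary key value.
        subject = f"{BASE}{table}/{pk}={record[pk]}"
        yield (subject, "rdf:type", f"{BASE}{table}")
        for col, value in record.items():
            if value is not None:  # a NULL cell produces no triple
                yield (subject, f"{BASE}{table}#{col}", value)


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE People (ID INTEGER PRIMARY KEY, fname TEXT, addr INTEGER)")
conn.execute("INSERT INTO People VALUES (7, 'Bob', 18), (8, 'Sue', NULL)")

triples = list(direct_map(conn, "People", "ID"))
for t in triples:
    print(t)
```

The output graph knows nothing about the application's vocabulary: every name comes straight from the table and its columns. That is why a follow-up transformation, or a mapping written in R2RML, may be needed to reach the triples an application actually wants.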
The choice really depends on the environment an organization is operating in, and recognizing that, the group decided to define both mechanisms: Direct Mapping and R2RML. As evidence that both approaches are useful, Herman notes that he recently spoke with a company that has implemented both. “They actually had a very interesting experience because they were aiming to use something like R2RML because they wanted to generate an RDF graph of a certain type and they wanted a system to do it that way -- but for that they had to understand the schema of the database of their client.” The client was a big company, and as at most big companies, its database was very mature, very big and very complex.
So much so that the company’s engineers, after spending a week trying to understand the client’s relational schema, recommended dropping the whole project because it would be too complicated to pursue. “However, instead of dropping it, they threw the Direct Mapping engine at it,” which created an RDF graph that directly reflected the schema, showing how it applied, how the different tables related and so on. “So suddenly in a day they could get an understanding of how the schema looked and so they took the project for the client,” Herman notes. From there they were able to optimize the project using R2RML. That’s pretty powerful stuff, and, as Herman notes, dealing with the complexity of large and long-standing databases is probably a pretty typical enterprise scenario.
Another effort underway by the W3C with an enterprise focus is its upcoming workshop on Linked Data Enterprise Patterns. This is not only about the issue of relational databases, but considers in general how Linked Data should be or can be used within the enterprise. The W3C has issued a call for participation, due by Oct. 25, on topics that have resonance for the enterprise, such as access control, which aren’t always as well-considered by Linked Open Data initiatives.