An informal raise-your-hand survey of attendees at the SemTech conference in San Francisco this week revealed that a good number of them were here for the first time. One of the early morning tutorials on Monday gave many of them a chance to explore the Semantic Web in greater depth: the Introduction to the Semantic Web session led by Ivan Herman, the W3C’s Semantic Web Activity Lead.
During the session Herman explained the various components of the Semantic Web and how they fit together. He started by defining RDF (Resource Description Framework) as the basis for it all: a general model for triples (subject-property-object sets) that form a directed, labeled graph, where labels are identified by URIs (Uniform Resource Identifiers). He worked from there all the way through to OWL and RIF (Rule Interchange Format).
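To make the triple model concrete, here is a minimal sketch in plain Python, with no RDF library involved. It stores a few subject-property-object triples and indexes them as a directed, labeled graph. The `http://example.org/` URIs, the property names, and the helper functions `build_graph` and `objects` are all made up for illustration and did not come from the session.

```python
from collections import defaultdict

# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"

# Each triple is (subject, property, object); every label is a URI string.
triples = [
    (EX + "ivan", EX + "presented", EX + "tutorial"),
    (EX + "tutorial", EX + "topic", EX + "SemanticWeb"),
    (EX + "ivan", EX + "worksFor", EX + "W3C"),
]

def build_graph(triples):
    """Index triples as a directed, labeled graph: subject -> [(property, object)]."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
    return graph

def objects(graph, subject, prop):
    """Return every object reached from `subject` along edges labeled `prop`."""
    return [o for p, o in graph.get(subject, []) if p == prop]

graph = build_graph(triples)
print(objects(graph, EX + "ivan", EX + "presented"))
```

Each triple becomes one labeled edge from its subject node to its object node, which is all the phrase "directed, labeled graph" amounts to here.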
Some highlights of the journey follow:
- Everything in RDF is a resource, including classes, which are themselves collections of possible resources, and it is RDF Schema (RDFS) that formalizes the notion of classes and subclasses. “The RDFS defines all the types of URIs (types, class and subclass) and defines how you would use them in a system,” he explained, such as what restrictions apply and what relationships do or can implicitly exist among the resources.
- Options for publishing RDF data range from the not-so-practical, like typing it into an editor in some sort of serialization, which doesn’t scale, to adding RDF data to HTML through microformats, microdata, or RDFa, the last of which is a complete serialization of RDF. Today, content management systems such as Drupal, or plug-ins to them, can themselves generate pages with RDFa.
- “The other huge source of information are relational databases,” he noted. “That is where most of the data resides and somehow you want to get access to that data.” A simple export solution is RDF Direct Mapping, a specification that should be finalized by 2012 and that provides a standard way to automatically convert data in a relational database into RDF. It’s a quick way of getting to RDF data, but what it generates is fairly simple and so somewhat far from the kind of URIs you want at the end of the process. “So there is the necessity for an extra step to transform the output RDF into something closer to the vocabulary you use,” he said. “But for getting into the RDF world as soon as possible, you can do that with RDF Direct Mapping.”
- Also in a working group now and hopefully standardized next year is R2RML, a separate vocabulary that provides finer control over mapping. “It can be more complicated but it gives you more control over the structure of the resultant graph,” he said. “This generates RDF that is directly usable for your application, with the right vocabulary, with the right voice.”
- SPARQL, for complex querying of RDF data (many of the Linked Open Data sets on the web have SPARQL endpoints), is to see a significant update in version 1.1, which is formally due later this year or next. With that version, you can modify the original dataset, such as adding another type of triple to it. “This is a very powerful thing because here suddenly you don’t just query data but model and manipulate it,” he said. But take care, as there is no security mechanism built into SPARQL per se: “There are issues about access control so you have to be careful about implementations … I don’t hide that the whole issue of access control with RDF is one of the areas where we need extra work. It is not solved,” Herman said.
- SKOS is good for providing simple vocabularies, but OWL (the Web Ontology Language) is needed when an application wants more, whether that is the characterization of properties, the equivalence or disjointness of classes, or the construction of more complex classification schemes. He told the group that OWL is something they likely have a vision of as being horribly complicated. “And some of it is complicated,” he said, but OWL 2, which is the current standard, has done a better job of showing it is possible to use subsets of OWL such as RL (Rule Language), rather than the whole enchilada, to build relatively simple ontologies. Oracle is planning to develop an RL engine as part of its environment, for instance, and Ontotext has that as part of its offering already.
- By the way, the standards are set so that there is room to grow. “All the RDF schema terms are included in OWL, so it’s okay to start in one and move to another without disrupting a project that starts small and needs to add new things,” he said. “You won’t destroy what you already have.”
- A simple rules language, RIF can also be an alternative to what OWL can provide. “Very often it is a question of your personal experience and affinity,” when it comes to choosing between rules and ontologies, he said. “Some people are more comfortable describing things in a classification environment, and that is OWL, while those from the traditional programming world might feel more comfortable by defining rules and using those.”
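
The class-and-subclass relationships and the rules-based alternative mentioned above can be illustrated together. The following is a rough sketch in plain Python, not an RDFS, OWL, or RIF engine: it forward-chains two simplified subclass rules until no new triples appear. The prefixed names (`ex:Novel` and so on), the sample data, and the `infer` function are hypothetical, and real entailment regimes cover far more than these two rules.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def infer(triples):
    """Forward-chain two simplified RDFS-style rules until a fixed point."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p != SUBCLASS_OF:
                continue
            # Here (s subClassOf o) holds; look for triples that point at s.
            for s2, p2, o2 in facts:
                # Rule 1: (x type C), (C subClassOf D) => (x type D)
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
                # Rule 2: (B subClassOf C), (C subClassOf D) => (B subClassOf D)
                if p2 == SUBCLASS_OF and o2 == s:
                    new.add((s2, SUBCLASS_OF, o))
        if new <= facts:
            return facts  # nothing new was derived, so stop
        facts |= new

# Hypothetical sample data.
data = {
    ("ex:Novel", SUBCLASS_OF, "ex:Book"),
    ("ex:Book", SUBCLASS_OF, "ex:Publication"),
    ("ex:dune", RDF_TYPE, "ex:Novel"),
}

inferred = infer(data)
print(("ex:dune", RDF_TYPE, "ex:Publication") in inferred)
```

The triple stating that `ex:dune` is an `ex:Publication` is never written down in the data; it is one of the implicit relationships that the class hierarchy makes derivable, expressed here as explicit rules.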