Schema.org Workshop – A Path Forward

September 22, 2011

schema.org Leadership Panel; L-to-R: Michael O'Connor (Microsoft), John Giannandrea (Google), Charlie Jiang (Microsoft), Kavi Goel (Google), R.V. Guha (Google), Steve MacBeth (Microsoft), Gaurav Mishra (Yahoo), Peter Mika (Yahoo)

A room full of interested parties gathered at Microsoft’s Silicon Valley Campus yesterday to discuss Schema.org, its implications for existing vocabularies, syntaxes, and projects, and how best to move forward after what has admittedly been a bumpy start.

Schema.org, you may recall, is the vocabulary for structured data markup that was released by Google, Microsoft, and Yahoo on June 2 of this year. The schema.org website states, “A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.” (For more history about the roll-out and initial reactions to it, here’s a summary.)

Yesterday was the first time since the Semantic Technology & Business Conference in San Francisco that community members have gathered face-to-face to discuss Schema.org in an open forum. It was a full agenda with plenty of opportunity for debate and discussion.

The day started with a welcome from Ramanathan V. Guha, Google Fellow and the de facto emcee for the day’s events. Guha gave a bit of history, stating that it was about one year ago that the “big three” search engines started discussing schema.org. He acknowledged the concern that the launch of schema.org was not communicated well and reiterated the desire of schema.org’s leadership to collaborate with and get feedback from the community.

Not surprisingly, there was a lot of discussion during the day about the virtues and shortcomings of the various syntaxes (RDFa, Microdata, and Microformats), but some general themes emerged, with wide agreement around a few key ideas.

1. Simplicity is important. I think what most people were getting at here was that to gain wide adoption, semantic markup needs to be easy to implement. This discussion mostly focused on the cognitive model – how developers can wrap their brains around the data they’re trying to model.

2. It is important for the search engines to support multiple syntaxes. There are pros and cons to each, and while a single syntax may someday emerge as the *best* one to use, that is not the case for the time being: there are solid implementations of each in existence today.

3. Multiple types must be supported. When modeling data, developers need the flexibility to assign more than one type to the same item.

4. Extensibility is important. Community members reiterated the importance of being able to point to other vocabularies.

5. Open communication needs to continue. Guha and other schema.org leaders said multiple times that they truly want to hear from members of the community – particularly publishers and developers – about what they need to see as schema.org advances.  Also on the theme of cooperation and collaboration, the W3C announced this week the creation of two new Interest Groups, one of which will be led by Guha (link and further details below).
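
To make points 3 and 4 a little more concrete, here is a minimal Python sketch. It is not anything presented at the workshop. It assumes a hypothetical page fragment marked up in Microdata with two schema.org types on one item, and uses only the standard library's `html.parser` to pull out the declared types and properties. The fragment, the class name `MicrodataSketch`, and the helper `extract` are all made up for illustration, and whether a given search engine honors more than one type in `itemtype` is an assumption, not something confirmed here.

```python
from html.parser import HTMLParser

# Hypothetical fragment: one item declared as both a Book and a Product.
# Listing two types in a single itemtype attribute is an assumption made
# for illustration; real consumers may accept only one.
SNIPPET = """
<div itemscope itemtype="http://schema.org/Book http://schema.org/Product">
  <span itemprop="name">Example Title</span>
  by <span itemprop="author">Jane Doe</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects itemtype URLs and simple text-valued itemprop pairs.

    This is a toy: it ignores nested items, itemref, and non-text values.
    """

    def __init__(self):
        super().__init__()
        self.types = []
        self.props = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemscope is a boolean attribute, so only its presence matters.
        if "itemscope" in attrs and "itemtype" in attrs:
            self.types.extend(attrs["itemtype"].split())
        if "itemprop" in attrs:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        # Use the first non-blank text after an itemprop tag as its value.
        if self._current_prop and data.strip():
            self.props[self._current_prop] = data.strip()
            self._current_prop = None


def extract(html):
    """Return (types, props) found in the given HTML string."""
    parser = MicrodataSketch()
    parser.feed(html)
    return parser.types, parser.props


if __name__ == "__main__":
    types, props = extract(SNIPPET)
    print(types)
    print(props)
```

The same item could be expressed in RDFa or Microformats; the point of the sketch is only that a developer may reasonably want one thing to carry more than one type.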

There were also some exciting announcements of schema.org collaborations sprinkled throughout the day.

  • Ben Adida proposed RDFa 1.1 Lite, a simpler version of RDFa.
  • Evan Sandhaus (New York Times) and Andreas Gebhard (Getty Images) announced that about 96% of rNews properties have been aligned with schema.org. The team is “working on the rest.”
  • Greg Grossmeier (Creative Commons) announced the Learning Resource Metadata Initiative (LRMI).  A joint effort of the Association of Educational Publishers (AEP) and Creative Commons (CC), LRMI “is creating an industry-specific framework to make quality educational resources easily searchable for teachers and learners.”
  • Martin Hepp, creator of the widely used GoodRelations vocabulary, talked about his efforts to align GoodRelations with Schema.org. He has experimented with RDFa and Microdata implementations and has found success with Microdata.
  • Guha said that the US CTO, Aneesh Chopra, is working with the Schema.org team on issues of information transparency.
  • Another good sign of collaboration and cooperation was announced earlier this week in the form of two new W3C Interest Groups: the Web Schemas Task Force, to be chaired by R.V. Guha himself, and the HTML Data Task Force, to be chaired by Jeni Tennison.

In general, the conversation seemed to provide some good starting places for the various stakeholders in the Semantic Web community to work together.

