A packed room at the Semantic Tech & Business Conference in San Francisco played host to the much-anticipated Schema.org panel on Wednesday morning. As W3C semantic activity lead and moderator Ivan Herman had hoped (see this article), the discussion didn’t get bogged down in a duel between RDFa and microdata, but rather emphasized some important accomplishments of the last year and looked forward to future work.
As Herman put it, the only discussion he wanted to have around RDFa was to announce that the proposed RDFa 1.1 recommendations are expected to be published as official W3C standards Thursday, and that there had been a lot of interaction with the schema.org folks to make this useable for them as well.
Wednesday’s panel was composed of: Dan Brickley, of Schema.org at Google; R.V. Guha of Google; Steve Macbeth of Microsoft; Peter Mika of Yahoo!; Jeffrey W. Preston of Disney Interactive Media Group; Evan Sandhaus of The New York Times Company; and Alexander Shubin of Yandex.
Here are highlights of what took place:
1. It’s a more open Schema.org. “A year ago,” Guha said, “people were thinking that the major search engines were hijacking semantic web activity… and we’ve graduated to a whole lot of open activity taking place on W3C.” Brickley added that he was really happy to have “moved things loosely under the W3C umbrella,” noting that in effect schema.org is a W3C community in all but name, and that specific vocabularies could even be the fruit of working group efforts. Most of the energy and important discussions, he says, are taking place out there on the public lists.
2. More on the open front. Brickley acknowledged that at launch, the schema.org language “did look monolithic, exclusive. That was really unfortunate and we have moved away from that message,” he said. Tangible evidence comes, for instance, in events like the one a couple of weeks back, when schema.org announced support for enumerated lists, so that developers could use selected, externally maintained vocabularies in their schema.org markup.
External enumerations were talked about on the panel today: “Absolutely schema.org doesn’t care what you put on your site. Be as rich and descriptive as possible. And, our technical choices like external enumerations let all the richness you have on your own site be machine readable.”
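For readers who want a concrete picture of the idea, here is a minimal sketch of markup that points a schema.org property at an externally maintained list. It is an illustration only, not something shown on the panel: the choice of the Person type with its nationality property, the use of a Wikipedia URL as the external value, and the helper name person_with_external_country are all assumptions made for this example.

```python
from html import escape


def person_with_external_country(name, country_url):
    """Return a microdata snippet for a schema.org Person.

    Hypothetical helper for illustration. The country is not spelled out
    as text; the markup links to a URL from an externally maintained
    list (assumed here to be a Wikipedia page).
    """
    return (
        '<div itemscope itemtype="http://schema.org/Person">\n'
        f'  <span itemprop="name">{escape(name)}</span>\n'
        # The <link> element carries the external URL as the property value.
        f'  <link itemprop="nationality" href="{escape(country_url, quote=True)}">\n'
        '</div>'
    )


if __name__ == "__main__":
    # Example values are made up for the sketch.
    print(person_with_external_country(
        "Jane Example", "http://en.wikipedia.org/wiki/Canada"))
```

The point of the sketch is that the site keeps its own, richer identifier for the value, while the surrounding schema.org property tells a consumer what kind of thing that identifier describes.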
3. Schema.org scouts out a role in interoperability. Macbeth says one of the most interesting things for Microsoft is that it sees schema.org’s applicability for data interoperability. The upcoming Windows 8 is supporting schema.org for moving semantic data between applications, for example. A lot of the investment the vendor has made over the last year has been aimed at bringing “the world of structured data and the web corpus closer together,” he said. “Schema.org gives us tools to do that in the most scalable way.”
4. The benefits are becoming clearer to adopters. Panelists, by the way, seemed to agree that 7 to 10 percent of web sites are using the markup. One of those adopters is Disney Interactive Media Group: Preston said that, once it started wrapping videos with schema.org video markup, surfing experiences for kids seemed to improve. “It was a better guest experience for people using our site,” he said. “That was the biggest benefit so far.”
Guha also pointed to work on a project with the Department of Veterans’ Affairs to create a job search engine, leveraging schema.org markup that can be inserted by potential employers into job listings to show that the positions are veteran-friendly. “The idea,” he said, “is that we will not just make search results in existing search engines better, but hopefully create a new class of search and structured data apps on the web.”
5. From more seamless search to fewer IT burdens. At the NY Times, Sandhaus said he thought it was still early days to quantify benefits. But two developments stand out: One is that Schema.org’s addition of an alternative headline property “now gives the web data ecosystem access to an important bit of data that it didn’t have before to improve the search experience for NY Times and other print readers,” whose publications may not use the same headlines for articles on the web as they do in print.
The second is that the Times’ integration with the News Right rights clearance service is its first example of what it expects will be a less burdensome way to integrate with other vendors. “Schema.org markup on the page gives vendors a new way of integrating with news content without having to give them special access,” he said.
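To make the alternative headline point concrete, the sketch below shows how a consumer of a page might read both headlines out of the markup without any special access to the publisher's systems. It assumes microdata on a schema.org NewsArticle using the headline and alternativeHeadline properties; the sample page, the headline text, and the names ItemPropCollector and extract_headlines are invented for illustration and do not describe how the Times or any vendor actually does this.

```python
from html.parser import HTMLParser

# Made-up sample page: the web headline is visible text, the print
# headline is carried in a <meta> element.
SAMPLE_PAGE = """
<article itemscope itemtype="http://schema.org/NewsArticle">
  <h1 itemprop="headline">Web headline for the story</h1>
  <meta itemprop="alternativeHeadline" content="Print headline for the story">
</article>
"""


class ItemPropCollector(HTMLParser):
    """Collect itemprop values from simple, non-nested microdata."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text we are currently reading

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        name = attr_map.get("itemprop")
        if name is None:
            return
        if "content" in attr_map:
            # Elements such as <meta> carry the value in an attribute.
            self.props[name] = attr_map["content"]
        else:
            self._current = name

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] = self.props.get(self._current, "") + data

    def handle_endtag(self, tag):
        self._current = None


def extract_headlines(page):
    """Return (headline, alternativeHeadline); either may be None."""
    collector = ItemPropCollector()
    collector.feed(page)
    return (collector.props.get("headline"),
            collector.props.get("alternativeHeadline"))


if __name__ == "__main__":
    print(extract_headlines(SAMPLE_PAGE))
```

A real consumer would use a full microdata parser that handles nesting and multiple items; the sketch only shows that the second headline travels with the page itself.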
There’s still a lot of making things up as schema.org goes along, the panelists indicated. “No one has ever tried to scale a schema this big and widely-used through this kind of process,” Brickley said. Said Guha, noting efforts underway for vocabularies in the health and genealogy areas, “We do expect the dominant way of picking up new vocabularies will be partnering with organizations that know their topics very well.”
The panelists wanted to respond to concerns that corporate special interests are running the schema.org show, or that it’s the death knell of the semantic web as the idea of anyone saying anything in any way they like. New vocabulary proposals are always welcome on the public mailing list, said Mika and Shubin, and Macbeth noted there’s no weight given to who makes the suggestion, just to what it is.
“The semantic web is not gone,” said Guha. “It’s really important to understand we are not telling the world to use this vocabulary and this alone. We are just saying this is a vocabulary that we understand and we think we provide a service to the webmaster community…and if you look at the rise in adoption over the last 12 months, [it seems there is] some kind of related need for this level of clarity.”
And so it moves forward. Brickley said that just a couple of days ago, for example, he floated the idea for a schema.org SameThingAs property, as a step to having some fun with inferencing.
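As a rough illustration of the kind of inferencing such a property could enable, the sketch below groups identifiers that have been declared to be the same thing, following the links transitively. The property was only a floated idea at the time of the panel, so everything here is an assumption: the function name same_thing_groups, the example.org and example.com URLs, and the choice of a union-find structure are illustrative, not anything Brickley described.

```python
def same_thing_groups(pairs):
    """Group identifiers linked by 'same thing as' statements.

    pairs is an iterable of (a, b) tuples, each read as "a is the same
    thing as b". Equivalence is treated as symmetric and transitive,
    and resolved with a small union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical statements gathered from three different sites.
    statements = [
        ("http://example.org/film/1", "http://example.com/movies/abc"),
        ("http://example.com/movies/abc", "http://example.net/title/42"),
        ("http://example.org/person/7", "http://example.com/people/xyz"),
    ]
    for group in same_thing_groups(statements):
        print(sorted(group))
```

In the example, the first two statements are enough to conclude that the example.org film and the example.net title refer to the same thing, even though no site said so directly.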
Of Google’s own plans, said Guha, “We are making a very big investment in structured data…. Structured data is beginning to permeate the very essence of our index and the way we think about the web.”