Schema.org, Wikidata, Google Knowledge Graph – Two Great Causes and a Symptom

I was toying with another title for this post – Yet Another Perfect Storm – but I think that particular metaphor (although appropriate here) has been somewhat overdone.  So what sparked this one then?

I am on the long flight back from the Semantic Tech & Business Conference in San Francisco to the good ol’ UK, to see how they got on with the Queen’s Diamond Jubilee festivities.  I am reflecting on what my week at the conference has told me.  It has told me that things are a-changing – I got that impression last year too, but more so this year.  Obviously, from the title of this post, it has something to do with Schema.org, Wikidata, and the Google Knowledge Graph….

Schema.org, as I posted on the day, was represented by a panel of interweb luminaries from the likes of the W3C, Google, Yahoo!, Microsoft, Yandex, New York Times, and Disney.  The session had the air of something that had been around for much more than the 12 months since its announcement.  A few things worth noting here: RDFa was acknowledged as being as important as, if not more important than, Microdata as a method of embedding Schema.org structured data in web pages; Microsoft is using schema.org mark-up as a way to pass structured data between applications in Windows 8; GoodRelations, and rNews from the IPTC, are leading a growing number of extensions now in the pipeline to broaden the scope and depth of the ontology; External Enumerations have been introduced to enable the referencing and inclusion of authoritative lists of things such as place names, currencies, and subjects in schema.org mark-up; and the proportion of crawled pages already containing schema.org mark-up has risen to 7-10% – that’s a heck of a lot of pages!
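To make the RDFa versus Microdata point concrete, here is a minimal sketch of the same schema.org Person described both ways, with a toy Python extractor showing that both carry identical data. The two HTML fragments and the FlatItemParser helper are my own illustrations, not anything shown at the session, and the extractor only copes with a single flat item; it is nowhere near a conforming Microdata or RDFa processor.

```python
from html.parser import HTMLParser

# The same description in Microdata syntax (illustrative fragment).
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Richard Wallis</span>, '
    '<span itemprop="jobTitle">Technology Evangelist</span>'
    '</div>'
)

# ...and in RDFa Lite syntax (illustrative fragment).
RDFA_LITE = (
    '<div vocab="http://schema.org/" typeof="Person">'
    '<span property="name">Richard Wallis</span>, '
    '<span property="jobTitle">Technology Evangelist</span>'
    '</div>'
)


class FlatItemParser(HTMLParser):
    """Hypothetical toy extractor: one flat item, no nesting, text values only."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current = None  # property name of the element being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemtype" in a:            # Microdata gives the full type URL
            self.item_type = a["itemtype"]
        elif "typeof" in a:            # RDFa Lite: vocabulary + type name
            self.item_type = a.get("vocab", "") + a["typeof"]
        self._current = a.get("itemprop") or a.get("property")

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def extract(html):
    """Return (type URL, {property: value}) for a single flat item."""
    parser = FlatItemParser()
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(extract(MICRODATA))
    print(extract(RDFA_LITE))
```

Both calls yield the same type and the same two properties, which is the practical upshot of the panel's point: the choice of syntax is a publishing decision, while the vocabulary stays the same.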

Wikidata was launched at the sister Semantic Technology & Business conference in Berlin in February, confirming its goals to bring consistency and reusability to the masses of data in Wikipedia.  Building on the successes of the Wikimedia Commons and DBpedia, Wikidata will centralise the maintenance and publishing of facts from the many language versions of Wikipedia.  Just as images are now [drawn from Wikimedia Commons], the info-boxes in all Wikipedias will be populated from Wikidata.  To avoid concerns about suppressing diversity, the intention is to reference supplied facts, not create a single truth.  In a year or so’s time, when the URIs for this data start to be published, I predict that it will soon take over DBpedia’s place at the centre of the linked open data cloud.  Not a very risky prediction.  The data should be even more comprehensive and reliable, as those URIs will reference the source data rather than, as is the case today, a scrape and reformat of info-box mark-up.  As an added consequence of the decision to reference data sources, Wikidata should be able to reference sources not produced inside Wikipedia, library catalogues for instance.

Google Knowledge Graph is an impressive experiment on the Google search results page.  They are identifying concepts in search terms by matching them with their vast structured view of the world, populated from Freebase, search logs, and harvested structured data from crawled pages (much in Schema.org form).  Once identified, they then display the structured data, screen right, to add value to the user experience by supplementing search results with facts.  An impressive demonstration of what can be done when you have structured data about ‘things’.  Like many technological advancements, that impressiveness is lost on those who are not geeks like myself.  I can categorise the responses I have received to my excited explanation of the feature to friends and family as follows: “Kinda nice, but shouldn’t it work like that anyway?”.
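As a rough illustration of that match-then-display idea, here is a toy sketch in Python. It is emphatically not how Google does it: the ENTITIES table and the knowledge_panel function are hypothetical stand-ins of my own, and a simple label lookup replaces whatever matching Google actually performs. It only shows why structured data about ‘things’ lets a search term resolve to one entity, or to a choice of several.

```python
# Toy stand-in for a structured view of the world: lower-cased label -> entities.
# The entries are illustrative; a real system would hold vastly more.
ENTITIES = {
    "jupiter": [
        {"name": "Jupiter", "type": "Planet",
         "facts": {"orbits": "Sun"}},
        {"name": "Jupiter", "type": "Roman deity",
         "facts": {"Greek counterpart": "Zeus"}},
    ],
    "lion": [
        {"name": "Lion", "type": "Animal",
         "facts": {"scientific name": "Panthera leo"}},
    ],
}


def knowledge_panel(query):
    """Return a panel for one match, a choice for several, None for no match."""
    matches = ENTITIES.get(query.strip().lower(), [])
    if not matches:
        return None
    if len(matches) == 1:
        return {"kind": "panel", "entity": matches[0]}
    # Several 'things' share the label, so offer a choice instead of guessing.
    options = [f"{m['name']} ({m['type']})" for m in matches]
    return {"kind": "choice", "options": options}


if __name__ == "__main__":
    print(knowledge_panel("Lion"))
    print(knowledge_panel("Jupiter"))
```

The interesting case is the ambiguous one: because each candidate is a typed entity rather than a string, the caller can present the alternatives with their types and let the user pick.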

So what are the forces whisking up this perfect storm I am referring to?

The broadening and deepening of the schema.org ontology, coupled with the search engine companies’ payback for embedding structured data – better [Rich Snippets enhanced] listings – will encourage the percentage of marked-up pages to grow rapidly from that current 7-10% figure.  The linking out to externally enumerated authoritative lists of concepts will make it easier not only to identify entities on the web, but also to see how they are related.  The establishing of a comprehensive Linked Data set of concepts, efficiently managed by and within the Wikipedia community, and linked to external sources – Wikidata – will encourage even more linking to it than DBpedia has attracted so far, thus dramatically increasing the pool of authority to add to that structured data in schema.org form.

These developments will almost certainly feed off each other: a demand for more structured data from the SEO community will broaden the coverage and adoption of schema.org; the drive for increasingly authoritative structured data will encourage links to authoritative sources such as GeoNames, Media, Publishers, Libraries, Governments, and DBpedia [to be eventually superseded by Wikidata]; these sources will be published in search-engine-friendly formats (it is a brave person who would bet against Wikidata outputting schema.org data); and search engines, enabled by a new prevalence of structure, will be able to produce better targeted end-user experiences, related to the facts and concepts identified in the discovery process, stimulating the emergence of new products, services and business models.

Wikidata and Schema.org are great causes in both senses of the word – something to support and engage with as they make the web a better place for data, and causes that will stimulate change. Google Knowledge Graph is an early, but obvious, symptom of that change taking place. No doubt these three will be joined over the coming months by new sources of data or demonstrations of the benefits.

For now, try searching for something that triggers the appearance of the Knowledge Graph on google.com (try Jupiter, Lion, or Bush, which provides a choice of ‘things’), follow a few links from that right-hand panel, and then lean back and imagine where this might take us…..

Richard Wallis is Technology Evangelist at OCLC and Founder of Data Liberate.

