Sing along with me to this classic hit from 1980: “Knowledge graphs are everywhere; They’re everywhere; My mind describes them to me.”
OK, so that isn’t quite how Our Daughter’s Wedding’s song Lawn Chairs goes. But it’s a good description of some of the activity at the Semantic Technology & Business Conference this week, which saw Google, Yahoo and Wikidata chatting up the topic of Knowledge Graphs. On Tuesday, for example, Google’s Jason Douglas provided insight into how the search giant’s Knowledge Graph is critical to meeting a new world of search requirements that’s focused on providing answers and acting in an anticipatory way (see story here), while Wednesday’s closing keynote had Wikimedia Deutschland e.V. project director Denny Vrandecic getting the audience up to date with Wikidata – aka, Wikipedia’s Knowledge Graph For, And By, Everyone.
There are some 280 language versions of Wikipedia for which Wikidata serves as the common source of structured data. Wikidata now has an entity base of more than 12 million items that represent the topics of Wikipedia articles, Vrandecic said during his presentation.
As an example, Vrandecic showed the audience the structured information for the city of San Francisco, where the conference was held, in Wikidata (thanks to a little item disambiguation so as not to wind up at the same-named municipality and town in Cundinamarca, Colombia or the 1977 Village People song).
This data is a big help in populating Wikipedia info boxes. “Those info boxes used to be completely created in the Wikipedia page itself, so there were different info boxes for every language edition,” he said. “Now with the Knowledge Graph of entities represented, we can connect those entities to each other [such as this city is the capital of that state]…There is one place where the information is.” (One thing to keep in mind, he noted, is that Wikidata isn’t about the truth, so to speak. There may, after all, be disagreement between different communities about what the population of Jerusalem actually is, for instance. Wikidata does, however, collect references and sources for data items so that each community can decide what to trust for its Wikipedia pages.)
That structured data is a big help for increasing the quality of Wikipedias and lowering maintenance costs, but its machine-readable knowledge database of items also is available for anyone to freely use. “A goal of Wikidata structured data is connecting items inside Wikidata that everyone can access and query,” he said. All the data inside Wikidata is completely available, via REST APIs, data dumps, and as Linked Open Data in RDF. “We provide you with a Knowledge Graph that you can use yourself in your own applications. You can use this to give your applications some more intelligence and smarts.”
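As a rough illustration of the REST access mentioned above, the sketch below builds a request URL for the `wbgetentities` action of the wikidata.org API and pulls the labels out of a response of the shape that action returns. The helper names are hypothetical, the sample payload is hand-written rather than a live response, and no network call is made.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_entity_url(item_id, languages):
    """Return a wbgetentities URL asking for an item's labels."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_labels(response_text, item_id):
    """Map language code -> label text from a wbgetentities JSON response."""
    payload = json.loads(response_text)
    labels = payload["entities"][item_id].get("labels", {})
    return {lang: entry["value"] for lang, entry in labels.items()}


# Hand-written sample in the shape the API returns (not a live response).
# Q62 is the Wikidata item for the city of San Francisco.
SAMPLE_RESPONSE = json.dumps({
    "entities": {
        "Q62": {
            "id": "Q62",
            "labels": {
                "en": {"language": "en", "value": "San Francisco"},
                "de": {"language": "de", "value": "San Francisco"},
            },
        }
    }
})

url = build_entity_url("Q62", ["en", "de"])
labels = extract_labels(SAMPLE_RESPONSE, "Q62")
```

In a real application the URL would be fetched with an HTTP client and the response body passed to `extract_labels`; the sample payload simply stands in for that step.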
Vrandecic shared with the audience some apps that have already started to leverage its Knowledge Graph data. These include GeneaWiki, a graph viewer showing genealogy information in Wikidata, and the Tree of Life. “What is here is to take the taxonomical relationships [of all life] inside Wikidata and use that as an alternative way to browse Wikipedia,” he said.
Another one is a student-created app, The Wiki Atlas. It gives a map of India with labels that come from Wikidata, and the city names can be switched between languages, from English to German or any other. “The student who created the app doesn’t speak all those languages but Wikidata does,” Vrandecic said. “It is the completely same data set. You get the labels for free from Wikidata and display them.”
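The language switching described here can be sketched as a lookup over per-language labels. This is a minimal illustration, not the app’s actual code: the place IDs, the `LABELS` table and both function names are hypothetical, and the labels are hand-written stand-ins for what Wikidata would supply.

```python
# Hand-written, illustrative labels keyed by a hypothetical place ID.
LABELS = {
    "india": {"en": "India", "de": "Indien", "fr": "Inde"},
    "new_delhi": {"en": "New Delhi", "de": "Neu-Delhi"},
}


def label_for(place_id, language, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    labels = LABELS[place_id]
    return labels.get(language, labels[fallback])


def map_labels(language):
    """Labels for every place on the map in one language."""
    return {place_id: label_for(place_id, language) for place_id in LABELS}


german_map = map_labels("de")
french_label = label_for("new_delhi", "fr")  # no French label here, so English is used
```

The fallback is a design choice made for this sketch: a map with a label in the wrong language is usually more useful than a map with a blank spot.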
He concluded with a summary of what Wikidata wants to do and be: We are, Vrandecic said, “building an editable common resource for data for everyone to use on the web. The data is freely reusable and machine readable to enrich applications and make them smarter, and the Knowledge Graph is for free….We hope to get one step closer to providing everyone in the world with access to all our knowledge.”
And it’s getting a hand in doing that, he revealed, with a $200,000 donation from Russian search engine Yandex to support the Wikidata project.
Yahoo Takes Its Knowledge Graph For a Spin
Yahoo shares with Google the goal of leveraging the structured data in its Knowledge Graph to deliver answers to users as their expectations for search evolve. It also provides data APIs and a system for generating custom data packs, which make its information directly accessible to consumers within Yahoo. As it happens, on the search evolution front, Yahoo this week also redesigned its search results page to put those results higher on the page.
It’s all about getting users what they need faster. Indeed, during his presentation on the Y! Knowledge Base at the conference, Yahoo Inc. principal research engineer Nicolas Torzec discussed the company’s focus on using its graph of entities and the relations between them to provide answers, not just links, to empower user experiences. That’s particularly so in context with the topics that matter most to Yahoo and its audience – news, finance, sports, movies, TV, music and geo-domains, as well as some important points of interest. Leveraging its Knowledge Graph to repurpose content and personalize experiences is key, too.
“We are shallow and weakly typed for a wide variety of domains and richer and strongly typed for a few selected domains,” he said. There are currently 10 million entities, 10 million relationships and 30 million properties in its Knowledge Graph, and those numbers are growing steadily. “It’s the central or unified knowledge repository to find all the key information for the entities we care about,” said Torzec. As a centralized and unified graph of all entities and topics in which Yahoo is invested, the Knowledge Graph “needs to provide agility for user experiences.”
Do a search for the movie World War Z, for example, and you’ll get information about the movie – a trailer, video results, opening date – and a selection of relations to it, so that you can explore more about people involved with the flick, like Brad Pitt and James Badge Dale. “Most times you’d have to go to many different places for the information, but if the Knowledge Base is where information is aggregated and unified, things are simpler,” he said.
Torzec summed up the platform as having three main components. “It all starts with knowledge acquisition. Then you aggregate and combine that to the knowledge base, and then there is knowledge consumption so you can do something with that data,” he said. The work involves collecting, extracting and mining information about entities from multiple data sources. It draws on general sources like Freebase and Wikipedia and more specific ones for its key domains, using both the DBpedia extraction framework and more advanced extraction and cleanup capabilities. It interlinks across objects and data sources and performs type inferencing and the compulsory entity disambiguation. So, when you’ve clicked through from World War Z results to Brad Pitt, you’re getting fed info about Brad Pitt the actor, not the boxer. Type inferencing, meanwhile, is how the machine knows they’re both people in the first place.
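To make the disambiguation step concrete, here is a minimal sketch that picks among same-named candidates by counting how many of their related entities overlap with the context the user came from. It is an illustration under stated assumptions, not Yahoo’s method: the candidate records, their IDs and the `disambiguate` function are all hypothetical.

```python
# Hypothetical candidate entities that share a name but not an identity.
CANDIDATES = [
    {"id": "brad_pitt_actor", "name": "Brad Pitt", "type": "Person",
     "related": {"World War Z", "Fight Club"}},
    {"id": "brad_pitt_boxer", "name": "Brad Pitt", "type": "Person",
     "related": {"boxing"}},
]


def disambiguate(name, context, candidates):
    """Pick the candidate whose related entities overlap most with the context."""
    matches = [c for c in candidates if c["name"] == name]
    if not matches:
        return None
    # Ties resolve to the first matching candidate in list order.
    return max(matches, key=lambda c: len(c["related"] & context))


from_movie_page = disambiguate("Brad Pitt", {"World War Z"}, CANDIDATES)
from_sports_page = disambiguate("Brad Pitt", {"boxing"}, CANDIDATES)
```

Both candidates carry the type `Person`, which mirrors the point about type inferencing: the type narrows what kind of thing a name refers to, and the context decides which individual it is.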
The entities in the knowledge base share a common ontology for Yahoo’s key categories and its schema is aligned with schema.org. “The ontology was developed at and is only available in Yahoo,” he noted. “A rich, unified Knowledge Graph is the goal. So when we ingest data, from whatever source, we align the entities and relationships with a common ontology. From there we map to the same format and same semantics, so information is mapped and normalized to standard schemas,” he said.
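The alignment Torzec describes, mapping records from different sources onto one set of property names, might be sketched as follows. Yahoo’s ontology is internal, so everything here is an assumption: the source names and their field names are hypothetical, and the target names `name` and `actor` are borrowed from schema.org’s Movie type purely for illustration.

```python
# Hypothetical per-source field mappings onto schema.org-style property names.
FIELD_MAPPINGS = {
    "source_a": {"title": "name", "cast": "actor"},
    "source_b": {"label": "name", "starring": "actor"},
}


def normalize(record, source):
    """Rename a source record's fields to the common property names.

    Fields with no mapping for that source are dropped.
    """
    mapping = FIELD_MAPPINGS[source]
    return {mapping[field]: value
            for field, value in record.items()
            if field in mapping}


record_a = {"title": "World War Z", "cast": ["Brad Pitt", "James Badge Dale"]}
record_b = {"label": "World War Z", "starring": ["Brad Pitt", "James Badge Dale"],
            "internal_rank": 7}

normalized_a = normalize(record_a, "source_a")
normalized_b = normalize(record_b, "source_b")
```

Once both records use the same property names they can be compared and merged directly, which is the practical payoff of aligning everything to one ontology at ingestion time.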
Yahoo is using some RDF, he said, “but the graph itself is not a triple store.” The audience got to see how knowledge domains unfurl across the graph – relationships that extend from Brad Pitt to his Fight Club movie to other movies he’s starred in, including Ocean’s Twelve, to his co-star in that flick, George Clooney, who also starred in the TV show ER, which takes place in Chicago, which is the city where the Chicago Cubs play baseball, which is a sport, and through to sources for tickets. Whew. “So now you are entering the domain of a business listing,” he said.
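The chain of relationships walked through above can be sketched as a breadth-first search over a small directed graph. This is only an illustration using a plain adjacency dictionary; it says nothing about how Yahoo stores or traverses its graph, and the relation labels and function names are made up for the example.

```python
from collections import deque

# (subject, relation, object) edges taken from the walk-through above.
EDGES = [
    ("Brad Pitt", "starred in", "Fight Club"),
    ("Brad Pitt", "starred in", "Ocean's Twelve"),
    ("Ocean's Twelve", "co-stars", "George Clooney"),
    ("George Clooney", "starred in", "ER"),
    ("ER", "set in", "Chicago"),
    ("Chicago", "home of", "Chicago Cubs"),
    ("Chicago Cubs", "plays", "baseball"),
]


def build_adjacency(edges):
    """Map each subject to the list of entities it points to."""
    adjacency = {}
    for subject, _relation, obj in edges:
        adjacency.setdefault(subject, []).append(obj)
    return adjacency


def find_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


adjacency = build_adjacency(EDGES)
path = find_path(adjacency, "Brad Pitt", "baseball")
```

Because the edges are directed, searching from "baseball" back to "Brad Pitt" finds nothing; a graph meant for open-ended exploration would likely store inverse relations as well.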
Torzec said the driving idea behind Yahoo’s Knowledge Base was to retrieve just the information you need and to focus the data as you need to. These goals existed at Yahoo for a long time, he noted, but often were implemented independently and separately. The idea with the Knowledge Base isn’t to replace the deep categorical repositories already developed in their respective places, but to leverage them through its lightweight, centralized approach.
“On top of the Knowledge Base we can do data exploration and editorial curation,” he told attendees. The latter, he said, “lets editors create, delete, merge or split entities and relations, and review, update, delete and enrich information about entities.” For example, if Wikipedia updates Brad Pitt’s first name, editors could directly change the value in the Knowledge Base. “That’s faster to the consumer,” Torzec said.