An audience hungry for more knowledge about what Google is doing with structured data got its fill late in the day Tuesday at the Semantic Technology & Business Conference. Jason Douglas, Group Product Manager, Knowledge Graph, explained to the standing-room-only crowd how semantic technology and structured data connect to the changing world of search.
“Search is changing dramatically because users’ lives are changing dramatically,” Douglas said. “We carry our computers with us. The Internet is with us all the time.” While the library analogy worked for search a decade ago, users increasingly need on-the-go, just-in-time information. So today’s search analogy, he said, is more along the lines of the personal assistant: it’s more about giving users answers and anticipating their needs.
The only way to meet those requirements is for search engines to have an understanding of the real world, to be the virtual equivalent of a real-world personal assistant who knows what and where things are and how they relate to each other. “On the answer side, if we actually know things about the world, it’s not just answering questions [themselves], but providing all kinds of context that helps in different ways,” Douglas said.
The Knowledge Graph underpins that understanding and provides the scaffolding for driving these search experiences. Its entities have unique identifiers that can be referred back to and to which more things can be attached, and its links and edges are not domain-specific but can traverse from one domain to another. “The interesting thing about the semantic approach is that it works in both directions,” he said. Knowing that one node in the graph (the band Daft Punk, he gave as an example) appears in the movie Tron means that we also know that the movie Tron features the band Daft Punk.
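The “works in both directions” point can be illustrated with a toy graph. This is a minimal sketch, not a description of how Google stores the Knowledge Graph: the `Graph` class, the entity identifiers and the predicate names `appears_in` and `features` are all hypothetical, chosen only to mirror the Daft Punk and Tron example.

```python
from collections import defaultdict

# Hypothetical table of inverse predicates; the names are illustrative only.
INVERSE = {"appears_in": "features", "features": "appears_in"}


class Graph:
    """Toy entity graph that records the inverse of every edge it is given."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, predicate, obj):
        # Store the stated fact.
        self._edges[(subject, predicate)].add(obj)
        # Store the reverse fact too, if the predicate has a known inverse.
        inverse = INVERSE.get(predicate)
        if inverse is not None:
            self._edges[(obj, inverse)].add(subject)

    def objects(self, subject, predicate):
        # Return a copy so callers cannot mutate the graph by accident.
        return set(self._edges.get((subject, predicate), set()))


graph = Graph()
# Identifiers are made up; a real graph would use stable unique IDs.
graph.add("band/daft_punk", "appears_in", "film/tron")

print(graph.objects("film/tron", "features"))
```

Only one fact was asserted, yet the graph can be queried from either end, which is the property Douglas described.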
With the Knowledge Graph’s support, a wide-ranging query (say, the vague giants, which could mean anything from the baseball team to a mythical creature) can be seamlessly refined to a particular topic (say, the San Francisco Giants) for a more targeted experience. At the same time, search can proactively answer the questions users are most likely to ask next, like the team’s scores, roster and so on. It then becomes possible to explore what else is related to the refined query, Douglas further explained, like the stadium the team plays in, which itself is related not only to other sports venues but also more generally to other tourist destinations in the area like Fisherman’s Wharf. “That pretty fundamentally changes the experience. Showing related things can be helpful,” he said, “but to know they are similar you have to understand them as things,” not strings.
The Structured Data Road
Douglas toured attendees through Google’s structured data journey, from Rich Snippets to the Knowledge Graph and its latest capabilities, like providing users answers not just about a country’s GDP but how that stacks up against the GDP of other countries that people most commonly compare it to (see this story), to venturing beyond search with structured data in Gmail, with support for JSON-LD markup (see story here). One scenario he depicted, flowing from a structured understanding of email, was that a user could search the web, asking when his flight is leaving, and get the answer because the reservation confirmation arrived in that user’s Gmail account marked up so that the email was understood to be about a flight. “The concept of a more personal assistant starts to become a little more clear,” he said.
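To make the flight scenario concrete, here is a rough sketch of what JSON-LD markup for a reservation email might look like and how a consumer could answer “when is my flight leaving” from it. The types and properties (`FlightReservation`, `reservationFor`, `Flight`, `departureTime` and so on) come from the schema.org vocabulary; every value is invented for illustration, and the `departure_time` helper is a hypothetical function, not part of Gmail or any Google API.

```python
import json
from datetime import datetime

# All values below are made up for illustration.
reservation = {
    "@context": "http://schema.org",
    "@type": "FlightReservation",
    "reservationNumber": "ABC123",
    "underName": {"@type": "Person", "name": "Example Traveler"},
    "reservationFor": {
        "@type": "Flight",
        "flightNumber": "110",
        "airline": {"@type": "Airline", "name": "Example Air", "iataCode": "XX"},
        "departureAirport": {"@type": "Airport", "iataCode": "SFO"},
        "departureTime": "2013-06-10T08:15:00-07:00",
        "arrivalAirport": {"@type": "Airport", "iataCode": "JFK"},
        "arrivalTime": "2013-06-10T16:45:00-04:00",
    },
}

# The string that would be embedded in the email as JSON-LD.
json_ld = json.dumps(reservation, indent=2)


def departure_time(markup):
    """Hypothetical helper: return the departure time, or None if the
    markup does not describe a flight reservation."""
    if markup.get("@type") != "FlightReservation":
        return None
    raw = markup.get("reservationFor", {}).get("departureTime")
    return datetime.fromisoformat(raw) if raw else None


print(departure_time(json.loads(json_ld)))
```

Because the email states its type explicitly, the consumer does not have to guess from the message text whether it concerns a flight.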
But in order to organize the world’s information, as Google’s mission statement puts it, so that people can make use of it, really high data quality is required from the start, Douglas emphasized. “It doesn’t take much of an error rate before you get a fundamentally bizarre user experience,” he said. “For the Knowledge Graph itself, if you are using it as scaffolding, as the thing you are building experiences around, I think data quality becomes a really big deal.” In a follow-up discussion with the Semantic Web Blog, he discussed how Freebase, one of the sources for Knowledge Graph data with some 40 million entities, has community processes that help ensure data quality, and said there are other ways Google can get some understanding of the quality of data from individual or aggregated sources. Google Places for Business is an example, specifically a product called Business Builder that business owners can register with to verify their company and its data, so that the information they supply carries greater trust.
“Whatever it takes, as long as we have some understanding of quality,” he said. Also necessary to the Knowledge Graph’s usefulness for the user experience is a low rate of duplicate nodes. Duplication is an obvious data quality issue but also a multiplicative one, which is an even bigger deal, he noted.
Google Goes On
With some 570 million entities – and a lot of facts about them to date – there’s still a lot Google wants to do concerning the Knowledge Graph. A lot of the basics are still to do, Douglas told the Semantic Web Blog after his presentation, “in terms of understanding what users want, what they need and what questions they are asking, and then serving the right experience with what we have. There are obvious ways we want to grow the Knowledge Graph but we are not done in making the search experience better” with what it now is.
Douglas discussed with The Semantic Web Blog how the Knowledge Graph knows a lot about canonical entities, like famous people, well-known places, and countries, but noted that an aim is “to become more familiar with the entities in everybody’s life – even more local coverage like sports games and events – with a lot of things that may be more transient but still important. Those are a lot of the kinds of things we are adding now.” Some information that appears in Knowledge Graph results today, including local event listings for bands, organization logos or ratings for movies, can get there from web sites implementing structured markup for the information. During the presentation, he noted that for a search on the movie The Hudsucker Proxy, the Knowledge Graph panel can show what Google knows about “which reviews are actually about a certain movie and put it in context as a user makes a decision about [the film].”
A path to structured markup is via schema.org, which Douglas described as “really trying to make the markup of existing data as simple as possible and lower the friction in every way there.” The Knowledge Graph, on the other hand, is trying to capture heterogeneous data at really high data fidelity and generally requires a more complicated and abstract data model, one that he said is too “ridiculously complicated a data model for someone who’s marking up a web page… In some ways it is just easier to deal with the mapping or rolling up … than it is to deal with bad data.” Regarding the Knowledge Graph’s consideration of structured markup like schema.org, he told The Semantic Web Blog, “Schema mapping is a much simpler problem to solve than data quality, which is why optimization for ease of implementation is pretty important.” Douglas also told the crowd, in response to a query about whether Google would trust people to use Knowledge Graph IDs to annotate their own content, that he has been “pushing for a sameAs property in schema.org for some time for that purpose.”
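As a rough illustration of what a sameAs annotation could enable, the sketch below marks up two pages about the same film and groups them by the identifier they point to. The schema.org vocabulary today does define a `sameAs` property, but everything else here is an assumption: the identifier URL is a placeholder, not a real Knowledge Graph ID, and `group_by_identity` is a hypothetical helper rather than anything Google has described.

```python
import json

# Placeholder identifier, not a real Knowledge Graph ID.
HYPOTHETICAL_ID = "https://example.org/entity/hypothetical-id"

# Markup from three imaginary pages; all values are illustrative.
pages = [
    {"@context": "http://schema.org", "@type": "Movie",
     "name": "The Hudsucker Proxy", "sameAs": HYPOTHETICAL_ID},
    {"@context": "http://schema.org", "@type": "Movie",
     "name": "Hudsucker Proxy, The", "sameAs": HYPOTHETICAL_ID},
    {"@context": "http://schema.org", "@type": "Movie",
     "name": "Hudsucker Proxy (1994 film)"},  # no sameAs supplied
]


def group_by_identity(items):
    """Hypothetical helper: bucket items by their sameAs identifier.

    Items without sameAs are returned separately, since their identity
    would have to be inferred from strings such as the name.
    """
    resolved, unresolved = {}, []
    for item in items:
        identifier = item.get("sameAs")
        if identifier:
            resolved.setdefault(identifier, []).append(item)
        else:
            unresolved.append(item)
    return resolved, unresolved


resolved, unresolved = group_by_identity(pages)
print(json.dumps({k: [i["name"] for i in v] for k, v in resolved.items()}, indent=2))
print(len(unresolved), "item(s) left to match by other means")
```

The two differently titled pages fall into the same bucket without any string matching, while the third still needs conventional reconciliation.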
How eagerly is the web community embracing structured markup? “We follow users and what users want,” he told The Semantic Web Blog. “If I could characterize what I’ve seen in terms of the adoption patterns of structured markup… if there are users for features that involve structured markup, people will do it. We are trying to make it easier. But still at the end of the day, if users are there, people will do the markup. In some ways it is sharing the burden. Do more and better features that do the stuff that users like, and then the whole ecosystem will work.”