What he will say is that there’s a reason such talk gets started around intelligent, personalized news apps. “Whether a company chooses to partner with a larger company to distribute their technology or go on their own, we are solving a basic consumer need,” Johnson says. The iPad-oriented Zite has its semantic grounding in Worio (the catchier handle for Web of Research Iteration One), a contextual discovery engine plug-in that works alongside a user’s search engine, and it benefits from the six years of work that went into personalizing search results with that system, he says.
And it applies the concept to today’s problem of helping users cope with the never-ending stream of news from a seemingly endless array of providers. “There is a lot more news being produced on the web, and a number of technologies people thought would help users filter it aren’t working,” he says. RSS feeds, for example, were useful some years ago, but today you go to your Google Reader, step away for a couple of phone calls and lunch, and come back to a few hundred more unread articles. “Which of those 30 do you really want? And the third trend is that you box yourself only into the things you know about,” Johnson says.
It has come to the point where you either try to solve the problem by following tons of RSS feeds or Twitter users and wind up with too much noise, or keep tabs on only a select segment and risk not getting enough interesting content. “There’s a lot of information but no good way to organize it, and also you don’t want to feel constrained by choices you’ve made,” he says. “What we love to hear from customers is that they read stuff on Zite they could never find anywhere else. It’s exciting for us that we found content for them that is unique, interesting and highly relevant to them.”
Zite’s algorithms tackle the issue by analyzing and classifying web content and using signals from the social web to predict what any one individual would like to read. The system looks at documents a reader has liked (there is a like option and an equally valuable dislike option) and at documents that similar users have liked, on the theory that if other users like the documents you like, you will probably share their interest in other content, too. Think of it as a massive collaborative filter that can help expand a user’s horizons.
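Zite has not published its algorithms, so the following is only a generic illustration of the collaborative-filtering idea described above, not Zite’s implementation. It assumes a hypothetical vote table in which +1 is a like and -1 a dislike, measures how much two readers agree with cosine similarity over the articles both have voted on, and predicts a reader’s interest in an unseen article as the similarity-weighted average of other readers’ votes. A dislike from a like-minded reader therefore counts against an article, and a like from someone who usually disagrees with you counts against it, too.

```python
from math import sqrt

# Hypothetical vote table: user -> {article_id: +1 (like) or -1 (dislike)}.
Ratings = dict[str, dict[str, int]]


def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity computed over the articles both users have voted on."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(b[k] ** 2 for k in shared))
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, article: str) -> float:
    """Similarity-weighted average of other users' votes on `article`."""
    num = den = 0.0
    for other, votes in ratings.items():
        if other == user or article not in votes:
            continue
        sim = cosine(ratings[user], votes)
        num += sim * votes[article]
        den += abs(sim)
    return num / den if den else 0.0


def recommend(ratings: Ratings, user: str, k: int = 3) -> list[str]:
    """Top-k articles the user hasn't voted on that have a positive predicted score."""
    seen = set(ratings[user])
    candidates = {a for votes in ratings.values() for a in votes} - seen
    scored = sorted(((predict(ratings, user, a), a) for a in candidates), reverse=True)
    return [a for score, a in scored[:k] if score > 0]


ratings: Ratings = {
    "alice": {"a1": 1, "a2": 1, "a3": -1},
    "bob": {"a1": 1, "a2": 1, "a4": 1},     # agrees with alice
    "carol": {"a1": -1, "a3": 1, "a5": 1},  # disagrees with alice
}
print(recommend(ratings, "alice"))  # ['a4']
```

Bob agrees with Alice on every shared article, so his like of `a4` surfaces it; Carol consistently disagrees with her, so Carol’s like of `a5` pushes it down.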
“And the more social users that are out there the better that will be,” Johnson says.
The system takes a broad swipe at identifying the content a reader might like, considering not just that someone might prefer certain authors but also their style preferences, such as opinion pieces vs. straight reporting, or even preferred length of stories. “So we glean what is in the document and what people are saying about it, and that leads to a really powerful discovery engine,” he says.
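To show how preferences for authors, styles and story lengths could be learned from likes and dislikes, here is a simple content-based sketch. The feature names, the style labels and the word-count cut-offs are all hypothetical. Each vote is added onto every feature value of the article that received it, and a new article is scored by summing the learned weights of its features.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    author: str
    style: str  # hypothetical labels, e.g. "opinion" or "reporting"
    word_count: int


def length_bucket(words: int) -> str:
    # Hypothetical cut-offs; a production system would tune these.
    if words < 400:
        return "short"
    if words < 1500:
        return "medium"
    return "long"


def features(a: Article) -> list[str]:
    """Turn an article into the feature values a reader might have preferences about."""
    return [f"author={a.author}", f"style={a.style}", f"length={length_bucket(a.word_count)}"]


def learn_profile(history: list[tuple[Article, int]]) -> dict[str, float]:
    """Add each like (+1) or dislike (-1) onto every feature value of that article."""
    profile: defaultdict[str, float] = defaultdict(float)
    for article, vote in history:
        for f in features(article):
            profile[f] += vote
    return dict(profile)


def score(profile: dict[str, float], article: Article) -> float:
    """Sum the learned weights; feature values never seen before contribute nothing."""
    return sum(profile.get(f, 0.0) for f in features(article))


history = [
    (Article("kim", "opinion", 1800), 1),
    (Article("lee", "opinion", 2000), 1),
    (Article("kim", "reporting", 300), -1),
]
profile = learn_profile(history)
print(score(profile, Article("park", "opinion", 1600)))   # 4.0
print(score(profile, Article("park", "reporting", 200)))  # -2.0
```

Because a dislike carries the same weight as a like, the reader’s mixed record on the author “kim” cancels out, while the consistent preference for long opinion pieces carries over to an author the reader has never seen.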
The social angle here (the articles that get buzz, including buzz from someone on Twitter you don’t know but who turns out to be a good predictor of the kinds of documents you would like) is important, but it is no substitute for analyzing what the content is actually about. As Johnson points out, there is a data sparsity problem with content judgments in the social sphere: most pieces of web content have only about five to ten interactions on them.
“So you need to look into content to see what is similar across articles,” he says. “Most other systems have dozens or hundreds of categories, but we look at millions of articles a day and have over 1,000 different categories to choose from.” That breadth lets the technology reach down into the tails of people’s interests to find stories they wouldn’t have thought of but that fall within their realms of interest, he says. “You can only do that when you have a lot of articles coming through every day.”
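One common way to handle the sparsity problem Johnson describes is to blend a social score with a content score and trust the social signal only as interactions accumulate. The sketch below does this with a shrinkage weight `alpha = n / (n + shrink)`. Both the blending scheme and the `shrink` constant are assumptions chosen for illustration, not Zite’s published method. The content score here is the cosine similarity between a reader’s interest weights across categories and an article’s category weights.

```python
from math import sqrt


def category_similarity(user_interests: dict[str, float], article_cats: dict[str, float]) -> float:
    """Cosine similarity between a reader's category weights and an article's."""
    dot = sum(w * article_cats.get(c, 0.0) for c, w in user_interests.items())
    norm_u = sqrt(sum(w * w for w in user_interests.values()))
    norm_a = sqrt(sum(w * w for w in article_cats.values()))
    return dot / (norm_u * norm_a) if norm_u and norm_a else 0.0


def hybrid_score(social: float, n_interactions: int, content: float, shrink: float = 10.0) -> float:
    """Blend social and content scores, trusting the social signal as evidence accrues.

    alpha = n / (n + shrink): with few interactions, the content score dominates.
    `shrink` is a hypothetical tuning constant, not a published Zite value.
    """
    alpha = n_interactions / (n_interactions + shrink)
    return alpha * social + (1 - alpha) * content


reader = {"tech": 3.0, "science": 1.0}  # hypothetical interest weights
article = {"tech": 1.0}                 # hypothetical category assignment
content = category_similarity(reader, article)
# Five interactions (the low end of the range Johnson cites) gives alpha = 1/3.
print(hybrid_score(social=0.9, n_interactions=5, content=content))
```

With only a handful of likes and dislikes on an article, two-thirds of its score still comes from what the article is about. This is the point of Johnson’s argument that you “need to look into content.”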
Which is why, by the way, he doesn’t think that any acquisitions of personalized news readers by content providers that may occur in this space should raise concerns about the range of content a reader will be presented with. Taking Zite, just as an example of course, “we recommend hundreds of thousands of sources every day to people. That makes us really unique in the marketplace,” he says. “We think that is incredibly important to our product and a differentiator. And we see discovery engines putting in more content from more people rather than less.
“It is no help to have a discovery engine hampered by limited content. The Web is free and there is a lot of stuff going on out there, and people will find it one way or another. We use our algorithms to figure out the best, most interesting and personalized content out there, and to be agnostic as to where that content comes from.”
So, while talk swirls around what company may or may not get involved with Zite, Johnson says it’s staying focused on delivering what its very vocal user base wants. That includes improving the reading experience on the iPad and making Zite portable. And, of course, continuing to work on its core recommendation capabilities.
“We are a search and discovery company and that will always be the focus,” he says, meaning it will keep up with new signals from the web to improve the dialogue with its users. Recently it added the ReadItLater document corpus, for example, and it wants to keep leveraging sources like that. “Any time a user has a corpus of documents we can use to hone their recommendations, that is really useful for us,” he says.
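As a rough illustration of how a reader’s saved-document corpus (such as articles saved to ReadItLater) could hone recommendations, the sketch below builds a simple keyword profile from saved texts and ranks candidate articles by cosine similarity to it. The tokenizer, the tiny stop-word list and the sample texts are hypothetical. This is a generic bag-of-words technique, not a description of how Zite actually uses that corpus.

```python
import re
from collections import Counter
from math import sqrt

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def profile_from_corpus(saved_texts: list[str]) -> Counter:
    """Count how many of the reader's saved documents mention each term."""
    profile: Counter = Counter()
    for text in saved_texts:
        profile.update(set(tokens(text)))
    return profile


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(saved_texts: list[str], candidates: list[str]) -> list[str]:
    """Order candidate articles by similarity to the saved-document profile."""
    profile = profile_from_corpus(saved_texts)
    return sorted(candidates, key=lambda c: cosine(profile, Counter(tokens(c))), reverse=True)


saved = ["iPad apps for news reading", "News apps and discovery engines"]
candidates = ["Best recipes for summer", "A new discovery engine for news apps"]
print(rank(saved, candidates)[0])  # A new discovery engine for news apps
```

Every saved document adds evidence to the profile, which is why an existing corpus is useful: it gives a new reader a usable profile before they have liked or disliked anything in the app.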