Spivack’s Bottleno.se Built To Match Scale of Exploding Message Stream

By Jennifer Zaino  /  April 8, 2011

The Twitter stream, says Nova Spivack, will be a victim of its own success unless we come up with new ways to filter and make sense of it. So that’s what he’s doing with his latest venture, Bottleno.se (which you’ve probably heard about in that very stream over the last few days).

Bottleno.se is diving in where Twitter Annotations ultimately didn’t tread. Named for our smart mammalian cousin with its extraordinary sonar system, Bottleno.se is setting out to help power Twitter users (other social networks, like Facebook, will be wrangled too) navigate the exploding stream in real time.

“There are 150 million messages a day now,” says Spivack, whose other ventures include Klout, LiveMatrix and of course, Twine. And he’s confident that the growth experienced so far, some three times as many Twitter messages as last year, is going to be exponential, not linear. “Next year there will be more than 3 times – four times or four, five, or six times as many messages,” he predicts, with not just people but apps automatically tweeting too.

A world of accelerating streams is one where even more messages that you want to see – and more that you want to be seen – get missed. But for Bottleno.se, Spivack says, the more noise the better. “We use that noise, analyse it, and learn about each person on Twitter – each user name, each company brand – what they’re interested in, what they talk about, what they share and like, and who they are connected to, so we build a pretty rich attention profile and from that we also are then able to calculate in real time the relevance of every message to them and how important it is for them,” he says. “So we’re doing real-time personalization so we can give people a feed of the messages they really need to read.”
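
Spivack did not describe how that relevance calculation works. As a purely illustrative sketch in Python, assume an attention profile is nothing more than a bag of weighted interest terms plus a set of connected accounts. The names `AttentionProfile`, `relevance`, and `personalized_feed`, the 0.5 connection bonus, and the 1.0 threshold are all hypothetical and are not taken from Bottleno.se:

```python
from dataclasses import dataclass, field


@dataclass
class AttentionProfile:
    """Hypothetical per-user profile: interest weights plus connections."""
    interests: dict[str, float] = field(default_factory=dict)
    connections: set[str] = field(default_factory=set)


def relevance(profile: AttentionProfile, author: str, text: str) -> float:
    """Score one message for one user (illustrative formula, not Bottleno.se's)."""
    # Crude tokenization: split on whitespace, trim punctuation, lowercase.
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    words.discard("")
    # Sum the weights of the interest terms that appear in the message.
    topical = sum(w for term, w in profile.interests.items() if term in words)
    # Assumed fixed bonus when the author is someone the user is connected to.
    social = 0.5 if author in profile.connections else 0.0
    return topical + social


def personalized_feed(profile, messages, threshold=1.0):
    """Keep messages scoring at or above the assumed threshold, best first."""
    scored = [(relevance(profile, a, t), a, t) for a, t in messages]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)
```

Here `messages` is a list of `(author, text)` pairs. The point of the sketch is only that each message is scored against one person's profile rather than against the stream as a whole, which is the distinction Spivack draws.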

The semantics enter by way of a built-from-the-ground-up NLP system designed specifically for handling short messages of the kind you see on Twitter and Facebook, and all their peculiarities (bad grammar, odd spellings, etc.). “A standard NLP system designed for huge documents doesn’t work for this,” Spivack says. The new system was also written in a new way, he says, to support massive scalability. It runs in the cloud and today it already processes eight times as many messages per second as flow through Twitter on average.

And that’s just the beginning, given the upward trajectory of social media messaging. “We can expand the scale and capability of this by many orders of magnitude with no problem at all,” he promises. “We can handle thousands of times the number of messages that Twitter is processing right now,” automatically annotating them semantically with all kinds of metadata and using that to filter the stream, Spivack says. “So at the core is a powerful semantic personalization system optimized for real time,” and around that are capabilities for learning what users’ interests are, and what they like based on their feedback or actions they take, contributing to what he calls the “precision of the personalization.”
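
How that feedback loop is implemented was not disclosed. One simple way to picture it, assuming interest weights are nudged up or down each time a user acts on a message, is the Python sketch below. The function name `apply_feedback`, the update rule, and the learning rate of 0.1 are invented for illustration:

```python
def apply_feedback(interests, message_terms, liked, rate=0.1):
    """Return a new weight dict nudged toward or away from a message's terms.

    Hypothetical update rule: move each term's weight a fraction `rate`
    of the way toward 1.0 on positive feedback, or toward 0.0 on negative.
    """
    target = 1.0 if liked else 0.0
    updated = dict(interests)  # copy so the caller's profile is not mutated
    for term in message_terms:
        current = updated.get(term, 0.0)  # unseen terms start at zero
        updated[term] = current + rate * (target - current)
    return updated
```

Under this assumed rule, repeated positive signals on a topic pull its weight gradually toward 1.0, while a single stray click moves it only slightly.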

Its combination of NLP as well as structured semantics and automation in the service of personalization is a departure from systems that some have compared Bottleno.se to, like DataSift. The latter, Spivack says, looks at the Twitter firehose and computes general stats about messages. “Those things are great and they are things we can use. But we do something different that requires different scaling,” he says. “We are computing everything relative to every person. We have to look at every message for every person.”

While Spivack wouldn’t disclose specific details of how the new type of distributed cloud computing that is behind Bottleno.se scales up and speeds processing, what he can say is: “The bottom line is that it’s very easy for us to scale and at a very low cost. What this means is we can auto-classify all your messages, what they’re about and how interesting they are to you and do it in real time.”

About Those Semantics

But let’s get back to those semantics. Bottleno.se is semantic, but it is not a Semantic Web system. (Neither, for instance, is LiveMatrix.) RDF, OWL, SPARQL don’t show up here. To be clear, Spivack remains a fan of the Semantic Web standards and the vision that led him to champion Twine, but he’s got to be pragmatic, too. Time at Twine taught him a lot, he says. “The open standards of the Semantic Web are specifically useful in my opinion for two things: One is for reasoning, enabling very advanced machine reasoning, and two is for data portability, linked open data.” The application work he’s done since Twine hasn’t required the particularly high levels of reasoning around asking questions, disambiguating information, following logical queries and so on that OWL accommodates. “The systems that do that today are slow and clunky and don’t scale to the kinds of data sets we deal with – they’re not good for 150 million messages a day,” he adds.

And RDF and company are valuable when trying to make an open data set or mash sets up into some type of database application, but that’s not a compelling business case for Bottleno.se now. “If there is a period in the future where we need to provide some kind of open data set from our stuff, RDF might be a very good choice for how to make that available, but it can’t be a religious choice,” he says. “We have to look at the problem specifically and ask how does RDF or XML or anything else map to it. We’d consider it as a tool in our toolbox, but right now we’re not focused on that problem.” Twitter owns its data set anyway, so Bottleno.se wouldn’t be in a position of releasing it as such.

So, Bottleno.se is like Twine only in that it is trying to make sense of information, not in what technologies or techniques it embraces to get there or even necessarily in the content it’s looking at. That said, Spivack would like those who were heavy Twine users to know that he thinks they’ll find Bottleno.se to be interesting and relevant to their needs and, given their self-selection as individuals interested in information curation, knowledge sharing, and making the web smarter, he’d be interested in including them in the Bottleno.se closed beta that starts in June. Others more likely to get on board include social network influencers. In fact, there’s a bit of a viral game going on here that can test how much of an influencer you are – if you get a lot of other people to sign up, you rise higher in the rankings and raise your own odds of getting in. An open beta will follow the closed beta period.

What else to expect from Bottleno.se? There’s a full framework behind Bottleno.se, which is both application and platform, and so is extensible for writing plug-ins and full apps on top of it. The visual client interface is the showcase and is aimed primarily at the power user level. That is, those individuals with large followings or who are heavy followers, power bloggers, marketers, brands, and so on – basically, those who are very connected, highly socially networked, creating and consuming lots of information, and who would most benefit from Bottleno.se’s built-in automated curation and organization to help with their social messages. Organize everything you’ve ever tweeted. (Hint: If you’re ok just using Twitter.com, this probably isn’t for you.) Along the way, users can help the process by tagging things semantically themselves if they choose. “That little bit that people add is often among the things that are hardest to catch. From there we learn new rules and patterns,” he says.

There will be both a freemium and pro version of Bottleno.se, which Spivack sees as a natural fit with what his interests always have been. Says Spivack, “The theme in my work has been around using big data, analytics, natural language, and semantics to try to understand the web.”

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
