A Simple Tool in a Complex World: An Interview with Zemanta CTO Andraz Tori

July 2, 2012


Andraz Tori is the Owner and Chief Technology Officer at Zemanta, a tool that uses natural language processing (NLP) to extract entities within the text of a blog and enrich it with related media and articles from Zemanta’s broad user base.    This interview was conducted for Part 3 of the series “Dynamic Semantic Publishing for Beginners.”

Q. Although the term “Dynamic Semantic Publishing” appears to have come out of the BBC’s coverage of the 2010 World Cup, it looks as though Zemanta has been applying many of the same principles on behalf of smaller publishers since 2008.  Would you characterize it this way, or do you think that Zemanta is a more limited service with specific and targeted uses, while the platform built by BBC is its own semantic ecosystem?  How broadly should we define Dynamic Semantic Publishing?

A. What Zemanta does is empower the writer through semantic technologies. It’s like having an exoskeleton that gives you superpowers as an author. But Zemanta does not affect the post after it has been written. Dynamic semantic publishing, on the other hand, is based on the premise of assembling web pages piecemeal from a semantic database, usually in real time.

Dynamic publishing has been around for a long time, and it seems semantic technologies could make it even easier to implement. It was always possible to pull data together in PHP by hand; now semantic technologies provide some standardization that allows more complex dynamic publishing systems to be built more efficiently.

Q. How would you describe Zemanta’s user base?  Is it larger enterprises (such as larger news organizations), SME publishers or all of the above?  What kinds of publishers stand to benefit most from its use?

A. Zemanta’s user base consists of over 90,000 bloggers. The vast majority of these are individual bloggers and SMEs. Blogging is one of the best ways to generate inbound leads for SMEs. Zemanta enables them to compete with the bigger guys by giving them blogging superpowers. We also have some big publishers, like Forbes, that enable Zemanta for all of their bloggers.

Q. Do you provide syndication as part of your service?

A. Zemanta does not do any syndication. It works by suggesting additional images, links and tags while the author is still writing the post. It does not pull full text from other sites, and it always requires the author to create original content. A lot of other tools focus on republishing or repurposing content; in contrast, we believe in the power of original content, of the author’s own voice.

Q. What is your favorite Zemanta success story?

A. When we hear from our bloggers how we helped them create better content for their readers, it’s a success. Numerous times we’ve seen our bloggers forge new friendships when Zemanta told them there was another blogger out there writing on the same topic. Together, better content and more relationships lead to more readers. And more readers are what bloggers ultimately want.

We’ve had regular bloggers getting to the top of Hacker News or being featured on much more heavily trafficked blogs, and they attribute that success to Zemanta. But in reality we’ve just made it slightly easier; all the work was theirs.

Q. How would you like to build upon your successes and/or improve the service you provide in the near term?

A. We have some big plans for the near term. We plan to offer opportunities for bloggers to monetize their blogs through relationships with brands; we’re working on this product together with Federated Media Publishing.

Q. Does Google’s Knowledge Graph affect your plans in any way?

A. As for Knowledge Graph, it’s in part based on Freebase, which we also use in Zemanta. This is good news, since the better Freebase is, the better Zemanta works! I am sure some part of Google search queries in the future will be answered directly from the Knowledge Graph; it just makes sense. At the same time, regular information retrieval isn’t going away any time soon.

Q. Following your thought-provoking presentation about the poor quality of the user interface/experience, has there been much progress made over the last few years in terms of design?     What semantic tools or apps that you see out there take into account good design principles or appeal to a wider user base?

A. Right now there are very few consumer-oriented web sites driven by the semantic web stack.

Most of the focus of semantic technologies is on the enterprise market, and it might be that this is incompatible with appealing to a broader set of web developers. The characteristics of those two markets are very different.

Maybe semantic dynamic publishing will be the next big thing in web development, but at the same time competition is strong – NoSQL based solutions promise similar advantages in dynamism without additional baggage.

However, I’d already start looking into the future. We need to talk about the semantic web on mobile. How do we build apps with these technologies? How will the next Siri be built?

Q. Could you explain what you mean by “NoSQL based solutions promise similar advantages in dynamism without additional baggage.”  What kind of dynamism?  And what baggage?

A. When developers create rich dynamic web pages based on data, one of the things that slows them down is the very static nature of schemas in SQL solutions. SQL simply isn’t made for rapidly evolving schemas, for adding new types of data to the database and connecting them in ad hoc ways to create dynamic web pages.

Most NoSQL solutions are schema-free or have dynamic schemas that are developer driven (no DBA needed to evolve them). The semantic technology stack offers similar promises of rapid adaptation to new kinds of data, and it is competing with NoSQL in this area.
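[Editor’s illustration: the contrast described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not anything Zemanta or the BBC actually runs. The `posts` table, the `venue` field, the sample values, and the helper names are all hypothetical; a plain list of dictionaries stands in for a schema-free document store, and a set of tuples stands in for a triple store.]

```python
import sqlite3

# --- Relational: the schema is fixed up front (hypothetical "posts" table) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES (?)", ("Match preview",))


def sql_insert_with_venue(connection):
    """Try to store a field ("venue") that the schema may not know about."""
    try:
        connection.execute(
            "INSERT INTO posts (title, venue) VALUES (?, ?)",
            ("Match report", "Example Stadium"),
        )
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the insert because the column does not exist.
        return False


sql_ok_before = sql_insert_with_venue(conn)              # schema not yet changed
conn.execute("ALTER TABLE posts ADD COLUMN venue TEXT")  # explicit migration
sql_ok_after = sql_insert_with_venue(conn)               # now accepted

# --- Schema-free documents: a new field just appears on the record ---
documents = [{"title": "Match preview"}]
documents.append({"title": "Match report", "venue": "Example Stadium"})

# --- Triples: a new kind of fact is just another (subject, predicate, object) ---
triples = {("post:1", "title", "Match preview")}
triples.add(("post:2", "title", "Match report"))
triples.add(("post:2", "venue", "Example Stadium"))


def objects(subject, predicate):
    """Return all objects recorded for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)
```

[In this sketch the relational insert fails until the table is altered, while the document list and the triple set accept the new `venue` data with no schema step. That is the “dynamism” at issue; the sketch deliberately leaves out what a real semantic stack adds on top of bare triples.]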

At the same time, the official Semantic Web stack as promoted by the W3C brings along a lot of baggage. It has specific demands on how data has to be organized and connected. It is in essence an enterprise technology. The barrier to entry is very high, while NoSQL is at the opposite end of the spectrum: it is really simple to take the first steps. We’ve seen in the past that web developers have adopted the technologies with the lowest barrier to entry that need less tooling (think JSON vs. XML, HTML+JavaScript vs. Flash, HTML5 vs. XHTML2, MySQL vs. PostgreSQL).

So [a] prediction for the future is that Dynamic Semantic Publishing is going to stay a niche technology. It works and it will be used in many more projects, similarly to how the BBC has done it, but it will not go mainstream.

[On the other hand,] there’s also an option that the W3C decides to drop its current approach and starts a new semantic stack with less baggage. But looking at how the situation has been developing over the last couple of years, that seems unlikely. There are attempts to hide complexity (RDFa 1.1); however, what is needed to go mainstream is truly removing complexity.


Kristen Milhollin is a writer, mother, champion of good causes, and semantic web enthusiast.  She is also the Project Lead for GoodSpeaks.org.
