
Catching Up With Yandex: What Russia's Leading Search Engine Has To Say About Schema.org

April 25, 2012

Update: Yandex today (April 26th) reported that net income in the first three months of 2012 rose 53 percent from the same period last year to 1.26 billion rubles ($43 million) as text-based advertising revenue rose, according to Bloomberg. Sales gained 51 percent to 5.9 billion rubles.

In November Russian search engine Yandex joined Google, Microsoft Bing, and Yahoo! to collaborate on schema.org. The Semantic Web Blog recently caught up by email with Alexander Shubin, Yandex product manager and head of strategic direction, to discuss this and other developments.

The Semantic Web Blog: Can you update us about how Yandex is doing? We know it’s still leading search traffic in Russia, but do you see more competition there, and how have international expansion plans been proceeding?

Shubin: Yandex is the leader in Russia with 59 to 60 percent market share. Russia is one of the few countries where a local search engine keeps a leading position, in spite of international players’ expansion.

Last year Yandex was launched in Turkey, where we offer 12 services (including web search) so far. According to our statistics, yandex.com.tr processes more than 1 million queries daily. Turkey is the first non-Russian-speaking market for us, and we have done a lot of work to deliver services that would be interesting for the local community. The main target for Yandex in Turkey, where one search engine still holds 90 percent of the search market, is to become the number two player and to deliver more local search results and services than our competitor does.

Turkey is more or less an experiment for us: If we meet our target there, we can potentially do the same in any other non-Russian-speaking market. But it is too early to draw any conclusions or make any announcements, as we have worked in Turkey for only half a year. Stay tuned!

The Semantic Web Blog: Were Schema.org and microdata Yandex’s first experience with incorporating semantic technology for search? If not, what else had Yandex looked into and perhaps even implemented?

Shubin: Our experience with semantic technologies started in 2007. That year we began consuming FOAF for our Blogs Search. Soon our FOAF extension became one of the most popular in the world. For now we’re indexing hundreds of millions of profiles with FOAF.

In 2010 we continued by supporting the hCard microformat for our organization search. After that we took on board the hReview, hProduct, and hRecipe microformats for our services and web search. We are also consuming Open Graph for our Video Search, and we developed our own vocabulary for encyclopedia articles (unfortunately, for now almost all documentation is available only in Russian).

The Semantic Web Blog: What led Yandex to join the Schema.org collaboration?

Shubin: We believe that processing semantic markup and supporting semantic technologies are very important for search engines. First, because of the huge amount of data that is already available in structured form. Second, because this is a convenient way for search engines and webmasters to collaborate. That’s why we were very excited about the schema.org initiative. With schema.org, a webmaster doesn’t need to worry about how to say something to each search engine, but can concentrate on what he is going to say.

We consider the schema.org initiative an important milestone in Web development. It is obvious that semantics will play a great role in the Web of the future. The question is only about form. Will it be a mess of vocabularies, one for every consumer of semantics, and torture for webmasters (as with browsers and HTML tags), or a single standard mechanism convenient for both sides, publishers and consumers? The second way is much better! So we’d like to be an active part of schema.org development, to make sure our needs are accounted for and so that we don’t have to create our own vocabularies (as we did before).

The Semantic Web Blog: Recently you posted news about your own proprietary Spectrum technology powering the ability to understand users’ search intentions by letting them narrow the focus to images, films, recipes, etc. Did schema.org microdata have a role in that?

Shubin: For now semantic markup is not used in Spectrum. It can be one of the possible improvements in the future.

The Semantic Web Blog: What benefits has Yandex itself so far seen from supporting the common vocabulary for structured data markup?

Shubin: There are two main benefits. The first one is obvious: by supporting a common vocabulary we’re gaining access to the huge amount of data already marked up with it. The second is that with the joint effort of all major search engines, it becomes much easier to convince site owners to use markup on their sites, because with one simple step they get the benefit in all search engines at once.

The Semantic Web Blog: Which of the current Schema.org proposals seem most practical or valuable in your opinion?

Shubin: In my opinion all of them are valuable because they provide vocabularies for areas not yet covered by schema.org. It is worth noting that there is a huge number of proposals, which shows great interest from the community. By the way, this situation is leading to questions about the schema.org extension strategy. It is a hot topic of internal discussions now. That’s why I think the External Enumerations proposal is very useful for further schema.org development.

Also, I’m personally excited about integration with GoodRelations, because it is a big, well-known vocabulary for e-commerce sites, which are traditionally very active about every search engine initiative. So I suppose that integration will stimulate shops (especially big ones) to use markup on their pages.

The Semantic Web Blog: Since Yandex’s joining the schema.org collaboration, what has Yandex been focused on bringing to the discussion? Are you involved in the ongoing management yet, are there particular proposals you want to raise, or any other influence?

Shubin: Sure, we’re involved in day-to-day work along with Google, Microsoft and Yahoo!. It includes reviewing external and internal proposals (new classes and fixes for current ones), working with the community, and some other stuff. Since joining we have been focused on questions of general development (there is plenty of work!). We have also proposed corrections to existing vocabularies that are essential to our services, and we’re working with Russian webmasters (translating documentation, helping with markup on their sites, etc.). For example, we have our own markup testing tool (for now it processes only markup that Yandex is consuming).

The Semantic Web Blog: In what ways do you think Schema.org could or should improve that are perhaps not yet under discussion by the community at large?

Shubin: One of the important directions that is not widely discussed now is support for cultural specifics. If we want schema.org to be the main vocabulary for major sites around the globe, we should think about it.

Another thing is about class design. It has become obvious that for further successful schema.org development we need to use a multiple-types paradigm (e.g., the possibility of using separate classes ‘athlete’ and ‘person’ together for one entity). There are some technical issues, but all of them can be solved.

There are a lot of other questions that need to be solved. We have a lot of exciting challenges ahead!
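
Editorial illustration (not part of the interview): the multiple-types idea Shubin mentions can be pictured with a small sketch. It assumes the HTML microdata convention of listing several space-separated type URLs in one `itemtype` attribute, and it reads those types with Python's standard `html.parser`. The sample markup, the `Athlete` type URL, and the helper names `ItemTypeCollector` and `collect_item_types` are hypothetical and chosen only to mirror the ‘athlete’ and ‘person’ example above; this is not a description of how Yandex or schema.org handles such markup.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. "Athlete" is an illustrative type name taken
# from the interview's example, not a confirmed schema.org class.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Person http://schema.org/Athlete">
  <span itemprop="name">Example Runner</span>
</div>
"""


class ItemTypeCollector(HTMLParser):
    """Collects the list of types declared on each itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []  # one list of type URLs per itemscope element

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if "itemscope" in attr_map:
            # itemtype may hold one or more space-separated type URLs.
            types = (attr_map.get("itemtype") or "").split()
            self.items.append(types)


def collect_item_types(html):
    """Return the declared types for every item found in the markup."""
    parser = ItemTypeCollector()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for types in collect_item_types(SAMPLE_HTML):
        print(types)
```

In this sketch a single entity carries both types at once, so a consumer could treat it as a person and as an athlete without the vocabulary needing a dedicated combined class.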




About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
