The Future of E-Commerce Data Interpretation: Semantic Markup, or Computer Vision?


How will webpage data be interpreted in the next few years? The Semantic Web community has high hopes that ever-evolving semantic standards will help systems identify and extract the rich data found on the web, ultimately making it more useful. With the announcement of support for GoodRelations in November, it seems clear that semantic progress is now being made on the e-commerce front, and at an accelerated rate. Martin Hepp, founder of GoodRelations, expects the adoption of rich, structured e-commerce data to increase significantly this year.

However, Mike Tung, founder and CEO of a data parsing service called Diffbot, has less faith that the standards necessary for a true Semantic Web will ever be completely and effectively implemented. In an interview on Xconomy, he states that for semantic standards to work correctly, content owners must mark up their content twice: once for the web and a second time for the semantic standards. This requires extra work, and it gives them the opportunity to perform content stuffing (SEO spam).
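To make the "mark it up twice" point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the product snippet is hypothetical, the itemprop names (name, offers, price, priceCurrency) are borrowed from the schema.org Product and Offer vocabulary, and ItemPropParser and extract_itemprops are made-up helper names rather than part of any library. The visible text serves shoppers, while the extra attributes repeat the same facts for machines.

```python
from html.parser import HTMLParser

# Hypothetical product snippet: the visible text is written for shoppers,
# and the itemprop attributes repeat the same facts for machines.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Leather Tote Bag</h1>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">249.00</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</div>
"""


class ItemPropParser(HTMLParser):
    """Collects itemprop values into a flat dict (nesting is ignored)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open_prop = None  # itemprop whose text we are waiting to read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        prop = attr.get("itemprop")
        if prop is None:
            return
        if "content" in attr:
            # Value carried in an attribute, e.g. <meta content="USD">.
            self.props[prop] = attr["content"]
        elif "itemscope" not in attr:
            # Value is the element's text; wait for handle_data.
            self._open_prop = prop

    def handle_data(self, data):
        if self._open_prop and data.strip():
            self.props[self._open_prop] = data.strip()
            self._open_prop = None


def extract_itemprops(html):
    """Returns the itemprop name/value pairs found in an HTML string."""
    parser = ItemPropParser()
    parser.feed(html)
    return parser.props


print(extract_itemprops(SAMPLE_HTML))
```

For the sample above, this should return the name, price, and currency. Note that nothing in the parser checks whether the annotated values match what a shopper actually sees on the page, which is exactly the opening for stuffing that Tung describes.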

Since the launch of their shared structured-data vocabulary, the search engines involved (Google, Yahoo!, Bing, and Yandex) have been very successful at getting website owners to add structured data, and ultimately at displaying that data in rich search engine results, like Google’s Rich Snippets. But what Tung says has held true, at least to some degree. SEO specialists have seen that content-rich snippets help improve click-through rates. For example, a review or article may showcase a picture of the author and perhaps some rating stars. This may catch the searcher’s attention first, and thus improve click-through and even conversion rates. Unfortunately, pages are sometimes filled with falsified or exaggerated structured data, simply to stand out in search results.

Ultimately finding the semantic markup process impractical for coverage across the entire web, Tung believes there is an opportunity for artificial intelligence, namely computer vision, to lend a hand. His company, Diffbot, takes a visual approach to viewing and understanding the data within a webpage. Diffbot renders each webpage in the cloud: it loads the HTML and CSS, runs the JavaScript, and does everything else a browser would. In effect, it displays the webpage just as you or I would see it on a computer. From there, algorithms based on visual properties pull out information, regardless of how well structured the HTML is or whether it contains any semantic markup at all.
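As a rough illustration of what "algorithms based on visual properties" could look like, here is a toy sketch in Python. It is not Diffbot's method, which the interview does not spell out. It assumes a renderer has already produced a list of visible text blocks with their computed styles and positions; the RenderedBlock class, the title_score function, and all of the weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RenderedBlock:
    """One visible text block as it appears after rendering (hypothetical)."""
    text: str
    font_px: float  # computed font size in pixels
    bold: bool      # whether the computed font weight is bold
    top: float      # distance from the top of the page in pixels
    width: float    # rendered width in pixels


def title_score(block, page_width=1280.0, fold=800.0):
    """Scores how title-like a block looks, using only visual cues.

    The weights are illustrative guesses, not tuned values.
    """
    score = block.font_px / 10.0       # larger text scores higher
    if block.bold:
        score += 1.0
    if block.top < fold:               # visible without scrolling
        score += 1.0
    score += block.width / page_width  # wider blocks score slightly higher
    return score


def pick_title(blocks):
    """Returns the text of the most title-like block."""
    return max(blocks, key=title_score).text


# Invented blocks standing in for a rendered product page.
blocks = [
    RenderedBlock("Free shipping on orders over $50", 12, False, 10, 1280),
    RenderedBlock("Leather Tote Bag", 28, True, 220, 500),
    RenderedBlock("$249.00", 20, True, 270, 120),
    RenderedBlock("Customers also viewed", 18, False, 1400, 1280),
]
print(pick_title(blocks))
```

Notice that the scoring never looks at tag names or markup. It works only from what the rendered page looks like, which is why this style of extraction does not depend on clean HTML.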

If an AI solution like Diffbot were always 100% accurate, there might not be much need for semantic markup. But that isn’t the case, at least not yet, and it may take quite some time to get there. Some will argue that visual AI will never be fully successful either. As technology and creativity on the web are ever evolving, so is the way we see it. The fashion vertical of e-commerce is a prime example, as it shows quite a bit of aesthetic variety. This variety may make it difficult for AI to accurately detect critical pieces of information. For example, I’d say at least 90% of the time the product title appears as the largest content heading on an e-commerce page. But on some high-end fashion sites, like Net-a-Porter, the brand or designer is set in the largest font, and the product title may be much smaller.

Example of shopping site showing Marc Jacobs designer bag.
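The "largest heading is the title" rule of thumb, and the way a designer-led layout breaks it, can be shown in a few lines of Python. The two page layouts below are invented for illustration, including the font sizes and product names; largest_text is a hypothetical helper.

```python
def largest_text(blocks):
    """Returns the text set in the largest font; blocks are (text, font_px)."""
    return max(blocks, key=lambda block: block[1])[0]


# Invented layouts: texts and font sizes are assumptions, not measurements.
typical_page = [("Acme", 14), ("Leather Tote Bag", 28), ("$249.00", 20)]
designer_led_page = [("Marc Jacobs", 30), ("Designer bag", 14), ("$249.00", 14)]

print(largest_text(typical_page))       # the product title
print(largest_text(designer_led_page))  # the designer name, not the title
```

On the typical layout the rule finds the product title. On the designer-led layout it returns the brand instead, so a system leaning on this single cue would mislabel the page.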

Little differences like this can throw off even a human, so how will AI perform? It’s likely to be a bit problematic. In the world of data interpretation and extraction, getting precise information from a web this diversely structured is often difficult. The reality is that no system is perfect yet, and both semantic markup and computer vision have obstacles ahead. While the approaches of Hepp and Tung may differ, at the end of the day the goal is the same: make web data more useful and accessible. With progress coming on both ends, we are moving in the right direction.

About the Author

This guest post comes from e-commerce entrepreneur Marc Mezzacca. Marc runs a social-media-based coupon code website called CouponFollow, and recently launched an automated coupon notification app, Coupons at Checkout. You can follow him on Twitter.
