
Crawling Opinions: From WikiLeaks To Your Business

By Jennifer Zaino  /  October 25, 2010

Last week, at a London press conference, WikiLeaks released some 400,000 classified U.S. files on the Iraq war. The files bring with them accounts of torture and abuse by Iraqi allies, findings that more civilian deaths occurred than were officially reported, and discoveries of Iranian-backed forces giving supplies to insurgents attacking coalition troops, among other distressing news. But its disclosures, or perhaps its controversial front man Julian Assange, don’t seem to be winning it any fans.

Head over to opinioncrawl.com, and you’ll find daily sentiment rankings for WikiLeaks, currently 74 percent negative. The figure is calculated from sentiment on the topic expressed in a large number of recent Web publications, chiefly blogs, news sites and the like. That compares to a 64 percent negative sentiment on Iraq itself. Those are just two of the topics you’ll find monitored in an automatically generated blog for online sentiment on current events, the economy, politics, and more, from Semantic Engines LLC’s Opinion Crawl sentiment analysis site. The posts result from a daily crawl of web sites that finds newly published topic-related content and assesses its sentiment. The process leverages the company’s roots in semantic analysis not only to identify the sentiment but also to extract the semantic concepts that allow Opinion Crawl to determine what drives those perceptions.
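As a rough illustration of how a figure like "74 percent negative" could be derived, here is a minimal Python sketch. It assumes each crawled document has already been given a single sentiment label and that the published figure is simply the share of documents carrying that label. The article does not describe Opinion Crawl's actual scoring, so the function name `sentiment_shares` and the sample counts are hypothetical.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the share of each sentiment label as a whole-number percentage.

    Assumption: one label per crawled document. This is an illustrative
    aggregation, not Opinion Crawl's documented method.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical labels for one day's crawl of documents about a single topic.
day_labels = ["negative"] * 37 + ["positive"] * 9 + ["neutral"] * 4
shares = sentiment_shares(day_labels)
print(shares)
```

In practice the hard part is assigning the labels in the first place; the aggregation step shown here is the simple end of the pipeline.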

“In the background we run our semantic engine [dubbed SenseBot] and identify key concepts for this topic, and create a cloud of concepts to assess the perception of the topic by the public,” says Dmitri Soubbotin, CEO and founder of Semantic Engines. “It looks like a tag cloud, but it’s not keywords inside, it’s concepts.”

And the weight of a particular concept, its issue impact, so to speak, is visually indicated by its font size. Matching those extracted concepts to perception trends can yield some interesting results, he says. Some time ago, for instance, Semantic Engines explored sentiment on search engine companies, and saw that while Google and Yahoo stayed mostly stable, Microsoft started slowly gaining favorable sentiment traction. “We looked in the concepts extracted and realized Bing was introduced about that time and slowly it was trickling up to the top of the [concept] cloud, and sentiment perception was improving as it was introduced and became more known,” says Soubbotin. “That’s a clear example of how new products or issues drive sentiment in a positive or negative way.”
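The idea of showing a concept's weight through its font size can be sketched in a few lines of Python. This is a generic linear scaling of the kind commonly used for tag clouds, not a description of how SenseBot renders its cloud. The function name `font_sizes`, the point-size range, and the sample weights are all assumptions made for illustration.

```python
def font_sizes(weights, min_pt=10, max_pt=36):
    """Map concept weights to font sizes by linear scaling.

    Assumption: the lightest concept gets min_pt and the heaviest gets
    max_pt. The real cloud may use a different scale.
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    sizes = {}
    for concept, weight in weights.items():
        # If every concept has the same weight, render them all at max size.
        frac = (weight - lo) / span if span else 1.0
        sizes[concept] = round(min_pt + frac * (max_pt - min_pt))
    return sizes


# Hypothetical weights; only the concept names come from the article.
weights = {"Iraq": 50, "government": 30, "torture": 20, "human rights": 10}
sizes = font_sizes(weights)
print(sizes)
```

Recomputing the sizes day by day would show a newly introduced concept growing in the cloud over time, which is the effect Soubbotin describes for Bing.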

Soubbotin hopes the Opinion Crawl site will drive more interest among enterprises in exploring how to put its semantic analysis and sentiment identification capabilities to use in their organizations. One of its services is to provide reports to business clients, on a subscription basis, on the topics they specify. Last week it also released a sentiment API that is available by application on a trial basis to companies that want to try mining that data themselves and integrating the results into their own enterprise systems. Making the connection between the concept cloud and sentiment trends could, for instance, help a marketing department understand whether a just-released product it is promoting is gaining weight in the cloud, and so actively playing a role in positive or negative perceptions of the company. “You can see if your new product which was released during this time actually appears in the cloud of concepts,” he says.

Soubbotin sees sentiment analytics playing a role in many of the same areas that others in the industry do, potentially including helping to understand how a stock will trend. “Sentiment follows events. Predicting events is harder, but when you select the content sources properly and look at the issues in the semantic cloud, then I believe it’s possible to predict, at least in the short term, that now that an issue is appearing and growing we know sentiment will rise or fall,” he says, but cautions that “sentiment is definitely one of the inputs for stock performance, but only one of many.”

And he acknowledges that there are plenty of other vendors who see these and other greenfield opportunities in semantic analytics. But he also thinks a lot of them have jumped into the game thinking that it’s easy, and that’s why some competitors have exited too. “My perspective is that it’s a hard thing to do and the only reason we are in this is because we have a strong NLP background in general, so we are basing this on something we have done for years,” he says. “It is a complex area, it has to be properly based on NLP, and we are distinguished by our use of semantics. We don’t think others are using it to the extent we are.”

Cloud Concepts For WikiLeaks

In the case of WikiLeaks, by the way, the issues that show up in the concept cloud and can affect sentiment in a positive or negative way include Iraq, government, Americans, torture, and human rights. Following the trail of the word Americans in the concept cloud leads you to a story about a Pew Research Center for the People & the Press poll showing that 47 percent of those questioned said the WikiLeaks leak back in July hurt the public interest. So it is perhaps not surprising that negative opinions take the lead following this latest round of releases, too.

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
