What’s the most important requirement for sentiment analytics to succeed? Make that question plural, and let’s start our answers with something that the tools in this area themselves have no influence on: Good quality data.
During yesterday’s second annual Sentiment Analysis Symposium in New York City, hosted by Alta Plana Corp. and its founder Seth Grimes, the audience got an earful about how bad data can negatively impact efforts to understand sentiment before they even get underway.
“To take advantage of sentiment analysis you need good data,” said Israel Mirsky, EVP, Emerging Media & Technology, Porter Novelli. If sentiment analysis is performed over corporate-owned data, chances are better that it’s in good shape than when it’s information sourced from the world of social media. “The success of these tools requires being run on data of the highest quality,” he said. “Bad research equals bad results.”
The audience was urged to test their data source providers to see who really does the best job, paying attention to issues such as whether there are holes on the blog and forum aggregation side or on the real-time data aggregation end of things. Twitter’s own API feeds came under scrutiny. Relying on them, Mirsky said, can be a big mistake, because of the caps the API places on data. He also questioned whether those providers who say their apps have access to all the firehose data actually do. “It’s ugly out there,” he said.
In fact, pointed out Steve Rappaport, Knowledge Solutions Director at the Advertising Research Foundation, only about 5 to 15 percent of data makes its way out through the Twitter API, compared with something in the 90 percent range from Twitter’s firehose.
Other panelists agreed that poor data can lead to problems down the road. “Data is messy. Twitter in particular is really messy,” says Jeff Catlin, CEO of Lexalytics. Even assuming you have a relatively clean Twitter feed with relatively topical data – a big assumption, he said – then you have to figure out what people actually are saying “in whatever language it is they’re writing in. God help the poor machines. We’ve put years and years of work to understand grammar, capitalization, and crap like that, and there ain’t none.”
Don’t forget the junk data in the mix, too. “Once you eliminate spam, content farms and invalid mentions, for many clients there is not enough validated, real, important content left to make automated sentiment analysis accurate,” said Katie Delahaye Paine, CEO, KDPaine & Partners, a consulting and research firm that helps companies measure the success of social media and traditional PR programs. “You must figure this out and be honest if this is a good fit. Every company has been pitched by some automated sentiment analysis company and half the times I had to say, ‘Look, you get 100 clips a month. It’s not worth it.’” To add to that, some companies just don’t generate a lot of sentiment, one way or another. “Eighty percent of the conversation is neutral – people are just making an observation,” she said.
One point made was that marketing seems to have been where sentiment analysis first landed, but it’s not necessarily an exclusive fit or perhaps even the best one. What might be? Catlin sees travel and tourism as a natural fit – “there’s lots of data feeding back all the time,” he says. Symposium attendees also saw presentations that made a case for sentiment analytics in the financial sector and in trading apps, for example. “The most important thing in the stock market is early warning [that] something is going up or down,” said Ronen Feldman, associate professor at Hebrew University and co-founder of text mining company Digital Trowel. “But you can’t use sentiment alone. You have to use it with technical and fundamental analysis. Then you add sentiment and you get much better results.”
Then there’s eBay, which began experimenting with sentiment analytics on product domains and reviews and has reached the point where the sentiment analytics engine it created and the NLP group it runs are helping the company’s IT with early tracking of site problems and downtime. Its path has included applying general sentiment analytics to find negative tweets about site performance, extracting events, and correlating keyword analysis with spikes in related Twitter traffic, says Catherine Baudin, Senior Research Scientist, eBay Research Lab. “Not everything has to be the most fantastic NLP technology,” she said. “If you can combine multiple clues you might be able to do some things.”
This doesn’t mean that marketing efforts should necessarily lose the toehold they have in sentiment analytics. It’s as much a matter of how you look at its application there as anything else. “Marketing resource management looks at all the ways we communicate,” said Seth Earley, president and CEO of Earley & Associates. “Sentiment analysis is one piece.”
And, says Catlin, marketers need to break out of past practices that meant caring about all mentions and all sentiment on all mentions. In other words, you just don’t need all the gory details you once did when the world had a lot less content. “In the new world with this much information that’s not important,” he told The Semantic Web Blog. “You want mentions but the only ones that affect your brand are at the edges. Everyone is saying there’s too much gray – well, throw it away. It really doesn’t affect your brand. Maybe you’ll miss an important nugget here or there but really, just about any of the engines out there [including Lexalytics] get the outliers right. Out here [on the edges] the engines are great. So it’s not so much that [sentiment analytics] isn’t right for marketing, but that marketing doesn’t always look at it in the right way.”