A couple of months back, we reported on how AOL’s purchase of the Huffington Post delivered some more semantic technology to its doorstep. But that’s not the only semantically related project underway there: Amit Moran, R&D Manager at the AOL Relegence team, is working on a movie sentiment ranking project using Twitter and other social media data. It should see the light of day in the near future on one of AOL’s properties.
“We are developing a general infrastructure for sentiment analysis that deals with finding out what people think of entities in a content stream,” Moran told attendees at the recent Sentiment Analysis Symposium in New York City. The use cases for sentiment analytics are wide – what do you think of a restaurant, or a politician, for that matter – so domain specificity becomes important in its actual application. AOL went with movies as a first option – not a surprising choice given its ownership of sites like Moviefone.
Identifying entities in tweets and the opinions expressed toward them, and then aggregating it all to determine whether a movie is tracking high or low in the rankings, has its challenges. As an example, there’s a movie called The Roommate, but there are also a lot of plain old roommate references out there in the Twittersphere. And once the algorithm susses out that it is a movie being referenced, it has to make sure the opinions being expressed relate to the movie and not to something else mentioned in the tweet. There’s the issue of negation, too. “Someone might tweet that they didn’t enjoy Black Swan,” he said. Add to that the weighting of word choices: saying a movie wasn’t bad isn’t the same as saying that it was good.
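To make the negation and word-weighting problem concrete, here is a minimal sketch in Python. It is not AOL's implementation: the lexicon, the negator list, the three-token window, and the damping factor are all hypothetical values chosen for illustration. The idea is that a negator shortly before a sentiment word flips its polarity but also weakens it, so "wasn't bad" scores as mildly positive rather than as strongly as "good."

```python
import re

# Hypothetical toy lexicon; a production system would use a far larger one.
POLARITY = {
    "enjoy": 1.0, "enjoyed": 1.0, "loved": 1.0, "good": 1.0, "great": 1.0,
    "bad": -1.0, "awful": -1.0, "boring": -1.0,
}
NEGATORS = {"not", "never", "no", "didn't", "wasn't", "isn't", "don't", "doesn't"}
NEGATION_WINDOW = 3      # assumption: look back at most three tokens
NEGATION_DAMPING = 0.5   # assumption: a negated word counts half as strongly


def tokenize(text):
    # Normalize curly apostrophes so that "didn’t" matches "didn't".
    return re.findall(r"[a-z']+", text.lower().replace("’", "'"))


def score_tweet(text):
    """Sum word polarities, flipping and damping any word preceded by a negator."""
    tokens = tokenize(text)
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        window = tokens[max(0, i - NEGATION_WINDOW):i]
        if any(w in NEGATORS for w in window):
            value = -value * NEGATION_DAMPING
        total += value
    return total


for tweet in ("I didn't enjoy Black Swan",
              "That movie wasn't bad",
              "That movie was good"):
    print(tweet, "->", score_tweet(tweet))
```

A fixed look-back window is crude (it would misread "not only good but great"), which is one reason real systems lean on parsing and supporting rules.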
Technologies such as NLP pattern analysis (is the word “flick,” for instance, present in a tweet about The Roommate?), supporting rules, and profiling via a unique set of queries for each movie help AOL extract movie entities and the sentiment related to them, and then aggregate the results into a final rank for each movie. A tweet that reads “I just saw Toy Story 3; I loved it” needs some parsing out, for instance, to know whether “loved” relates to the movie. “You have to identify an implicit reference for a movie like the word ‘it,’” Moran said.
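A rough sketch of those two checks, in Python, might look like the following. Everything here is an assumption made for illustration: the cue-word list, the function names, and the clause-splitting rule are hypothetical, and real coreference resolution is much harder than matching the word "it." The first function accepts a title match only when a movie-related cue word appears elsewhere in the tweet; the second attributes a later clause containing "it" to a movie named earlier in the same tweet.

```python
import re

# Hypothetical cue words suggesting that a tweet is about a film.
CUE_WORDS = {"movie", "film", "flick", "saw", "watched", "watching",
             "trailer", "theater"}


def title_pattern(title):
    # Case-insensitive whole-phrase match for the title.
    return re.compile(r"\b" + re.escape(title.lower()) + r"\b")


def is_movie_mention(tweet, title):
    """True if the title appears and a cue word appears outside the title."""
    text = tweet.lower()
    pattern = title_pattern(title)
    if not pattern.search(text):
        return False
    rest = pattern.sub(" ", text)
    tokens = re.findall(r"[a-z0-9']+", rest)
    return any(tok in CUE_WORDS for tok in tokens)


def clauses_about_movie(tweet, title):
    """Return clauses that name the movie, or say 'it' after the movie was named."""
    pattern = title_pattern(title)
    seen_title = False
    result = []
    for clause in re.split(r"[;.!?]+", tweet):
        clause = clause.strip()
        if not clause:
            continue
        lowered = clause.lower()
        if pattern.search(lowered):
            seen_title = True
            result.append(clause)
        elif seen_title and re.search(r"\bit\b", lowered):
            result.append(clause)
    return result


print(is_movie_mention("Just saw The Roommate, decent flick", "The Roommate"))
print(is_movie_mention("The roommate ate my leftovers again", "The Roommate"))
print(clauses_about_movie("I just saw Toy Story 3; I loved it", "Toy Story 3"))
```

Only the clauses returned by `clauses_about_movie` would then be passed on to sentiment scoring, which keeps opinions about other things in the tweet out of the movie's tally.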
But things can get even less cut and dried in the still-nascent automated sentiment space. “What to do if several movies are mentioned? Very simple – just ignore these tweets,” he said. And when questioned about deriving the scores, Moran said it’s something that’s actually being done manually right now. “We wanted to do it automatically and then we hit the deadline,” he said. One thing AOL learned while trying to score tweets over a window of time is that there’s a large bias to the positive in talking about movies on Twitter; only about five percent were negative in the sentiment comparison bucket. The end result could be that all movies would wind up with rankings of 4 or 5, and “that wouldn’t do us any good,” he said.
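The rule of ignoring tweets that mention several movies is simple enough to sketch in a few lines of Python. The title list and function names below are hypothetical, and plain phrase matching stands in for whatever entity extraction AOL actually uses.

```python
import re

# Hypothetical catalogue of tracked titles.
TRACKED_TITLES = ["Black Swan", "The Roommate", "Toy Story 3"]


def titles_mentioned(tweet, titles=TRACKED_TITLES):
    """Return every tracked title that appears in the tweet as a whole phrase."""
    text = tweet.lower()
    return [t for t in titles
            if re.search(r"\b" + re.escape(t.lower()) + r"\b", text)]


def keep_for_scoring(tweet, titles=TRACKED_TITLES):
    # Keep a tweet only when it names exactly one tracked movie.
    return len(titles_mentioned(tweet, titles)) == 1


print(keep_for_scoring("Black Swan was intense"))
print(keep_for_scoring("Black Swan was better than Toy Story 3"))
```

Dropping ambiguous tweets trades recall for precision, which is a reasonable trade when the tweet volume is high.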
Compensating for that bias to get a bell-shaped distribution of ranks involves calculating a final score based on how far a movie’s sentiment falls from an expected value. That expectation comes from a calculation that includes things like the ratio of tweets and how many of them expressed a sentiment over an extended period of time (with some division thrown into the modeling mix). “So we extract the expected value of positive and negative scores and calculate the final score based on how far away it is from the expected value,” he said. “A movie with more positive reviews than we expected, we give a higher rank to. That looks more like what we wanted, and that can be divided into 1, 2, 3, 4, or 5 rankings to behave nicely over a period of time. … So either more positive than normal or less positive than normal is what the rank means.”
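Moran did not spell out the formula, so the following Python sketch is only one plausible reading of the idea. It assumes the "expected value" is the average positive share across all tracked movies, measures each movie's distance from it in standard deviations, and cuts that distance into five bands. The thresholds, the function name, and the tweet counts are all made up for illustration.

```python
from statistics import mean, pstdev


def rank_movies(counts):
    """Map {title: (positive_tweets, negative_tweets)} to {title: rank 1..5}.

    Assumption: the rank reflects distance from the average positive share,
    not the raw share, so heavy positive bias does not push every movie to 5.
    """
    shares = {title: pos / (pos + neg)
              for title, (pos, neg) in counts.items() if pos + neg > 0}
    if not shares:
        return {}
    expected = mean(shares.values())
    spread = pstdev(shares.values()) or 1.0  # avoid dividing by zero
    ranks = {}
    for title, share in shares.items():
        z = (share - expected) / spread
        # Hypothetical cut points for the five rank bands.
        if z < -1.5:
            ranks[title] = 1
        elif z < -0.5:
            ranks[title] = 2
        elif z <= 0.5:
            ranks[title] = 3
        elif z <= 1.5:
            ranks[title] = 4
        else:
            ranks[title] = 5
    return ranks


# Made-up counts: every movie is mostly positive, as on Twitter.
sample = {
    "Movie A": (99, 1),
    "Movie B": (97, 3),
    "Movie C": (95, 5),
    "Movie D": (93, 7),
    "Movie E": (80, 20),
}
print(rank_movies(sample))
```

Even though every movie in the sample is at least 80 percent positive, the ranks spread out, because each one is judged against what is normal for the whole set rather than against a neutral midpoint.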
Got that? Well, apparently the issues involved in getting to that point are nothing compared to what is the worst part of Twitter for anyone who hopes to get some analysis out of it: Justin Bieber. “Any analysis you try to do on Twitter, whether trending or sentiment analysis, Justin Bieber tends to dominate,” Moran complained. Maybe it wouldn’t be so bad for the AOL project if Bieber hadn’t gone and come out with a movie, Never Say Never, too. “We started to see a lot of tweets talking about that movie,” he said. “So beware of him.”