A human who reads this naturally would understand the links between the references:
"I went to the store and got a magazine for my mother and father. She liked it. He didn't care for it and so he's going to see a movie instead. It's about a trapeze artist."
A real live reader would know that it is the writer's mother who is the "she" who liked the magazine; the writer's father who is the "he" who didn't; and that the "it" that was both liked and pooh-poohed is the magazine – and the "it" that concerns a trapeze artist is a completely separate reference to a movie.
It's harder for machines to accurately connect such links between an originally named entity and the pronouns or other references subsequently used in its place – and then accurately relate each instance of the same entity to the intentions, positive or negative impressions, or other signals expressed about it. But semantic technology web service OpenAmplify, which just entered its V. 2.1 beta release, is aiming to start cracking the related-entity nut with the addition of Co-reference to its core "meaning" platform.
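To make the problem concrete, here is a deliberately naive sketch of rule-based pronoun resolution: link each pronoun to the most recent earlier mention whose gender or animacy features agree with it. The mention annotations and feature sets are invented for illustration; this is a toy heuristic, not OpenAmplify's patented approach.

```python
# Toy pronoun resolver: nearest earlier mention with agreeing features.
PRONOUN_FEATURES = {
    "she": {"feminine"}, "her": {"feminine"},
    "he": {"masculine"}, "him": {"masculine"},
    "it": {"neuter"},
}

def resolve_pronouns(mentions, pronouns):
    """mentions: list of (position, name, features); pronouns: list of (position, word).
    Returns {pronoun_position: resolved_name}."""
    resolved = {}
    for p_pos, word in pronouns:
        wanted = PRONOUN_FEATURES.get(word.lower(), set())
        # Scan backwards for the nearest antecedent whose features agree.
        for m_pos, name, feats in sorted(mentions, reverse=True):
            if m_pos < p_pos and wanted & feats:
                resolved[p_pos] = name
                break
    return resolved

# The magazine example from the text, hand-annotated with word positions:
mentions = [
    (2, "magazine", {"neuter"}),
    (4, "mother", {"feminine"}),
    (6, "father", {"masculine"}),
]
pronouns = [(8, "She"), (10, "He"), (12, "it")]

print(resolve_pronouns(mentions, pronouns))
# {8: 'mother', 10: 'father', 12: 'magazine'}
```

Even this crude recency-plus-agreement rule gets the magazine example right; it falls apart as soon as two agreeing candidates compete, which is exactly the ambiguity real systems have to model.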
OpenAmplify makes its bones on detecting syntactic roles in text and identifying them very accurately, it says, with some 14 NLP patents so far related to how it understands human language as humans do. That matters for many reasons, not the least of which is one of the company's initial strategic targets: digital media around brand management, where Co-reference plays a big role.
"When people talk they don't repeat what they talk about over and over again," says CIO and co-founder Mike Petit. They may reference the same person in multiple ways; say, newly minted Tea Party Senate candidate Christine O'Donnell as "Christine O'Donnell", "O'Donnell," or maybe even "witchy woman" (just kidding). And besides simply using "she" as a substitute for "mother" or any other named female entity, they may use acronyms or other human-understandable references to refer to the same person or event or item.
Take this as an example: Someone may write in a post that he just test-drove a BMW, then muse about something else in the next sentence, and then return to the car chat, noting that the Bimmer is a rocket. OpenAmplify's new technology can appropriately connect the dots between who said or did what and what the what is, regardless of how the who or what is referred to (got that?). So, "signals like polarity, intention, and sentiment get attached to that original entity," says Petit, so that you can understand the conversation around the brand. In this case, the writer appears to have formed a positive impression, so he may be a good candidate for reach-out campaigns.
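Once co-reference links are known, mention-level signals can roll up to a single canonical entity, as the BMW example suggests. A minimal sketch of that aggregation step, assuming a hand-built alias table and made-up polarity scores (neither is OpenAmplify's actual model):

```python
# Roll mention-level sentiment up to one canonical entity.
from collections import defaultdict

# Pretend co-reference output: every surface form maps to a canonical entity.
ALIASES = {"BMW": "BMW", "the Bimmer": "BMW", "it": "BMW"}

def aggregate_sentiment(mention_sentiments):
    """mention_sentiments: list of (surface_form, polarity in [-1, 1])."""
    totals = defaultdict(float)
    for surface, polarity in mention_sentiments:
        entity = ALIASES.get(surface)
        if entity:  # only mentions we can resolve contribute a signal
            totals[entity] += polarity
    return dict(totals)

# "test-drove a BMW" (neutral mention), "the Bimmer is a rocket" (positive):
print(aggregate_sentiment([("BMW", 0.0), ("the Bimmer", 0.9)]))
# {'BMW': 0.9}
```

The point of the sketch: without the alias table, the positive "Bimmer" mention would never reach the BMW tally, which is why Petit says accuracy is "much higher" with co-reference in place.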
"The level of accuracy we can bring is much higher than if you can't do that," Petit says. "It took a long time to do Co-reference, and that's because it's hard."
Co-reference is admittedly a work in progress, with capabilities such as tackling ambiguity on the radar, Petit says. Take that problematic pronoun "it" in the example above, where the word points to two different nouns, or in a posting such as: "I drove my car today. It's raining." Clearly, to a human reader, there's no connection between the word "car" in the first sentence and the word "it" in the second. But there could be if the posting was: "I drove my car today. It was raining and it did great in the rain."
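The "It's raining" case is an instance of what linguists call a pleonastic pronoun: an "it" that refers to nothing at all. A toy check for that pattern, using a small invented weather-word list and a regex over the first "it" in a sentence (real systems use far richer syntax and context than this):

```python
# Toy detector for dummy-subject "it", as in "It's raining."
import re

WEATHER_WORDS = {"raining", "snowing", "sunny", "cold", "hot", "windy"}

def is_pleonastic_it(sentence):
    """True if the first 'it' looks like a dummy subject rather than a reference."""
    match = re.search(r"\bit(?:\s+is|\s+was|'s)\s+(\w+)", sentence.lower())
    return bool(match) and match.group(1) in WEATHER_WORDS

print(is_pleonastic_it("It's raining."))             # True: no antecedent needed
print(is_pleonastic_it("It did great in the rain."))  # False: "it" = the car
```

The second example shows exactly why the posting Petit describes is hard: the same surface word flips between a dummy subject and a genuine reference within one short passage, and only context tells them apart.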
From Scalability to Sarcasm
The idea that infuses the OpenAmplify platform as a whole, Petit says, is that you can't monetize or utilize what you don't completely understand. He sees sentiment analysis as just part of the equation, a component of a combination that also has to surface whether someone is asking for advice or giving advice, as one example, as a precursor to understanding who influencers are for brand management efforts or what inventory to take up in advertising campaigns. The platform also is built on the cloud so that it can scale massively. "Given that the number of users of social media are scaling massively," Petit says, "you had better be ready to take on that scale."
Petit says they're also working on getting ready to take on a few other interesting challenges. For instance, he hints at having a new approach around domains and how to make classification work better. He also mentions emotion – not polarity but understanding whether there's love, anger or envy in content; irony and sarcasm; and a BS detector. To get to these results means being able to parse out things such as flowery language (love); overly flamboyant treatments of mundane topics (sarcasm); and even meandering around a point (BS).
Fun stuff. But as Petit notes, there's an important point to be made, especially around the BS detector. "To some degree this is probabilistic. You can't get things 100 percent right. Even human beings don't," he says. "That's why Ponzi schemes work."