[Editor’s Note: This guest post is by Tom Reamy, Chief Knowledge Architect and founder of KAPS Group, a group of knowledge architecture, taxonomy, and eLearning consultants. Tom has 20 years of experience in information architecture, intranet management and consulting, and education and training software. Tom will be presenting a tutorial, Text Analytics for Semantic Applications and moderating a panel, Emotional Semantics – Beyond Sentiment at the upcoming SemTechBiz Conference in San Francisco.]
While sentiment analysis continues to generate a lot of press, it is not clear how much real value organizations are deriving from it. One reason is that the standard approach to sentiment has relied mostly on statistics, on long lists of sentiment terms, or on both. However, if you add in other, advanced text analytics capabilities such as auto-categorization using advanced operators, you can not only develop more sophisticated sentiment analysis, you can also develop a whole new class of applications that enhance or go beyond simple sentiment analysis.
These advanced operators include such commands as DIST_6 (count two words as a positive indicator only if they are within 6 words of each other) or SENT (only count words in the same sentence).
Take, for example, a rule for finding customers who report lost or stolen phones. The first part of the rule simply states that the software should count any variation of the word “customer” if it is within 5 words of any variation of the word “phone” and of “lost” or “stolen”. Additional parts of the rule tell the software not to count these hits if the key word “customer” appears in specific phrases. This last part can eliminate what are called false positives.
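A proximity rule of the kind just described can be sketched in a few lines of Python. This is only an illustration of the idea, not the syntax or behavior of any particular text analytics product: the function names, the crude prefix matching used to stand in for "any variation" of a word, and the excluded phrases ("customer service", "customer support") are all assumptions made for the example.

```python
import re

# Hypothetical phrases in which "customer" should NOT count as a hit.
EXCLUDED_PHRASES = {("customer", "service"), ("customer", "support")}


def tokenize(text):
    """Split text into lowercase word tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def find(tokens, stem):
    """Positions of tokens beginning with `stem` (crude variant matching)."""
    return [i for i, tok in enumerate(tokens) if tok.startswith(stem)]


def within(pos_a, pos_b, n):
    """DIST_n-style test: some position in A is within n words of one in B."""
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def lost_phone_rule(text, n=5):
    """True if a countable "customer" is within n words of both a "phone"
    variant and a "lost" or "stolen" variant."""
    tokens = tokenize(text)
    # Drop "customer" hits that are part of an excluded phrase.
    customers = [i for i in find(tokens, "customer")
                 if tuple(tokens[i:i + 2]) not in EXCLUDED_PHRASES]
    phones = find(tokens, "phone")
    lost_or_stolen = find(tokens, "lost") + find(tokens, "stolen")
    return any(within([c], phones, n) and within([c], lost_or_stolen, n)
               for c in customers)
```

With these assumptions, "The customer said her phone was stolen yesterday." is counted as a hit, while "Customer service replaced the stolen phone." is not, because its only mention of "customer" sits inside an excluded phrase.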
Currently, a lot of sentiment analysis is done with just long lists of positive and negative words and phrases like “fantastic” or “great phone”. The problem is that “great phone” looks like a positive sentiment unless it is in a phrase like “This could be a great phone if it weren’t for the lack of reception”. That sentence is not very positive at all, and a simple word list also misses the need to address the particular feature being complained about.
For example, a rule using the SENT operator can state that you should look for any variant of the word “terrible” (which typically would include a lot of different words) within the same sentence as the word “support”. And for the “great phone” example above, you could write a rule stating that “great phone” is positive unless it appears in the same sentence as a word that scores a negative hit.
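The two same-sentence rules described here can be sketched as follows. This is a minimal illustration under stated assumptions: the negative word list is a tiny hypothetical sample, the function names are invented for the example, and a real system would use far richer term lists and a proper sentence splitter.

```python
import re

# Hypothetical, deliberately tiny list of negative terms.
NEGATIVE_TERMS = {"terrible", "awful", "horrible", "lack", "poor"}


def words(sentence):
    """Lowercase word tokens of a single sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def sent(sentence, terms_a, terms_b):
    """SENT-style test: both term sets score a hit in the same sentence."""
    found = set(words(sentence))
    return bool(found & terms_a) and bool(found & terms_b)


def score_sentence(sentence):
    """'great phone' is positive unless the sentence also has a negative hit."""
    toks = words(sentence)
    has_negative = bool(set(toks) & NEGATIVE_TERMS)
    has_great_phone = any(a == "great" and b.startswith("phone")
                          for a, b in zip(toks, toks[1:]))
    if has_negative:
        return "negative"
    if has_great_phone:
        return "positive"
    return "neutral"
```

Under these assumptions, "This is a great phone." scores as positive, while "This could be a great phone if it weren't for the lack of reception." scores as negative because "lack" appears in the same sentence.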
Enhancing sentiment analysis is an important topic in its own right, but in addition, there are other kinds of applications that can be built using sentiment analysis techniques in conjunction with advanced auto-categorization capabilities.
The first application is expertise analysis which is a research area I’ve been working in for a couple of years. Expertise analysis is basically using text analytics software to characterize the level of expertise in a document or collection of documents, for an author’s set of works, or for an entire community. This is possible because experts think differently than non-experts (within their field) in a number of ways that can be captured in text.
First, experts tend to focus on process or procedures rather than subject. In other words, their articles will be more about techniques within a field rather than explaining the subject matter of the field. They also chunk ideas based on deeper functional properties than non-experts and their “chunks” tend to be bigger.
In addition, experts tend to operate on a different level of generality than non-experts. For example, in a simple 3 level hierarchy of superordinate-basic-subordinate (say Philosophy-Epistemology-Qualia), novices tend to stay at the superordinate level, the general population in a field would be mostly basic level, and experts would focus on the subordinate level.
This sort of analysis is very context sensitive, both in terms of the overall context of the document set (what is expert in a news feed would be basic in a collection of research papers) and the context within a document. For example, the word “test” by itself is a general level term, but if it shows up in a phrase such as “predictive value of tests as a function of economic class”, then it is more likely an expert-level usage.
Because context matters so much, it is important to be able to build sophisticated categorization rules that not only look for statistically significant patterns of expertise words but also look at the context of words nearby or in the same sentence or paragraph.
Expertise analysis can be used to add a new dimension to sentiment analysis by weighting the contributions of experts more heavily, and to characterize the expertise level of different communities around specific topics or products. It can also be used to build an expertise location KM application without the need for individuals to maintain their own profiles, which is a traditional source of failure in this type of KM application because experts don’t want to take the time to write their own profiles. In social media applications, it could be used for more fundamental analyses such as characterizing the expertise level of a group of potential terrorists for areas such as bomb-making.
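As a rough sketch of how level of generality and local context might be combined, the following Python fragment tags a text by the hierarchy level of the terms it uses and checks whether a general term appears alongside expert context cues. Everything here is an assumption made for illustration: the three-term hierarchy is taken from the Philosophy-Epistemology-Qualia example above, while the audience labels, the cue words, and the function names are hypothetical.

```python
import re
from collections import Counter

# Toy hierarchy from the example above; a real taxonomy would be far larger.
LEVEL_OF_TERM = {
    "philosophy": "superordinate",
    "epistemology": "basic",
    "qualia": "subordinate",
}
# Hypothetical mapping from dominant level to a likely audience.
AUDIENCE = {"superordinate": "novice", "basic": "practitioner",
            "subordinate": "expert"}
# Hypothetical context cues that mark an expert usage of a general term.
EXPERT_CUES = {"test": {"predictive", "function"}}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def likely_audience(text):
    """Label a text by the hierarchy level its known terms use most often."""
    levels = Counter(LEVEL_OF_TERM[t] for t in tokenize(text)
                     if t in LEVEL_OF_TERM)
    if not levels:
        return "unknown"
    return AUDIENCE[levels.most_common(1)[0][0]]


def is_expert_usage(term, sentence):
    """True if a variant of `term` shares a sentence with an expert cue."""
    tokens = tokenize(sentence)
    has_term = any(t.startswith(term) for t in tokens)
    has_cue = bool(EXPERT_CUES.get(term, set()) & set(tokens))
    return has_term and has_cue
```

A simple count like this ignores the document-set context discussed above, so in practice the level assigned to each term would need to be tuned per collection.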
Another way to use text analytics/sentiment analysis software is to develop document level behavior prediction applications. One project we worked on tried to distinguish customer comments indicating that the customer was likely to actually cancel their telecom subscription from comments in which the customer was merely bargaining. In this case, the first step was to categorize the content of customer support calls to find the calls that were about subscriptions and that discussed the possibility of canceling. This categorization was largely based on subject words.
However, to distinguish real threats from bargaining, we had to look at a different class of words. For example, in the phrase, “he will cancel his account (“if” or “unless”) he gets something (stop calling or price reduction, etc.)”, the words, “if” and “unless” are bargaining words. On the other hand, the phrase, “I want to know about my cancellation date” tends to signify a more serious threat to actually cancel.
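The two-step approach described above (first find comments about cancellation, then look for bargaining words) might be sketched like this. It is a minimal illustration only: the word lists contain just the terms mentioned in the text, the labels and function name are hypothetical, and the actual project rules were certainly more elaborate.

```python
import re

# Bargaining words named in the text; a real rule set would have more.
BARGAINING_WORDS = {"if", "unless"}


def classify_cancellation(comment):
    """Label a comment as bargaining, a likely cancellation, or off-topic."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    # Step 1: subject words. Is the comment about canceling at all?
    if not any(tok.startswith("cancel") for tok in tokens):
        return "not_about_cancellation"
    # Step 2: a different class of words separates bargaining from intent.
    if BARGAINING_WORDS & set(tokens):
        return "bargaining"
    return "likely_cancel"
```

For instance, "He will cancel his account unless he gets a price reduction" is labeled as bargaining, while "I want to know about my cancellation date" is labeled as a likely cancellation. As noted below, output like this is meant to assist an analyst, not to replace one.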
This sort of behavior prediction analysis is not advanced enough to simply make decisions based on the software characterization, but in conjunction with an analyst, can greatly speed up and enhance the analyst’s work. This document level analysis can also be combined with broad statistically based predictive analytics applications by providing a richer description of individual documents in a pre-processing phase.
The last type of application is what we call the Cloud/Crowd Sourcing of Technical Support. We did another research project in this area, scanning technical phone forums (Android and Sprint) to determine whether the software was able to find both potential problems and potential solutions to those problems and pull them out into a knowledge base. The technique was to build a taxonomy of products, features, and problems and then identify whether they appeared in a forum post. Next, we looked at the surrounding text for potential solutions. For example, near a mention of a problem with an Android phone, we might find text such as, “download a screenshot app from a vendor or download the android SDK and use the following method”.
A sample categorization rule for this type of analysis might be something like:
If [android] DIST_6 [problems] AND [software] AND DIST_15 [methods] then display the [methods] text

Here the words in brackets stand for any number of variants, from different ways to spell or refer to the phone or phone feature to a full list of method or technique words, which can be a simple list or even a structured list. In English: if you find any variant of “android” within six words of a variant of “problems” (both known and general problem words), and specifically software problems, then look within 15 words for any type of word having to do with methods (these could range from simple words like “fix” to specific technical words like “SDK”).
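One possible reading of this rule can be sketched in Python. The term lists below are small hypothetical stand-ins for the taxonomy, the function name is invented, and two interpretation choices are assumptions: the [software] test is applied to the whole post, and the 15-word window is measured from the problem word.

```python
import re

# Hypothetical stand-ins for the taxonomy term lists.
ANDROID = {"android", "droid"}
PROBLEMS = {"problem", "problems", "issue", "issues", "bug", "crash"}
SOFTWARE = {"software", "app", "apps", "firmware"}
METHODS = {"fix", "download", "install", "reset", "sdk"}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_methods(post):
    """Return method words found near an Android software problem."""
    tokens = tokenize(post)
    # [software]: assumed here to mean a software term anywhere in the post.
    if not SOFTWARE & set(tokens):
        return []
    android = [i for i, t in enumerate(tokens) if t in ANDROID]
    problems = [i for i, t in enumerate(tokens) if t in PROBLEMS]
    hits = set()
    for p in problems:
        # [android] DIST_6 [problems]
        if any(abs(p - a) <= 6 for a in android):
            # DIST_15 [methods], measured from the problem word.
            hits.update(m for m, t in enumerate(tokens)
                        if t in METHODS and abs(m - p) <= 15)
    return [tokens[m] for m in sorted(hits)]
```

A fuller version would return the surrounding sentence or passage and not only the matched method words, so that an analyst can read the proposed solution in context.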
This kind of application, again, is not meant to be automatic, but to provide a way for an analyst to scan large amounts of text and quickly find potential problems and solutions.
In conclusion, full-featured text analytics software in conjunction with sentiment analysis techniques can go “Beyond Sentiment” in a number of ways. These include more sophisticated sentiment analysis, better Voice of the Customer applications, and the development of new kinds of applications such as expertise analysis, behavior prediction, and Crowd Sourcing technical support.