We hear and read a number of discussions around the idea of sentiment. Sentiment seems to elicit just as many passionate responses as a discussion around influencers. Sometimes I think it might be fun to introduce the idea of sentimental influencers and just watch the resulting conflagration. Because analyzing social media conversations for sentiment is so difficult, I thought it would be best to canvass the team at CI on their views of sentiment and how our technology works with this very tricky kind of content. Here are some things the team suggested you keep in mind when analyzing content for sentiment.
Quality of Social Media Content
Before even looking at the accuracy of sentiment scoring, it is important to look at the quality of the content to be scored. If the content is off-topic, contains duplicate posts, or is full of spam, then even applying the best sentiment algorithm will produce unusable results.
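To make the point concrete, here is a minimal sketch of that kind of pre-scoring hygiene: dropping exact duplicates and crude keyword-matched spam before any sentiment is computed. The spam markers and sample posts are illustrative assumptions, not CI Insight's actual pipeline.

```python
# Illustrative spam markers -- a real pipeline would use far more
# sophisticated spam and relevance classifiers.
SPAM_MARKERS = {"buy now", "click here", "free followers"}

def clean_posts(posts):
    """Drop exact duplicates and obvious spam before sentiment scoring."""
    seen = set()
    cleaned = []
    for post in posts:
        normalized = " ".join(post.lower().split())
        if normalized in seen:
            continue  # duplicate post: scoring it again would skew results
        if any(marker in normalized for marker in SPAM_MARKERS):
            continue  # crude spam filter
        seen.add(normalized)
        cleaned.append(post)
    return cleaned

posts = [
    "I love this phone",
    "I love this phone",               # duplicate
    "Click here for free followers!",  # spam
    "Battery life is disappointing",
]
print(clean_posts(posts))
# ['I love this phone', 'Battery life is disappointing']
```

Only the two genuine opinions survive to be scored; the duplicate and the spam post would otherwise have inflated or distorted the aggregate sentiment.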
Targeted Sentiment
Accurate sentiment analysis also relies on correctly associating the sentiment being expressed with the appropriate object or topic of interest. Often within a post, sentiment may be expressed about several different things. CI Insight identifies the topic of interest within the post, then only considers sentiment expressed in close proximity to that concept. Using concept-centered "snippets" of text greatly improves the chances that computed sentiment applies to the concept of interest. In addition, CI Insight allows content to be filtered to only those posts expressing a personal opinion (as opposed to product marketing material, which may sound positive but is not an active opinion being expressed by a customer).
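The snippet idea can be illustrated with a toy example: score only the words within a small window around the concept of interest, so sentiment about other topics in the same post is ignored. The tiny lexicon and window size here are illustrative assumptions, not CI Insight's actual algorithm.

```python
# Toy sentiment lexicon -- real systems use far larger, weighted lexicons
# or trained models.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def snippet_sentiment(post, concept, window=2):
    """Score only the words within `window` tokens of the concept."""
    words = post.lower().split()
    if concept not in words:
        return None  # concept not mentioned in this post
    i = words.index(concept)
    snippet = words[max(0, i - window): i + window + 1]
    score = (sum(w in POSITIVE for w in snippet)
             - sum(w in NEGATIVE for w in snippet))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

post = "the camera is great but the battery is terrible"
print(snippet_sentiment(post, "camera"))   # positive
print(snippet_sentiment(post, "battery"))  # negative
```

Scoring the whole post at once would net out to neutral; restricting attention to the snippet around each concept recovers the opposing opinions about the camera and the battery.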
Evaluating Sentiment Performance
Evaluating the accuracy of an automated sentiment scoring system can be surprisingly complicated. Systems are often evaluated on how accurately they score against a gold standard set of sentiment data. This can give reliable data for comparing one automated system to another, but how accurate are the gold standard ratings themselves? Several studies have looked at this reliability (1, 2, 3) and found that independent human raters only agree with the gold standard ratings 70% of the time. This means that the best any automated system could hope to attain would be in that same accuracy range. Claims of 85-90% accuracy are usually for a system that has been trained and tuned to perform well on a specific test set. Scores in the wild would invariably be lower.
Another problem typically encountered when evaluating sentiment accuracy claims is the inherent bias introduced by the natural distribution of positive, negative, and neutral posts. FreshNetworks did an evaluation of 7 social media monitoring companies and determined that this bias caused misleadingly high accuracy scores. In many cases, neutral posts can make up 60-70% of the total posts related to a specific topic. Given this distribution, a sentiment algorithm which just labeled every post 'neutral' could easily produce a 60% "accuracy" score. Once the volume of neutral posts is removed from the test, the accuracy scores typically plummet.
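The arithmetic behind that bias is easy to demonstrate. In this sketch, a made-up test set is 65% neutral, so a degenerate classifier that labels everything "neutral" looks deceptively accurate until the neutral posts are removed. The label counts are invented for illustration.

```python
# A made-up gold standard: 65 neutral, 20 positive, 15 negative posts.
gold = ["neutral"] * 65 + ["positive"] * 20 + ["negative"] * 15

def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Degenerate classifier: label every post 'neutral'.
always_neutral = ["neutral"] * len(gold)
print(accuracy(always_neutral, gold))  # 0.65 -- high score for doing nothing

# Re-score on only the opinionated (non-neutral) posts.
pairs = [(p, g) for p, g in zip(always_neutral, gold) if g != "neutral"]
opinionated_acc = sum(p == g for p, g in pairs) / len(pairs)
print(opinionated_acc)  # 0.0 -- the apparent accuracy collapses
```

This is why accuracy claims should always be read alongside the class distribution of the test set, or evaluated on the non-neutral posts alone.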
1) Gindl, S. and Liegl, J. (2008). Evaluation of Different Sentiment Detection Methods for Polarity Classification on Web-Based Reviews. 18th European Conference on Artificial Intelligence.
2) Strapparava, Carlo and Mihalcea, Rada. (2008). Learning to Identify Emotions in Text. Proceedings of the 2008 ACM Symposium on Applied Computing.
3) Bermingham, Adam and Smeaton, Alan F. (2009). A Study of Inter-Annotator Agreement for Opinion Retrieval. Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval.