Friday, September 30, 2005

Automated Sentiment Detection

I've been doing some research to understand the state of the art in automated sentiment detection. This natural language processing technology is still pretty young and, from what I can tell, no one has commercialized anything that delivers high-quality results yet. There are a few players out there (you know who you are) who have products in this area, but it's tricky stuff.

Sarcasm, irony, double negatives all wreak havoc with automated detection.

Much of this work is still at the university level and the papers published in the area focus on trying to detect the opinions of authors in movie reviews, hotel ratings, etc. The recent interest in monitoring blogs is spurring more discussion in the commercial space.

So it's not too surprising that the industry hasn't settled on terminology yet either. I've seen a host of words being used to describe this process of assigning a positive or negative score to an article -- tone, tonality, polarity, affect, sentiment, favorability (or favourability, if you're in the UK), opinion, mood. I generally use the term sentiment because it's had the most pickup.
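
To make "assigning a positive or negative score" concrete, here is a minimal sketch of a lexicon-based scorer in Python. The word lists, the `sentiment_score` and `label` functions, and the one-word negation rule are all hypothetical illustrations, not how any particular product works. Real systems use far larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon and negator list, for illustration only.
LEXICON = {
    "good": 1, "great": 1, "excellent": 1, "love": 1,
    "bad": -1, "poor": -1, "terrible": -1, "hate": -1,
}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Sum word polarities; a negator flips the sign of the next polar word."""
    score = 0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = not negate  # two negators before a polar word cancel out
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False  # negation only reaches the next polar word
    return score


def label(score):
    """Map a numeric score onto a positive/negative/neutral label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for text in ["The hotel was great.",
             "The food was not good.",
             "Oh great, another delay."]:
    print(text, sentiment_score(text), label(sentiment_score(text)))
```

The third sentence is sarcastic but still comes out positive. Word-level scoring like this has no way to detect sarcasm or irony.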

There are also different types of sentiment assignments. We can talk about it from the perspective of the author or the perspective of the consumer of the information. For example, a hurricane can be written about as a negative event. However, to the construction industry it's a positive event because it means the beginning of a rebuilding boom. It's not clear to me which terms should be used to describe these different perspectives. Is "sentiment" the view of the author and "favorability" the view of the reader? Not sure.


Matthew Hurst said...

As for the hurricane example, you need to consider the notion of expressions of evaluation in the text rather than any reasoning about material (or other) benefits. Therefore, if it is stated that 'the hurricane was a nightmare' then the sentiment is negative. However, if a construction related author states that 'the hurricane was a real windfall' then it is positive. That part of the problem is actually not much of a mystery. One doesn't need to carry out any complex reasoning beyond what is expressed in the text.
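
A small Python sketch of this "expressions of evaluation" view follows. It assumes a hypothetical `EVALUATIVE` lexicon that deliberately leaves out topic words such as "hurricane". As a result, the score depends only on the evaluative wording and not on any reasoning about who benefits.

```python
import re

# Hypothetical evaluative lexicon: only words that express an evaluation
# carry polarity; the topic word "hurricane" is intentionally absent.
EVALUATIVE = {"nightmare": -1, "disaster": -1, "windfall": 1, "boon": 1}


def expressed_polarity(sentence):
    """Sum the polarity of evaluative words actually present in the sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(EVALUATIVE.get(w, 0) for w in words)


for s in ["The hurricane was a nightmare.",
          "The hurricane was a real windfall.",
          "The hurricane made landfall on Monday."]:
    print(s, expressed_polarity(s))
```

The same event gets opposite scores depending on how it is described. A purely factual sentence about it scores zero.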

The challenges lie more in the need for accurate grammatical analysis, the selective behaviour of polar terms and the association of polar expressions with their subjects.
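
To show why associating polar expressions with subjects needs real grammatical analysis, here is a deliberately crude Python sketch. It credits each polar word to the most recently mentioned subject. The `POLARITY` lexicon, the subject list and the `attribute_sentiment` function are hypothetical, and the proximity rule is a stand-in for parsing, not a recommended method.

```python
import re
from collections import defaultdict

# Hypothetical toy polarity lexicon for illustration.
POLARITY = {"excellent": 1, "great": 1, "terrible": -1, "poor": -1}


def attribute_sentiment(text, subjects):
    """Credit each polar word to the most recently mentioned subject.

    A proximity heuristic, not grammatical analysis: it misattributes
    sentences like "The camera, unlike the battery, is great."
    """
    scores = defaultdict(int)
    current = None
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in subjects:
            current = tok
        elif tok in POLARITY and current is not None:
            scores[current] += POLARITY[tok]
    return dict(scores)


subjects = {"camera", "battery"}
print(attribute_sentiment(
    "The camera is excellent but the battery is terrible.", subjects))
print(attribute_sentiment(
    "The camera, unlike the battery, is great.", subjects))
```

The first sentence is handled correctly. In the second, "great" is wrongly credited to the battery because the battery was the last subject mentioned.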

Irony, sarcasm and humour go beyond these challenges, though Umbria has recently claimed to be able to deal with them.

glennfan said...

I think I agree with you that what is literally stated is the "sentiment" of the text. But what about examples such as:
1) Acme Corp. announces earnings.
2) Acme Corp. announces earnings loss.
3) Acme Corp. announces earnings loss, but beats Street's expectations.
All three could be about the same event, yet they could be read as 1) neutral, 2) negative and 3) positive or mixed.
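
The three Acme headlines can be run through a small Python sketch. It assumes a hypothetical finance lexicon (`FINANCE_LEXICON`) and a hypothetical `headline_sentiment` function. The function reports "mixed" when it finds both positive and negative terms. The optional `favor_after_but` flag applies one common heuristic: when a headline contains "but", only the clause after it decides the label. That is one way to get "positive" for the third example rather than "mixed".

```python
import re

# Hypothetical domain lexicon for financial headlines.
FINANCE_LEXICON = {"loss": -1, "misses": -1, "profit": 1, "beats": 1}


def headline_sentiment(headline, favor_after_but=False):
    """Label a headline as neutral, negative, positive, or mixed."""
    text = headline.lower()
    if favor_after_but and " but " in text:
        # Heuristic: the clause after "but" carries the overall verdict.
        text = text.split(" but ", 1)[1]
    hits = [FINANCE_LEXICON[w]
            for w in re.findall(r"[a-z']+", text)
            if w in FINANCE_LEXICON]
    has_pos = any(h > 0 for h in hits)
    has_neg = any(h < 0 for h in hits)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


headlines = [
    "Acme Corp. announces earnings.",
    "Acme Corp. announces earnings loss.",
    "Acme Corp. announces earnings loss, but beats Street's expectations.",
]
for h in headlines:
    print(h, headline_sentiment(h), headline_sentiment(h, favor_after_but=True))
```

Whether the third headline should count as "positive" or "mixed" is a design decision. The code does not settle it.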

Perhaps worth mentioning - twitter based sentiment analysis tool