Friday, May 20, 2005

Extracting Sentiment

There have been many attempts in the industry to find a way to automatically extract the sentiment (tonality) of unstructured text. I think this area will continue to progress as more and more businesses try to learn not just how much is being written about them, but also what the tone of that writing is.

Our friends at Intelliseek have an approach, and so do many others.

The challenges of automatically extracting sentiment are many. It is very difficult for computers to extract meaning from running text. Nuances of language such as wit, sarcasm, and irony are virtually impossible to detect, and even double negatives can confuse NLP software.
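A toy example makes the double-negative problem concrete. The sketch below is hypothetical: the word lists and both scoring functions are illustrative assumptions, not Intelliseek's approach or any product's. A naive word count reads "not bad" as negative. A simple rule that flips the polarity of a sentiment word right after a negator fixes that case, but it then misreads a genuine double negative, and neither approach notices sarcasm.

```python
# Hypothetical toy lexicons; real systems use far larger, domain-tuned lists.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}
NEGATORS = {"not", "never", "no"}


def tokenize(text: str) -> list:
    """Lowercase and split on whitespace after dropping commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def naive_score(text: str) -> int:
    """Positive word count minus negative word count, ignoring all context."""
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in tokenize(text))


def negation_aware_score(text: str) -> int:
    """Flip the polarity of a sentiment word that directly follows a negator."""
    words = tokenize(text)
    score = 0
    for i, w in enumerate(words):
        polarity = (w in POSITIVE) - (w in NEGATIVE)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


for sentence in [
    "The service was not bad.",          # mildly positive
    "I would not say it was not good.",  # double negative: mildly positive
    "Oh great, another outage.",         # sarcasm: negative
]:
    print(naive_score(sentence), negation_aware_score(sentence), sentence)
# -1 1 The service was not bad.
# 1 -1 I would not say it was not good.
# 1 1 Oh great, another outage.
```

Each rule patched in to handle one construction tends to break another, which is part of why fully automatic sentiment extraction is so hard.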

Add to that the fact that sentiment is in the eye of the beholder. What's good for one company is often, by definition, bad for its competitors. Is the answer to extract the sentiment of the author? Perhaps.

All of this leads me to believe the only way you can build a commercial software package to extract sentiment is by integrating a human in the loop -- either as a final check or as a trainer of the system.

I'm not quite sure of the details of how that should be done, but I'm pretty certain that the answer lies in there somewhere.
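One possible shape for a human in the loop, sketched here purely as an assumption and not as a worked-out design: the machine keeps labels it is confident about, routes low-confidence items to a person (the "final check"), and stores each human decision as new training data (the "trainer" role). The classifier signature, the 0.8 threshold, and the `HumanInTheLoop` class are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical classifier signature: text -> (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class HumanInTheLoop:
    classify: Classifier
    threshold: float = 0.8  # assumed cutoff; would be tuned per application
    review_queue: List[str] = field(default_factory=list)
    training_data: List[Tuple[str, str]] = field(default_factory=list)

    def process(self, text: str) -> str:
        """Accept confident machine labels; defer uncertain ones to a person."""
        label, confidence = self.classify(text)
        if confidence >= self.threshold:
            return label
        self.review_queue.append(text)
        return "pending_review"

    def record_human_label(self, text: str, label: str) -> None:
        """A reviewer's decision leaves the queue and becomes training data."""
        self.review_queue.remove(text)
        self.training_data.append((text, label))


def toy_classifier(text: str) -> Tuple[str, float]:
    """Stand-in for a real model: confident only when a strong cue word appears."""
    lowered = text.lower()
    if "excellent" in lowered:
        return "positive", 0.95
    if "terrible" in lowered:
        return "negative", 0.95
    return "neutral", 0.5


loop = HumanInTheLoop(toy_classifier)
first = loop.process("Excellent support team.")               # "positive"
second = loop.process("Well, that went about as expected.")   # "pending_review"
loop.record_human_label("Well, that went about as expected.", "negative")
print(first, second, loop.training_data)
```

The threshold trades reviewer workload against accuracy: raising it sends more items to people, and the accumulated `training_data` is what a retraining step would consume.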

1 comment:

Matthew Hurst said...

There is an interesting distinction to be made between extracting sentiment and measuring sentiment. In the former, the results ought to be precise and, in some sense, follow the distribution of sentiment in the entire data set. In the latter, the results need to measure the positive and negative commentary in some meaningful manner, with quantifiable error: both precision and recall are important. In practical implementations, users will want to do both (possibly with a single system). The ideal solution for each task may be less than ideal for the other.
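To make "quantifiable error" concrete, per-class precision and recall can be computed against a hand-labeled sample. The sketch below assumes such a sample exists; the `gold` and `predicted` lists are invented for illustration and are not real results. Precision asks how many items the system called positive really were; recall asks how many truly positive items it found.

```python
from typing import Dict, List


def precision_recall(gold: List[str], predicted: List[str], label: str) -> Dict[str, float]:
    """Precision and recall for one sentiment label against hand-labeled gold data."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)  # correctly flagged
    fp = sum(g != label and p == label for g, p in pairs)  # wrongly flagged
    fn = sum(g == label and p != label for g, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical hand-labeled sample and system output.
gold = ["pos", "pos", "neg", "neg", "neutral"]
predicted = ["pos", "neutral", "neg", "pos", "neutral"]

print(precision_recall(gold, predicted, "pos"))  # {'precision': 0.5, 'recall': 0.5}
print(precision_recall(gold, predicted, "neg"))  # {'precision': 1.0, 'recall': 0.5}
```

A system tuned for extraction might favor precision on "neg" (everything it flags is right), while a measurement system also needs recall, since half the negative items here go uncounted.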