Thursday, May 26, 2005

The Genre of Blogs

I've been spending some time studying the genres of blogs. Specifically, I've been looking at the most popular, most linked to, most prolific blogs. You can see lists of them on Technorati and Intelliseek.

I've found, in my unscientific study, that most of these "top" blogs are what I'd call pundits (Instapundit, Eschaton, and Daily Dish, for example). Often they are focused on politics -- American politics, to be specific.

There are a few traditional journalists at the top of the list. And there is a smattering of humor blogs, consumer reviews, and IT and music blogs.

But only two or three out of 100 are the classic blog genre -- diaries or journals. The most popular of these right now are Dooce and Baghdad Burning. The former is probably popular because it's a bit quirky and well written. The latter because it's a compelling look at the author's daily life in the center of a war.

However, journals are what most people create when they first create a blog. They talk about their lives. Not very interesting stuff to the average reader, unless the author is a very compelling writer or is living a very interesting life.

Monday, May 23, 2005

Feeling a Bit Alone

Maybe I'm not looking in the right places, but I've found that there are not too many blogs out there that are talking about text mining and unstructured data visualization.

Part of me thinks this has more to do with the quality of searching across blogs than it does with the availability of the content.

Technorati and Intelliseek's Blogpulse offer pretty comprehensive searchable indexes of the Blogosphere, but I never walk away feeling the most relevant blogs are coming to the top. Bloglines is talking about a new offering in this space. But there is no Google of blogs yet.

Hmmm. Why isn't Google the Google of blogs? You certainly get blog hits back with your regular search, but blogs are real-time discussions and a search engine like Google is mixing them in with all the other sites it's updating. Not sure what the conventional wisdom is right now on how frequently Google reindexes "the entire" Web. Certainly some pages are reindexed intraday, but most are probably closer to weekly or monthly. Too slow to work for blogs.

I'm keeping my eyes on Google Labs. Inevitably, there will be a blog search coming along. Won't there?

Friday, May 20, 2005

Text Mining Summit

I'm looking forward to attending the Text Mining Summit in Boston on June 7 and 8. Most of the heavy hitters involved in text mining, like Factiva, will be there (oh and Cingular, Hewlett Packard, Pfizer, EDS, AOL, Abbott Laboratories, In-Q-Tel, IBM, Oracle, ClearForest, Inxight, Cognos, SPSS and SAS) talking about their approaches. Should be a great two days.

Extracting Sentiment

There have been many attempts in the industry to find a way to automatically extract the sentiment (tonality) from unstructured text. I think there will continue to be progress in this area as more and more businesses try to find out not just how much is being written about them, but what the tone of the writing is.

Our friends at Intelliseek have an approach, and so do many others.

The challenges of automatically extracting sentiment are many. It is very difficult for computers to extract meaning from running text. Nuances of language, wit, sarcasm, and irony are virtually impossible to detect. Even double negatives can confuse NLP software.
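
To make the double-negative problem concrete, here is a toy sketch of a word-list sentiment scorer. Everything in it is an assumption for illustration (the word lists, the `naive_sentiment` name, the rule that a negator flips the next opinion word); it is not how any particular vendor's product works.

```python
# Toy word lists; hypothetical, for illustration only.
POSITIVE = {"good", "great", "tasty"}
NEGATIVE = {"bad", "awful", "slow"}
NEGATORS = {"not", "never", "no"}


def naive_sentiment(text):
    """Count opinion words, flipping the next opinion word after a negator."""
    score = 0
    flip = False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            # A second negator just leaves flip set to True, so a double
            # negative is treated like a single one: the weakness on display.
            flip = True
            continue
        value = 1 if word in POSITIVE else -1 if word in NEGATIVE else 0
        if value:
            score += -value if flip else value
            flip = False
    return score


print(naive_sentiment("The chili was not good"))                  # single negation
print(naive_sentiment("I would not say the chili was not good"))  # double negation
print(naive_sentiment("Great, just great."))                      # possible sarcasm
```

The scorer gives the single and the double negation the same negative score, and it scores the last sentence as strongly positive even though a human reader might hear sarcasm in it.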

Add to that the issue that sentiment is in the eye of the beholder. What's good for one company is often, by definition, bad for its competitors. Is the answer to extract the sentiment of the author? Perhaps.

All of this leads me to believe the only way you can build a commercial software package to extract sentiment is by integrating a human in the loop -- either as a final check or as a trainer of the system.

I'm not quite sure of the details of how that should be done, but I'm pretty certain that the answer lies in there somewhere.
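
One possible shape for that human-in-the-loop idea, sketched under heavy assumptions: some classifier (not shown) hands back a label plus a confidence number, confident labels are accepted as-is, and everything else waits for a person whose answer is also saved as training material. The `ReviewLoop` class, its field names, and the 0.8 cutoff are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLoop:
    """Route low-confidence machine labels to a person and keep their answers."""
    threshold: float = 0.8                             # assumed cutoff; would need tuning
    accepted: list = field(default_factory=list)       # (text, label) pairs treated as final
    review_queue: list = field(default_factory=list)   # texts waiting for a human
    training_data: list = field(default_factory=list)  # human-labeled examples

    def submit(self, text, label, confidence):
        """Accept a confident machine label, otherwise queue the text for review."""
        if confidence >= self.threshold:
            self.accepted.append((text, label))
        else:
            self.review_queue.append(text)

    def human_label(self, text, label):
        """Record a reviewer's decision as a final label and a training example."""
        self.review_queue.remove(text)
        self.accepted.append((text, label))
        self.training_data.append((text, label))


loop = ReviewLoop()
loop.submit("Great service at lunch today", "positive", 0.95)
loop.submit("Oh great, another forty-minute wait", "positive", 0.55)
loop.human_label("Oh great, another forty-minute wait", "negative")
print(loop.accepted)
print(loop.training_data)
```

Here the person acts as the final check on the doubtful item, and the saved `training_data` is what a trainable system could learn from later. How that retraining would actually work is left open.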

Thursday, May 19, 2005

Wendy's Reputation Struggle

When a Wendy's customer said she found a piece of a human finger in the fast food chain's chili, the company reacted quickly, conducting an internal investigation. It quickly proved to itself it hadn't done anything wrong and offered a reward to find the perpetrator of what we now know was a fraud.

While the company handled the situation as well as it could have and is recovering, it will take a while for its sales to return to where they were.

Even though I know there never was a finger in the chili at Wendy's, the image will stick in my head for a while -- and that might make me (at a subconscious level) avoid eating there. And that's why corporate reputation is so delicate. Wendy's did nothing wrong -- and everything right in the aftermath -- but is still suffering. Not fair, but the hard truth.

A well-stated commentary by Jack Schuessler, CEO of Wendy's, appears in the Wall Street Journal this week (see the article in Factiva, subscription req'd). He states: "The disturbing truth for everyone in the business community is that a devastating fraud can be perpetrated by a single individual. And the ramifications to a company's reputation are frightening."

This is another example of why it's vital for companies to be ever on the lookout for incidents. The faster they react (as long as they react in a way that shows they have nothing to hide) the better it will be for them in the end.

Software tools that allow monitoring of public discussions in the blogosphere and the local media are vital in the process.

Wednesday, May 18, 2005

Blogosphere Timeline

Just something I put together, inspired by Gartner's approach to analysing new technologies, to try to see where blogs are on the continuum of growth and acceptance. Using this model, there will likely be a drop-off in popularity in the next couple of years, followed by firm acceptance as part of the mainstream communication technology landscape.

A few good quotes:

“Blogs are the best thing that's ever happened to journalism. Or they're going to kill it. One or the other.”

-- San Jose Mercury News, April 18, 2005

“…you cannot afford to close your eyes to [blogs], because they’re simply the most explosive outbreak in the information world since the Internet itself.”

-- Business Week, May 2, 2005

Monitoring Blogs

With the volume of unstructured content continuing to grow, text mining becomes an obvious way for companies to manage the information being generated about their subjects and their issues.

The growth of the Blogosphere alone creates a new stream of data that many companies are naively ignoring (partially because they're not sure how to monitor it). With stakeholders and journalists reading and writing about your company, it seems to me that a company would be foolish not to monitor what's being said.

Monitoring blogs in a comprehensive way lets companies spot dramatic changes in their landscapes.
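
As a rough sketch of what spotting a "dramatic change" could look like, assume you already have a count of blog posts mentioning your company for each day. The `find_spikes` function, the seven-day window, and the threshold of three standard deviations are all illustrative assumptions, not a recommendation or anyone's actual product.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, z_threshold=3.0):
    """Return indexes of days whose count is far above the preceding window's average."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        avg = mean(history)
        spread = stdev(history) or 1.0  # avoid dividing by zero on a flat history
        if (daily_counts[i] - avg) / spread >= z_threshold:
            spikes.append(i)
    return spikes


# Made-up daily mention counts: a quiet week, then a sudden jump.
mentions = [4, 5, 3, 6, 4, 5, 4, 5, 48, 30]
print(find_spikes(mentions))  # [8]
```

Only the day of the jump (index 8) is flagged. The following day is still busy, but the jump is now part of its own history, which is one reason a real monitoring tool would need something smarter than this.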