A Text Mining and Media Measurement blog from Glenn Fannick, a Director of Product Development Management with Dow Jones & Co.
Tuesday, December 27, 2005
Adweek: 20% of Blogs are Spam
Friday, December 16, 2005
Next step in text mining
KM Chicago is "Factiva-friendly" but not necessarily made up exclusively of Factiva customers. I'd like to thank one of the directors of KM Chicago, Ann Lee, for inviting me to speak. Ann is a colleague of ours in Factiva's Chicago office.
Blogging and your Corporate Reputation
PRWeek UK Looks at the Best and Worst of Corporate Reputation in 2005
Mr. Hill made use of some data our team in London extracted from Factiva Insight. Specifically he shows how Sainsbury's media coverage and its stock price "show remarkable symmetry" during 2005.
Tuesday, December 13, 2005
Text Mining Becomes Sexy
As if things were boring around here, I think Amazon just shook up the world of information retrieval. Its mostly quiet Web-search division, Alexa, is opening the doors to its huge trove of Web-crawled content, allowing text-mining access to the archive. It would seem to me this is text-mining for the little guys -- an affordable way to build applications without having to host billions of documents.
The company plans to make it available at a low price point so that just about any developer who wants to can "search and process billions of documents -- even create their own search engines -- using Alexa's search and publication tools."
Oh my, GYM just got a wake-up call.
Can little Alexa (with big parent Amazon) do what Google hasn't gotten around to yet or that IBM's WebFountain project has been trying to do for years -- make the Internet one big text-minable database that's easy to use and can produce commercial-grade business information tools? It's too early to tell, but it's all very exciting and should be great watching it unfold.
Tuesday, December 06, 2005
IM2005? Maybe Next Year.

Last week, several of my colleagues from Factiva and I attended the IM2005 awards along with 1,300 of our closest business friends. The intimate affair, held at The Grosvenor House Hotel in London, was one of the many awards ceremonies in the IT industry. If you've never been to such a black-tie industry event, picture the Oscars -- but without the celebrities, or the press, or the sexual energy, intrigue, anticipation, dynamism, humor, well, you get the picture.
The IMmies (I just made that up) didn't have Billy Crystal, but it did have Barry Cryer, who was said to be back by popular demand. Uh, huh. The IM Web site says he's quite the go-to guy in British comedy, having written for "practically every top U.K. comedian". Maybe his humor just doesn't track well to an American's ears, but I'd have to say some of it sounded more like Borscht Belt, circa 1955, than something written for hip dotcom professionals.
And hip it was. When a project called "Mapping Access Land in England" sweeps the night, winning two awards, you know you're with the "in" crowd.
Oh, did our sleek, hip product, Factiva Insight: Reputation Intelligence win in its category "Product of the Year"? No, but we were up against 29 other products (none of which I'd ever heard of). Hats off to Njini. (Who?)
Well, it is an honor just to be nominated. And the spiced confit of Norfolk duck was good. And I'm usually more of a Long Island duck guy. We missed out on the dancing afterward, as we did split right after the awards portion (16 awards in 30 minutes! definitely not the Oscars) and high-tailed it to Factiva's holiday party. We arrived 4 hours into Factiva's 9-hour fete without a trophy but were welcomed back with glasses raised. We're all winners.
Wednesday, November 23, 2005
Wanted: Corporate Blog Writer
I think this position really works well if the conditions are right: 1) Blogging has to fit the corporate culture (Think: a company which embraces business casual 5 days a week). 2) The company has to be innovating in ways that its potential customers find interesting (who wants to read about chewing gum?). 3) The blog does not look, feel or sound like it's written by a committee or by the Marketing Department.
I applaud companies who have tried it as an official position (Stonyfield Farm, GM) or who have allowed it to go on in a more unofficial way (Scoble at Microsoft). I'd like to see more companies take the plunge.
If this is all in place, it sets the company up in a position of thought leadership, drives traffic to its Web site, supports its initiatives and gets the attention of the media. Hey, wait, that sounds like Marketing's job. Maybe so, but at the same time, this could be marketing that works.
Tuesday, November 22, 2005
Score Another one for Joe Q Blogger
This time it was security researcher Mark Russinovich who first reported earlier this month that some of Sony's new music CDs were monkeying around with a Windows "rootkit", helping Sony to prevent copying of the music -- and opening up users' PCs to potential security leaks.
Information Week has a good piece on the news, including this stinging summary:
"Sony made an unpopular product decision and got its reputation incinerated by
waves of flaming bloggers. That's a lesson for other companies."
Sony indeed made a mistake when they tried to brush this off. Why not just take your lumps on day one? "We screwed up, we're pulling the CDs. And we'll make this right." That's the way to go. But during a Morning Edition interview on NPR on Nov. 4, the Sony exec interviewed tried to say that this problem was an esoteric technology thingy that the average person wouldn't care about. "Most people don't even know what a rootkit is so why should they care about it?" he said. That arrogance hasn't played very well. Hundreds of bloggers have ripped Sony for this. Sony's users might not care about the details of a rootkit but they do care about privacy and their computer's security.
You'd think by now a company with the smarts of Sony would see the potential downside of trying to sneak one by. But oh, the bloggers are watching you...
Marti Hearst: Why are you always No. 1?
Wednesday, November 16, 2005
Google Gives Nod to Taxonomies with Launch of "Base"
ZDNet's take | C|Net's take
Data Mining Helps Find Needle in Corn Fields
Wednesday, November 09, 2005
Even More Text Mining Options with Factiva
The new announcement talks about reaching customers who are already IBM shops and who need behind-the-firewall, customized solutions. So in this way, it fits nicely as another piece in Factiva's overall text-mining picture.
Down Under, Corporate Blogs Might be Seen as Liability
But here's an opinion from Down Under (Factiva subscription req'd) in the Australian version of Computerworld saying: "Corporate blogging in Australia has stalled because of a perceived security threat and a belief by employers that an active blogger is a liability." According to Hydrasight analyst John Brand, "less than 5 percent of organizations in Australia actively use blogs as a corporate tool, with some blogs creating an IT security risk," Computerworld reported.
This article doesn't say where that number comes from, but it could be interesting if different parts of the world view corporate blogs differently.
Media Monitor Plus Relaunches

Factiva recently re-released one of its media monitoring products -- Factiva Insight: Media Monitor Plus. The release took the product off the 2B legacy system (which we've been running after Factiva's purchase of 2B Media Intelligence this year) and put it on the Factiva Insight Text Mining Platform.
This allows us to offer a broader content set (blogs, boards, Web and mainstream media from Factiva's archive) along with the speed and intraday update features built by our colleagues from 2B.
I haven't worked directly on MMP to this point as most of my time has been focused on Reputation Intelligence, but I really like this product, too. It's taken several months, but the former 2B products are starting to blend more with the Factiva look and feel, which is great.
I'm also starting to learn more about how clients are viewing it. Some of them see it as we expected -- a tool to more efficiently monitor their media coverage. Others, interestingly, see MMP as a way to help their teams get up to speed on a sector or an industry. For example, say Acme Corp. is a consultancy that works with 10 different industries. It's vital for the Acme consultants working in the auto industry to stay abreast of the news and trends in that industry so they can speak intelligently and understand the issues of their clients. With Media Monitor Plus, Acme can set up 10 dashboards, one for each industry, so each group of consultants can stay on top of the big picture in their industry.
Tuesday, November 08, 2005
Study: CEOs find blogs useful
About 59% of CEOs said blogs are useful for internal comms and 47% see them as useful for external communications, reports CNet on a study by PRWeek and Burson-Marsteller.
Monday, November 07, 2005
Factiva CEO Among Finalists
MSM lagging behind the corporate blog story
The local newspapers in the U.S. are still slowly rolling out their "what is a blog?" articles. Even the Financial Times felt obliged last week to state "...Weblogs, or blogs, ..." And they had the trite lead: "To blog, or not to blog? That is the question vexing marketing managers ...."
The Wall Street Journal* had a good piece on the value of blogs, but they, too, felt obliged to define them in the lead:
"IT USED TO BE rare for an established, mainstream company to buy an
individual's personal blog. Blogs are frequently updated online journals,
written by pretty much anybody -- professionals, hobbyists or regular Joes
reaching out to share their thoughts, information and photographs with
others."
The New York Times for the most part is hip to the blog story. They don't feel the need to define blogs in every article, when the context makes it obvious.
*Full disclosure, my company, Factiva, is half owned by Dow Jones, the publisher of the WSJ.
Thursday, November 03, 2005
Visualizing Complex Data Relationships
Wednesday, October 26, 2005
Meeting Media Monitoring Users
I really feel strongly that market validation is a vital piece of product development. Typically when you talk to users you'll hear feedback that's not unexpected. But what keeps it interesting is that you always get some comment, some new perspective, some advice you weren't expecting.
It seems that for every client, there's a unique use case.
Friday, October 21, 2005
Who Are These Sploggers, Anyway?
I'm not sure we need another word to describe them though. "Sploggers"? Ugh. But from a linguistic perspective, it's fascinating how words form so quickly in cyberspace.
web log > weblog > blog > blogging > blogger > spam + blog = splog > sploggers
Tuesday, October 18, 2005
BlogOn: The Oft-Mentioned Long Tail
Basically, this is the idea that most of the traffic in the blogosphere is coming from a very small number of authors and a very large number of authors (the long tail) are creating on average a small number of posts each.
This idea is tied to Zipf's law, named after George Kingsley Zipf, a Harvard linguistics professor. Jakob Nielsen also recently wrote about Zipf's curves.
However, it was pointed out by presenter David Weinberger that the area under the long tail is larger than the area under the large head, as it were. Which means... what exactly?
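One way to make the head-versus-tail comparison concrete is a small Python sketch. It assumes, purely for illustration, that the author at rank r contributes posts in proportion to 1/r (the textbook Zipf curve). The function name, the one million authors and the cutoff of 100 for the "head" are all hypothetical choices, not measurements of the Blogosphere.

```python
def zipf_shares(num_authors, head_size, exponent=1.0):
    """Return (head_share, tail_share) of all posts under a Zipf-like curve.

    Assumption: the author at rank r contributes posts proportional to
    1 / r**exponent. Every parameter here is illustrative, not measured.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_authors + 1)]
    total = sum(weights)
    head = sum(weights[:head_size])
    return head / total, (total - head) / total


# Hypothetical blogosphere: one million authors, "head" = the top 100.
head_share, tail_share = zipf_shares(num_authors=1_000_000, head_size=100)
print(f"top 100 authors: {head_share:.1%}, everyone else: {tail_share:.1%}")
```

Under these made-up settings the tail's combined share comes out larger than the head's, which is the shape of the claim about the area under the curve. Change the exponent or the cutoff and the balance shifts, so the sketch says nothing about what the real numbers are.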
BlogOn: Podcasting to Text
I think he's missing the point. Once podcasts are transcribed they can be searched and text mined. This adds additional use to the podcasts that otherwise have a limited distribution. Without being able to search or mine podcasts most of their usage is going to come from browsing and category searching. For example, if I search for "Pinot Gris" in a podcast search engine I will likely miss the podcast that mentions "pinot gris" because the podcast description might not mention specific grapes and wines.
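The difference between searching descriptions and searching transcripts can be sketched in a few lines of Python. The podcast records below are invented for illustration, and the `search` helper is a hypothetical stand-in for a real search engine; it does nothing more than case-insensitive substring matching.

```python
# Invented sample data: only the second show names the grape in its description.
podcasts = [
    {
        "title": "Weekend Wine Chat",
        "description": "Two friends talk about what they drank this week.",
        "transcript": "Tonight we opened a Pinot Gris from Oregon and it was lovely.",
    },
    {
        "title": "Grape Expectations",
        "description": "A Pinot Gris special.",
        "transcript": "Welcome to the show.",
    },
]


def search(items, query, fields):
    """Return titles of items whose listed fields contain the query (case-insensitive)."""
    q = query.lower()
    return [
        item["title"]
        for item in items
        if any(q in item[field].lower() for field in fields)
    ]


# Browsing-style search: titles and descriptions only.
by_description = search(podcasts, "pinot gris", ["title", "description"])
# Full-text search once transcripts exist.
by_transcript = search(podcasts, "pinot gris", ["title", "description", "transcript"])
print(by_description)
print(by_transcript)
```

The description-only search finds just the show that happens to name the grape up front. The transcript search also finds the show where the wine only comes up in conversation.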
BlogOn 2005: A Diverse Attendee List
The attendees are quite diverse. There seem to be a mix of Blog geeks and newcomers to the space ("what's podcasting?" one attendee asked a panel). One woman I met has never posted to a blog before but was here because her boss instructed her to find out more about the industry.
Many people here seem to be vendors, industry analysts and journalists. And there are a surprising (to me) number of PR, marketing and advertising professionals here. It seems those industries are trying to get up to speed quickly on what this growing internet-based conversation is all about.
I think the conference will have to evolve next year to be more useful. The topic of "blogs" is too vague to support a well-focused show.
Thursday, October 13, 2005
Google and Information Extraction
Google is hiring expert computer scientists and software developers! www.google.com/jobs
I've never really thought about Google being a player in the information extraction sector (aka entity extraction, text analytics, text mining). Sure there's lots of talk about what's the next big thing for them -- free wireless access, video search, indexing the world's libraries. It's fun to think about that stuff.
But when it comes to improving their bread and butter -- search -- mostly I picture their focus being on refining their ranking algorithms and optimizing their crawling strategies. But on your way to being the one-stop shop for all information, I guess it should have been obvious to me that text mining would be a station on that route.
One place we see TM showing up clearly is in Google News, with the "In the News" list of oft-used phrases of the day. I'm sure there are many more examples.
Monday, October 10, 2005
Lies, damn lies and text mining statistics
He proved his claim by running a LexisNexis search on the phrase over five years and said its use is rising every year. ("It was in 3,504 stories in 2004, nearly 700 more than 2000.")
To me, this manufactures a truth where none exists through a hasty use of text mining. I have two concerns:
1) What was the context of these references? I searched "get a handle" in Factiva and found several mentions of that phrase in an oft-cited direct quote ("Firefighters were able to get a handle on this early on," said Capt. Jason Neuman of the California Department of Forestry and Fire Protection.) Does that make the phrase more common, or is it just a function of the phrase being replicated by the distribution of AP wire copy?
2) Did Mr. Buhler account for any changes in the universe of publications and/or documents over that time period? The number of mentions in one year versus another needs to be compared to the total documents in each year. When I ran the "get a handle" search in Factiva's top 50 U.S. Newspapers (a more controlled group) and then compared it to all documents each year in that group, I found the rate of mentions of the phrase rather flat year on year.
Ah. Lies, damn lies and statistics.
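The normalization described in the second concern can be sketched in Python. Every number below is hypothetical: the mention counts only loosely echo the column's figures, and the yearly corpus totals are invented to show the effect. The function name `mention_rates` is likewise just an illustrative choice.

```python
def mention_rates(mentions_by_year, total_docs_by_year, per=100_000):
    """Convert raw yearly mention counts into mentions per `per` documents."""
    rates = {}
    for year, mentions in sorted(mentions_by_year.items()):
        total = total_docs_by_year[year]
        rates[year] = mentions * per / total
    return rates


# Hypothetical data: raw mentions rise, but so does the size of the archive.
mentions = {2000: 2800, 2004: 3500}
total_docs = {2000: 4_000_000, 2004: 5_000_000}

rates = mention_rates(mentions, total_docs)
for year, rate in rates.items():
    print(year, f"{rate:.1f} mentions per 100,000 documents")
```

With these invented totals the raw count climbs by 700 while the rate per 100,000 documents stays the same in both years. A growing archive alone can produce a "rising" phrase.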
Saturday, October 08, 2005
Google News: A study in text mining
You get the feeling the relationship might be one of smiling through gritted teeth.
Google News is using the power of text mining to leverage the editorial might of many editors and news rooms. CNN, AP, NY Times, FT are all making decisions of which news item is most important and arranging their landing pages accordingly. They're paying human editors to make these value judgments. GN comes along and in the aggregate scoops up all this knowledge (text mining!) and creates a viable competitor for the best news sites out there out of whole cloth.
The irony is that GN needs its news providers for the knowledge of what's most important. So it needs the CNNs of the world to remain successful so they can keep feeding off them. CNN's ad supported model needs the clicks. Symbiotic? Perhaps, but I think Google is benefiting more.
Tuesday, October 04, 2005
More Forum Follow-Up
Friday, September 30, 2005
Blog Readership, RSS Increasing, Forrester Reports
Automated Sentiment Detection
Sarcasm, irony, double negatives all wreak havoc with automated detection.
Much of this work is still at the university level and the papers published in the area focus on trying to detect the opinions of authors in movie reviews, hotel ratings, etc. The recent interest in monitoring blogs is spurring more discussion in the commercial space.
So it's not too surprising that the industry hasn't settled on terminology yet either. I've seen a host of words being used to describe this process of assigning a positive or negative score to an article -- tone, tonality, polarity, affect, sentiment, favorability (or favourability, if you're in the UK), opinion, mood. I generally use the term sentiment because it's had the most pickup.
There are also different types of sentiment assignments. We can talk about it from the perspective of the author or the perspective of the consumer of the information. For example, a hurricane can be written about as a negative event. However, to the construction industry it's a positive event because it means the beginning of a rebuilding boom. It's not clear to me which terms should be used to describe these different perspectives. Is "sentiment" the view of the author and "favorability" the view of the reader? Not sure.
Tuesday, September 27, 2005
European Blogosphere
Monday, September 26, 2005
Causes of Content Chaos: Everyone's a Publisher
I think those of us out here in the Blogosphere are pretty comfortable with the idea that the growth of blogs has created a world where "everyone's a publisher." I'm not a journalist but I can publish my thoughts just as my colleagues upstairs at The Wall Street Journal can. (Factiva is owned by Dow Jones and Reuters and we share some real estate with our parents.) Certainly, not nearly as many people are going to read what I have to say about technology as, say, Walt Mossberg, but in the collective, bloggers are becoming a force. Our opinions are being read by others and in the 500-channel world each channel takes up equal space on the dial.
Studies have shown (that's a great phrase when you don't have real metrics) that people are more likely to trust their peers for an opinion than someone in authority (government, corporations, the media.)
So as bloggers become more of a driver of opinion and more of us become bloggers, corporations and governments had better keep watch.
Blog Numbers Likely Inflated by Spam
We keep hearing about the size of the Blogosphere increasing like crazy. Technorati talks about it doubling every five months. However, what's not being said by all the bloggers who are blogging about blogging is that a growing number of these blogs are spam.
We've heard from a company which processes a large number of blog posts that about a quarter of the posts they see are spam.
Have you ever clicked the "next blog" link at the top of this page (and most other Blogger blogs)? How many of the blogs you come to are for online casinos or inkjet cartridges? It's really quite amazing how much spam has moved into the Blogosphere in the past few months (my observation, not based on any particular metrics).
I think it would be great if the main players in the industry -- GYM, Technorati, Intelliseek, Six Apart, etc. -- put their virtual heads together to find ways to systematically slow down the wave of spam before we're up to our eyeballs in it. Blogger's flagging (to allow individuals to notify Blogger about objectionable content) seems like a good idea. But it remains to be seen how well it works.
I Think Factiva Matters, Too
I must agree, (but that's why I come to work every day).
Friday, September 23, 2005
Seth Godin
So, it's hard to disagree with him and he's very compelling because he's such a wonderful public speaker. But I think a bit of what he said is overstated. I'm not sure B2B is the same as B2C in this regard, though he said he thinks it is.
Sure, no one ever got fired for buying IBM. And sure, your CEO will listen to you more if you have a consulting report from McKinsey & Co. than from Mike McKinsey LLC (even if both recommend the same thing) so I understand his point there. But I still think that emotional buying is MOSTLY in the realm of consumer products, not multimillion dollar server farms or jet engine parts.
Factiva Forum Part 2
GYM (Google, Yahoo! and Microsoft) has gone a long way in providing us with pretty relevant documents on the first page, but better Web search is only part of the answer. There is plenty of information buried on pages 2 through N that we never see. We have to find ways to get that information into the path of our research.
So companies in the information industry are trying to help their clients get to the answers, not provide them with documents. I mean, no one is really running a search so they can find documents. They're running a search, so they can find answers. Moving technology forward so it can get people closer to the answers is our focus.
We see text mining as a big part of that solution.
I also talked about a few causes of this content chaos.
• Blogs Mean Everyone’s a Publisher
• ‘Markets are Conversations’
• More dynamic news cycles
I'll write more on this soon.
A Funny Thing Happened on the Way to Factiva Forum
I was honored to be asked to speak yesterday at Factiva Forum, an executive conference event held high over Times Square, in the Reuters building. This year's NY version of Forum focused on the ever-quickening pace of news and business information and focused on some of the drivers of it -- the growth of business-oriented blogs, RSS feeds, etc.
I spoke on a panel discussing reputation management. I was asked to talk about one of Factiva's products, Factiva Insight: Reputation Intelligence.
I've been involved in the development of this product over the past few years (starting with our relationship with IBM's WebFountain and through our purchase of a company called 2B) and so I'm a bit biased when I say Factiva's is one of the most comprehensive approaches available today for companies to follow their reputational issues -- how they're being covered in the MSM, what people are saying about them on blogs and boards and the differences between the two.
Also on the panel with me were John Neeson, the co-founder of Sirius Decisions and Judi Frost Mackey, the Director of U.S. Corporate & Financial Practice at Hill & Knowlton.
John talked about the results of some studies they've been conducting in this space. Judi presented a real-world example of how a client of theirs (some retailer from Bentonville, Arkansas, I believe) has attempted to revive their sagging public image through a recent media blitz.
I also presented on the subject of "Content Chaos" (see my next post.)
Thursday, September 15, 2005
Text Mining is a Service with Many Uses
But text mining is a service or a process, not a product, so we aren't marketing "Factiva Text Mining" but we are going to look for ways to fold it into products.
Factiva has developed text mining capabilities and built products for media measuring and reputation management. But searching and alerting can benefit too from the philosophy of text mining.
(An aside: Wikipedia needs a better entry for text mining. Volunteers?)
Corporate Blogging Survey
For those interested in corporate blogging, the results of a survey have been posted by a Boston-based, internet-marketing company called Backbone Media. Here's some of what it found.
"The survey respondents indicated that they believe there is a broad array of benefits to starting a blog including: quick publishing, thought leadership, building community, sales and online PR."
"The biggest concern about starting a blog was the time needed to devote to the blog; the next concern was legal liability. A slight majority of bloggers took less than 1-2 months to start their blog after initial management review. ... Once they started, bloggers saw immediate results from publishing content & ideas quickly. Search engine rankings & links results appeared before sales. Overall, thought leadership and idea sharing were the biggest benefits for bloggers."
Wednesday, September 14, 2005
Blogsearch from Google -- Finally
But the surprising thing about Google's blog search, to me, is that it's taken them so long to launch it. Those who like to search blogs have been waiting for it, but Google, until now, has offered no real way to search them because a blog's update cycle is so much shorter than general search engines are tuned for.
For now, Technorati and Intelliseek still have an edge in blog search, but they've got to be looking in their rear-view mirror at the speeding bullet coming up behind them wondering how they're going to keep that lead. Innovate and specialize, guys!
Good to see Dave Sifry, Technorati CEO, is taking a "bring it on" attitude.
(BTW, should I read anything into the fact that Blogger's spell-check suggestion for "technorati" is "degenerate"?)
Tuesday, September 13, 2005
Factiva Forum
I will be speaking at the next Factiva Forum, Sept. 22, in New York on what the information professionals need to know about text mining and visualization. It will be a primer but will also go into how these technologies may be impacting their roles in the enterprise. I'll post more as we get closer to the date.
Monday, September 12, 2005
Scoble on Corporate Blogging
Scoble talks about why blogging is very important for those in product development and design -- including how it allows clients to interact directly with product managers, not indirectly with customer service. He also says that it's vital for companies to be involved in the conversation and be monitoring the conversation because otherwise the story's going to pass you by.
Scoble might be seen as one of the most influential people at Microsoft (if not one of the 10 most well known). Yet he comments in this interview that he's seven levels down from Bill Gates in the corporate hierarchy. Now that's a statement of how empowering blogs can be.
Business Blogs as Next Growth Area
Publishing has always been a way for academics to distinguish themselves among their colleagues. Maybe now we're seeing an easy way for the business world to do something similar. Of course there are many differences between the two. In academia, your reputation is probably more closely tied to your published works. Your career might depend on whether your papers are seen as well researched or hogwash. Blogs are going to foster the publishing of quick commentary, not the reasoned research. But nonetheless, blogs do offer the ability for those in business to establish themselves as thought leaders.
David Scott talks about how the corporate blog is emerging in 2005 as a growth area. I see corporate blogs currently as a small segment of business blogs. Corporate blogs are likely extensions of the PR department or the CEO's office. They can be useful ways for a company to get their messages out in a folksy manner. But they can also be seen as shameless shilling.
A corporate blog will work well if it addresses some esoteric interests of a company's products or if it furthers a certain image the company is trying to portray, like Stonyfield Farm.
But beware -- bloggers and (likely) readers of the Blogosphere are savvy. A marketing web site that tries to masquerade as a corporate blog will turn people off quickly. Remember the McDonald's Lincoln Fry blog? It was a Super Bowl ad campaign that McDonald's tried to foster with a fake blog purported to be written by someone who found a French fry in the shape of Abe Lincoln. Uh huh. McDonald's said the blog helped the campaign last a little longer in the minds of the public. I doubt it did. I think it just made the company lose cred in the Blogosphere. Did they really think they were going to pull some sort of Blair Witch?
Thursday, September 08, 2005
Travel Boards Should Monitor the Blogosphere
It struck me that travel boards would benefit from tools that can monitor the Blogosphere to see what words people are using to describe their vacations.
(It also strikes me that Maine does need a new slogan. "It must be Maine"? What the heck does that mean?)
Spam, Spam, Spam
Spam blogs and fake blogs are starting to spread as rogue merchants try to boost their rankings in traditional search engines and in blog search engines. By creating lots of blogs that link to each other, they are trying to make their blogs seem influential.
Nothing new here. They're using the exact same tactics they used when they created farms of fake Web sites. And, let's face it, blogs are just Web sites.
What this means is that the leading aggregators in the Blogosphere -- Intelliseek, Technorati and others -- are starting to put measures in place to spot the fakes. Dave Sifry of Technorati wrote an oft-cited post.
As the aggregators start to catch the fakes, the fakers will try to out-fake them. The cat-and-mouse game begins.
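To see why a ring of interlinked fake blogs can look influential, here is a toy Python sketch. It is not how Technorati, Intelliseek or any search engine actually ranks or filters blogs. It simply counts inbound links, then recounts after dropping any link that is returned in the opposite direction. The blog names, the link list and the reciprocal-link rule are all hypothetical.

```python
from collections import defaultdict


def inbound_counts(links, ignore_reciprocal=False):
    """Count inbound links per blog from (source, target) pairs.

    With ignore_reciprocal=True, a link is skipped when the target also
    links back to the source (a crude, hypothetical link-farm signal).
    """
    link_set = set(links)
    counts = defaultdict(int)
    for source, target in link_set:
        if ignore_reciprocal and (target, source) in link_set:
            continue
        counts[target] += 1
    return dict(counts)


# Invented example: three farm blogs all link to each other,
# while two unrelated readers link to one genuine blog.
links = [
    ("farm-a", "farm-b"), ("farm-b", "farm-a"),
    ("farm-a", "farm-c"), ("farm-c", "farm-a"),
    ("farm-b", "farm-c"), ("farm-c", "farm-b"),
    ("reader-1", "real-blog"), ("reader-2", "real-blog"),
]

raw = inbound_counts(links)
filtered = inbound_counts(links, ignore_reciprocal=True)
print(raw)
print(filtered)
```

On raw counts the farm blogs look exactly as popular as the genuine one. With the reciprocal links discounted, only the genuine blog keeps its inbound links. Real bloggers link back to each other all the time, so a rule this blunt would also punish honest blogs, which is part of why the cat-and-mouse game is hard.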
Factiva Insight shows new discussions about Apple phone

Thursday, May 26, 2005
The Genre of Blogs
I've found, in my un-scientific study, that most of these "top" blogs are what I'd call pundits (Instapundit, Eschaton, and Daily Dish for example). Often they are focused on politics -- American politics to be specific.
There are a few traditional journalists at the top of the list. And there is a smattering of humor blogs, consumer reviews, IT and music blogs.
But only two or three out of 100 are the classic blog genre -- diaries or journals. The most popular of these right now are Dooce and Baghdad Burning. The former is probably popular because it's a bit quirky and well written. The latter because it's a compelling look at the author's daily life in the center of a war.
However journals are what most people create when they first create a blog. They talk about their lives. Not very interesting stuff to the average reader, unless the author is a very compelling writer or is living a very interesting life.
Monday, May 23, 2005
Feeling a Bit Alone
Part of me thinks this has more to do with the quality of searching across blogs than it does with the availability of the content.
Technorati and Intelliseek's Blogpulse offer pretty comprehensive searchable indices of the Blogosphere, but I never walk away feeling the most relevant blogs are coming to the top. Bloglines is talking about a new offering in this space. But there is no Google of blogs yet.
Hmmm. Why isn't Google the Google of blogs? You certainly get blog hits back with your regular search, but blogs are real-time discussions and a search engine like Google is mixing them in with all the other sites it's updating. Not sure what the conventional wisdom is right now on how frequently Google reindexes "the entire" Web. Certainly some pages are intraday, but most are probably closer to weekly or monthly. Too slow to work for Blogs.
I'm keeping my eyes on Google Labs. Inevitably, there will be a blog search coming along. Won't there?
Friday, May 20, 2005
Text Mining Summit
Extracting Sentiment
Our friends at Intelliseek have an approach, http://www.intelliseek.com/technology.asp#sentimentmining and so do many others.
The challenges of automatically extracting sentiment are many. It is very difficult for computers to extract meaning from running text. Nuances of language, wit, sarcasm, irony are virtually impossible to detect. Even double negatives can make NLP software confused.
Add to that the issue that sentiment is in the eye of the beholder. What's good for one company often by definition is bad for its competitors. Is the answer to extract the sentiment of the author? Perhaps.
All of this leads me to believe the only way you can build a commercial software package to extract sentiment is by integrating a human in the loop -- either as a final check or as a trainer of the system.
I'm not quite sure of the details of how that should be done, but I'm pretty certain that the answer lies in there somewhere.
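As one possible shape for a human in the loop, the Python sketch below scores text with a tiny word list and sends anything it is unsure about to a reviewer instead of guessing. The word lists, the `score` function and the routing rule are hypothetical and far cruder than any commercial system; they only illustrate the "final check" idea.

```python
# Tiny, invented word lists; a real system would need far more than this.
POSITIVE = {"good", "great", "excellent", "strong"}
NEGATIVE = {"bad", "poor", "terrible", "weak"}
NEGATORS = {"not", "never", "no", "hardly"}


def score(text):
    """Return (label, raw_score); label is 'positive', 'negative' or 'review'.

    Text goes to human review when it contains a negator (the scorer cannot
    tell what is being negated) or when positive and negative hits tie.
    """
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if any(w in NEGATORS for w in words) or pos == neg:
        return "review", pos - neg
    return ("positive" if pos > neg else "negative"), pos - neg


print(score("The quarter was strong and the outlook is great."))
print(score("This was not a bad launch."))
print(score("Oh great, another recall."))
```

The second sentence is routed to a person because of the negation. The third, a sarcastic one, is confidently labeled positive, which is exactly the kind of miss described above and a reason the human might also need to spot-check the "confident" results or retrain the system.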
Thursday, May 19, 2005
Wendy's Reputation Struggle
While the company handled the situation as well as it could have and is recovering, it will take a while for its sales to return to where they were.
Even though I know there never was a finger in the chili at Wendy's, the image will stick in my head for a while -- and that might make me (at a subconscious level) avoid eating there. And that's why corporate reputation is so delicate. Wendy's did nothing wrong -- and everything right in the aftermath, but is still suffering. Not fair, but the hard truth.
A well-stated commentary -- See the article in Factiva (subscription req'd) -- by Jack Schuessler, CEO, Wendy's appears in the Wall Street Journal this week. He stated: "The disturbing truth for everyone in the business community is that a devastating fraud can be perpetrated by a single individual. And the ramifications to a company's reputation are frightening."
This is another example of why it's vital for companies to be ever on the lookout for incidents. The faster they react (as long as they react in a way that shows they have nothing to hide) the better it will be for them in the end.
Software tools that allow monitoring of public discussions in the blogosphere and the local media are vital in the process.
Wednesday, May 18, 2005
Blogosphere Timeline

Just something I put together, inspired by Gartner's approach to analysing new technologies, to try to see where blogs are on the continuum of growth and acceptance. Using this model, there will likely be a drop-off in popularity in the next couple of years, followed by firm acceptance as part of the mainstream communication technology landscape.
A few good quotes:
“Blogs are the best thing that's ever happened to journalism. Or they're going to kill it. One or the other.”
-- San Jose Mercury News, April 18, 2005
“…you cannot afford to close your eyes to [blogs], because they’re simply the most explosive outbreak in the information world since the Internet itself.”
-- Business Week, May 2, 2005
Monitoring Blogs
The growth of the Blogosphere alone creates a new stream of data that many companies are naively ignoring (partially because they're not sure how to monitor them). With stakeholders and journalists reading and writing about your company it seems to me that a company would be foolish to not monitor what's being said.
Monitoring blogs in a comprehensive way allows companies to be able to find dramatic changes in their landscapes.
Monday, January 17, 2005
Blogs and Categorization
Technorati: Tags
Companies like Factiva discovered more than a decade ago that comprehensive, universal metadata on content is vital for organizing and managing large bodies of content.
It seems that the blogosphere is no different than any other corpus of unstructured data in this way. It will be fascinating to see it grow as the Web did.
Remember when Web directories were all the rage in the mid to late 90s because this growing thing (the Internet) needed some order put around it? I was involved in Dow Jones's take on that (Dow Jones Business Directory). We did it because we knew our clients were looking at the Internet and needed advice as to how best to use it efficiently.
Now, directories have become largely obsolete because Google has made searching for information much more efficient than Yahoo and Excite and the others were in the mid 90s.
It seems obvious to me that companies will be stepping up now to put more order around the Blogosphere. Categorization is one of those things.