Wednesday, August 16, 2006

2.27 gigabytes of AOL data provides a treasure trove for data mining

A Lee Gomes article on WSJ.com (sub.) demonstrates a fun use of data mining, drawing on the recently released AOL search files to show what people are searching for on the Web.

"... excepting prepositions and conjunctions, the most commonly used word in the 17.15 million separate searches was 'free.' If something isn't free, it better at least be 'new,' as that was the next-most common word. Excluding proper nouns, the next most popular words were 'lyrics,' 'county,' 'school,' 'city,' 'home,' 'state,' 'pictures,' 'music,' 'sale,' 'beach,' 'high,' 'map,' 'center' and 'sex.' "
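
For anyone curious how a count like this might be produced, here is a minimal Python sketch. It is not the method the article used, which isn't described. The stop-word list, the `top_words` helper, and the sample queries are all made up for illustration, and nothing here assumes a particular layout for the AOL files.

```python
from collections import Counter

# Illustrative stop list (an assumption): the article only says it
# excluded prepositions and conjunctions, not which ones.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in",
    "on", "for", "to", "at", "by", "with", "from",
}


def top_words(queries, n=10):
    """Return the n most frequent non-stop words across the queries."""
    counts = Counter()
    for query in queries:
        # Lowercase and split on whitespace; real query logs would
        # likely need more careful tokenizing.
        for word in query.lower().split():
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


# Made-up sample queries, not taken from the AOL data.
sample = [
    "free music lyrics",
    "new free games",
    "map of the city",
    "free pictures",
    "new home for sale",
]

print(top_words(sample, 2))  # [('free', 3), ('new', 2)]
```

Excluding proper nouns, as the article did, would take something beyond a stop list, such as a name dictionary or a part-of-speech tagger.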

2 comments:

Anonymous said...

anyone know where I can get this 2.27 gb of data?

Glenn Fannick said...

AOL is no longer distributing it, not surprisingly. But chunks of it are swirling around the Web, and it's been mirrored in at least one place. News.com has an interesting article about some of the more notable groups of searches.