There's been considerable discussion in the blogosphere about Google "dropping blogs" from search results. Dave Winer linked Andrew Orlowski's article about Eric Schmidt's comments; more recently Dave linked Evan Williams' reply that Orlowski is full of crap. So what's the truth? Unlike Evan, I have no inside knowledge (Evan is the founder of Pyra, makers of Blogger, which was recently purchased by Google), but here's some educated guesswork...
First, Google is all about delivering accurate search results. If they thought dropping blogs would improve those results, that would be their reason for doing it (not because they dislike blogs or have some philosophical axe to grind). So we need to think about whether blogs improve search results or not. Second, Google has a history of separating search domains in their GUI (images, groups, directory, news). Each of these domains has different characteristics, and when users search they generally know which domain they want to search within. It is reasonable to assume that rather than dropping blogs altogether, Google would establish a new domain for them. So we need to think about why they would do this and how it might work. Finally, Google works great for most sites, but the way they index blogs could be improved. So we can think about how blogs could best be indexed.
Dave asked, "How will it [Google] tell the difference [between blogs and everything else]?" I'm not sure how they could tell; the lines between news sites, personal home pages, company sites, e-commerce stores, blogs, etc. are blurry. But there are technical ways to distinguish them (blogs ping weblogs.com, they have RSS feeds, etc.). More on this below, but for now let's think about the differences a search engine would care about: blogs' content changes frequently, blogs are link-rich and content-poor, and blogs contain personal opinion.
If you think about it, all of these things make blogs less useful to search engines. Let's consider them in turn:
Blogs' content changes frequently. Blogs are chronological diaries; many bloggers post at least once a day and some post several times a day. Each post usually has a "permalink" (a URL that always points to that post), but the blog itself has a constant URL whose content is always changing. Consider my little blog: I post about once per day, and Google's spider visits me about once per day. It takes Google some time before their spider's data are indexed and absorbed, so most of the time what Google "thinks" is on my blog's home page is accurate for only a few hours. This is shown vividly by my referer logs; Google often directs people to my home page based on content which is no longer there!
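To make the staleness effect concrete, here's a toy Python simulation. It assumes (my simplification, not anything Google has described) one post and one spider visit per day at fixed hours, plus a fixed delay before a fetched copy becomes searchable; the function name fraction_accurate is just illustrative. It counts the hours in which the indexed copy of the home page still matches the live page.

```python
def fraction_accurate(post_hour: int, crawl_hour: int, lag_hours: int, days: int = 30) -> float:
    """Share of hours in which the indexed copy of a home page matches the live page.

    Toy model (an assumption, not Google's actual pipeline): the blogger posts once
    a day at post_hour, the spider fetches the page once a day at crawl_hour, and a
    fetched copy becomes searchable lag_hours later.
    """
    posts = [d * 24 + post_hour for d in range(days)]
    crawls = [d * 24 + crawl_hour for d in range(days)]

    def version(t: int) -> int:
        # The home page's content is determined by how many posts exist at hour t.
        return sum(1 for p in posts if p <= t)

    accurate = total = 0
    for t in range(24, days * 24):  # skip day 0 while the index warms up
        searchable = [c for c in crawls if c + lag_hours <= t]
        if not searchable:
            continue
        total += 1
        # The index serves the most recent crawl that has finished processing.
        if version(searchable[-1]) == version(t):
            accurate += 1
    return accurate / total if total else 0.0


print(fraction_accurate(post_hour=9, crawl_hour=12, lag_hours=6))   # 0.625
print(fraction_accurate(post_hour=12, crawl_hour=9, lag_hours=6))   # 0.0
```

With a post at 9:00, a crawl at noon, and a six-hour lag, the index is right 15 hours out of 24; if the spider happens to come by just before I post, the index is never right in this model. Real crawl times drift, so actual numbers would vary from day to day.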
Blogs are link-rich and content-poor. Many posts on a blog simply link to other posts on other blogs, perhaps adding some commentary and/or associating multiple posts with similar content together. Not all blogs are that way - this is the "thinkers" vs. "linkers" distinction I've mentioned before - but overall if Google directs a searcher to a blog, they're more likely to find links than the information itself. There is value in having the links aggregated by the blogger, but that's what Google does anyway. So most blog posts are not very good targets for a search, even if many other bloggers have linked to them.
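One way a search engine might put a number on "link-rich and content-poor" is link density: the share of a page's words that sit inside <a> tags. This is a heuristic commonly used to spot boilerplate, and I'm assuming it here rather than reporting anything Google does. A minimal sketch using Python's standard html.parser (the names LinkDensity and link_density are hypothetical):

```python
import re
from html.parser import HTMLParser


class LinkDensity(HTMLParser):
    """Counts words inside <a> elements versus all visible words."""

    def __init__(self) -> None:
        super().__init__()
        self.anchor_depth = 0  # > 0 while inside an <a> element
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.link_words = 0
        self.total_words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_depth:
            self.anchor_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        n = len(re.findall(r"\w+", data))  # word-like tokens, punctuation ignored
        self.total_words += n
        if self.anchor_depth:
            self.link_words += n


def link_density(html: str) -> float:
    parser = LinkDensity()
    parser.feed(html)
    parser.close()
    return parser.link_words / parser.total_words if parser.total_words else 0.0


linker = '<p>See <a href="http://a.example/1">this post</a> and <a href="http://b.example/2">that one</a>.</p>'
print(round(link_density(linker), 2))  # 0.67
```

A "linker" post like the example scores high; a "thinker" essay with a link or two scores low. Where to draw the line is a judgment call, and the number says nothing about how good the commentary is.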
Blogs contain personal opinion. By their very nature, blogs record one person's (or a small group's) thoughts about their world. Blogs which blandly report news are uncommon; most blogs are full of philosophy, politics, sociology, and general spin. This is what makes them interesting and fun to read, but it isn't clear this is helpful for someone searching for information. If you are searching for "George Bush landing on the U.S.S. Lincoln", that's what you want to find, not 1,000 bloggers' personal opinions about George Bush's landing.
So I can see why Google might want to exclude blogs from search results. On the other hand, blogs have information that can't be found anywhere else; they are an incredible resource. That information takes several forms:
So I can see why Google would definitely want to continue presenting blogs' information, but segregated into a different search domain. They might do this for another reason, too: to improve the presentation of results. Google News results are different from Google Web results, and they are presented differently too, reflecting the underlying differences in the content.
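Segregating blogs presupposes recognizing them. Picking up the technical hints from Dave's question (RSS feeds, pings to weblogs.com), here's a rough Python sketch. It looks for a feed advertised through an RSS/Atom autodiscovery <link> tag and checks the page's URL against a set of recently pinged URLs. The names blog_signals and recently_pinged are hypothetical, and how that set gets collected from weblogs.com is left out.

```python
from html.parser import HTMLParser

# MIME types commonly used in feed autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" type="..." href="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # html.parser lowercases attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append(a["href"])


def blog_signals(url: str, html: str, recently_pinged: set[str]) -> list[str]:
    """Return the (weak) blog signals a page exhibits. Hypothetical heuristic."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    signals = []
    if finder.feeds:
        signals.append("feed-autodiscovery")
    if url in recently_pinged:
        signals.append("weblogs.com-ping")
    return signals


page = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="/index.xml"></head><body>Today I read...</body></html>')
print(blog_signals("http://example.com/", page, {"http://example.com/"}))
# ['feed-autodiscovery', 'weblogs.com-ping']
```

Neither signal is conclusive, since plenty of news and company sites publish feeds too; that's the gray-line problem again. A real classifier would presumably combine many more signals.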
There is no doubt that Google's approach to indexing web sites made a qualitative improvement in web searching. But there are ways blogs could be indexed that would be a big step forward:
No doubt there are other ways, too. By segregating blogs and treating them differently, Google could improve the blog-searching experience, which in turn would make the information on blogs more valuable.
Wrapping up, here are my conclusions:
Those are my thoughts; I'm sure you'll have others. I'll search for them :)
P.S. A Technorati search for blogs that link to Orlowski's article lists 195 of them, each of which has other inbound links, comment threads, trackbacks, etc. Amazing!