Ten million indexed blogs = ??% of the blogosphere ?
Steve Rubel points to a BlogHerald piece by Duncan Riley hinting at the overall size of the blogosphere (60 million), obtained by aggregating the numbers of individual countries and services. By contrast, Perseus recently published their own number: 31.6M.
Duncan's country/services numbers seem to make sense, except maybe Korea's at 15 million (i.e. one in every 3 inhabitants, though the number is reportedly substantiated in the comments), and Europe's, which is off by at least 1M (France alone has an estimated 2.5M blogs, of which Skyblogs represent 2.1M). However, I think that assigning services like LJ, TP, MSN Spaces or Blogger to the Anglosphere (accounting for 60% of the total) might introduce double-counting in the calculations, since Blogger and Spaces are also used by an international, non-English audience.
Unless Duncan is saying that Korean and Chinese bloggers are using local services for an additional 24M. I also wonder how many of these could be construed as spamblogs, fake link blogs, etc. and how many are inactive (like my 10 test blogs).
The second question relates to the percentage of blogs actually indexed by services like Feedster, Technorati, BlogPulse and PubSub. These last three have recently hit the 10M mark (which is a notable milestone). I remember a conversation with Dave Sifry about the size of the blogosphere, and he thought that the Technorati index was a pretty good proxy for the number of "real" blogs (i.e. they reportedly work hard to remove spam blogs from their index, like all other services). The issue is that figuring out that a blog is pure spam can sometimes be challenging in English, but in Chinese or Korean ?
Since all services use ping services like Weblogs.com, Ping-o-Matic or FeedMesh, as well as link crawling, to discover new blogs & feeds, I am wondering how the 16% ratio (10M indexed out of an estimated 60M) can be rationalized. Obviously a blog that is not linked and does not update cannot be discovered, and there might be a lot of those out there, but what else ? Spam blogs and inactive blogs, like I said ? Completely disconnected islands of blogs that don't link or ping ?
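For what it's worth, here is the back-of-the-envelope arithmetic behind that ratio, as a small Python sketch. The two totals are the estimates quoted above (60M from the BlogHerald aggregate, 31.6M from Perseus) and 10M is the rounded index size; the function and variable names are mine, purely for illustration.

```python
def indexed_share(indexed_millions, total_millions):
    """Return the indexed blogs as a percentage of an estimated total."""
    return 100.0 * indexed_millions / total_millions


# Rounded index size quoted in the post (in millions of blogs).
INDEXED = 10.0

# Competing estimates of the total blogosphere size (in millions).
ESTIMATES = {
    "BlogHerald aggregate": 60.0,
    "Perseus": 31.6,
}

# Percentage of each estimated total that a 10M index would cover.
shares = {
    name: round(indexed_share(INDEXED, total), 1)
    for name, total in ESTIMATES.items()
}

for name, share in shares.items():
    print(f"{name}: {share}% indexed")
```

Depending on which total you believe, the same 10M index covers roughly a sixth or roughly a third of the blogosphere, which is a large part of why the ratio is hard to rationalize.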
Also, there is a difference between the number of blogs and the number of feeds out there. For example, this blog has 6 feeds: 2 RSS, 2 Atom, and 2 FeedBurner-managed (1 for blog content, 1 for blog content+Buzznet pics+del.icio.us links). The doubling of the (TypePad) RSS/Atom feeds comes from the fact that I moved my URL from softtechvc.blogs.com to blog.softtechvc.com, and did not ask you all to switch to the managed feed.
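To make the blogs vs. feeds distinction concrete, here is a rough Python sketch of how a feed-based count and a blog-based count diverge for this one blog. The feed list mirrors the six feeds described above; the alias table and the way feeds are mapped back to a single blog are my own assumptions for illustration, not how any of the indexing services actually does it.

```python
from collections import defaultdict

# The six feeds described above, as (source, flavor) pairs.
FEEDS = [
    ("softtechvc.blogs.com", "RSS"),
    ("softtechvc.blogs.com", "Atom"),
    ("blog.softtechvc.com", "RSS"),
    ("blog.softtechvc.com", "Atom"),
    ("FeedBurner", "blog content"),
    ("FeedBurner", "blog content + Buzznet pics + del.icio.us links"),
]

# Hypothetical alias table: the old URL and the managed feeds
# all belong to the same blog.
ALIASES = {
    "softtechvc.blogs.com": "blog.softtechvc.com",
    "FeedBurner": "blog.softtechvc.com",
}


def canonical(source):
    """Map a feed source to the blog it ultimately belongs to."""
    return ALIASES.get(source, source)


# Group the feeds under their canonical blog.
feeds_per_blog = defaultdict(int)
for source, _flavor in FEEDS:
    feeds_per_blog[canonical(source)] += 1

feed_count = len(FEEDS)            # what a feed-based index counts
blog_count = len(feeds_per_blog)   # what a blog-based index counts

print(f"{feed_count} feeds, {blog_count} blog")
```

A service counting feeds would report six sources here, while a service counting blogs would report one.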
Having said this: Dave referred to 10M blogs for Technorati, and Bob to 10M feeds for PubSub. Feedster is indexing feeds, and BlogPulse is referring to blogs.
Apples vs. Oranges ? And what is the size of the bag ?
It will be interesting to read Bob Wyman's, Dave Sifry's or Scott Rafer's take on this (or that of the editor of the BlogPulse blog, whom I don't have the pleasure of knowing).
Update: Natalie Glance from BlogPulse points (in the comments) to a WSJ article on the subject. After discussing what is a blog and what is not, the writer mentions a few interesting bits of statistics:
- 2/3 of blogs indexed by Technorati are non-English
- BlogPulse projects that 50% of MSN Spaces are "empty" (i.e. they contain lists, pictures, etc. but no posts)
- Technorati sees 800 to 900K posts a day, vs. 350 to 450K monitored by BlogPulse
- BlogPulse estimates that there are 3.5M active blogs (i.e. blogs that have been updated within the last 30 days).
I will also mention to Pierre Bellanger, the CEO of Skyrock, that they might want to start notifying ping servers (actually, I know that the PR team will pick this up).



Technorati indexes weblogs (HTML) and not exclusively feeds, meaning your blog's 6 feed flavors count as one source at Technorati.
It's also worth mentioning that some weblogs and other web pages do not want to be indexed and set their robots.txt or meta robots preferences accordingly. When you set up your TypePad blog you are asked if you would like to publicize its existence, and other services like MySpace and LiveJournal have similar publicity settings.
You are correct, we are not discovering them all. Technorati recognizes international weblogs: some members of the Technorati 100 are written in Arabic and Korean.
Posted by: Niall Kennedy | May 25, 2005 at 01:40 PM
I no longer read the "# of blogs" figure as an indicator for much of anything, due to the multitude of problems you've outlined rather well here.
Now a figure based on polling a population with the question "Do you blog?" is something I'd like to know.
Posted by: Gabe | May 25, 2005 at 03:26 PM
BlogPulse also indexes by weblog not by feed. Even if a weblog has multiple feeds, its posts are only indexed once.
We also use the ping services to discover new weblogs, among other methods. Not all weblog hosts make it easy for bloggers to turn on pinging. For example, France's Skyblog.fr does not support either pinging or feeds. As a result, Skyblog.fr blogs are largely invisible.
See today's article in WSJ Online by Carl Bialik on the difficulties of counting the number of weblogs.
Posted by: Natalie Glance | May 26, 2005 at 07:52 AM