[FoRK] Should we add some ads to FoRK-archive? At 60K hits/day...
Wed Oct 12 14:39:50 PDT 2005
Adam L Beberg wrote:
> 30-40% is really low on the bots/harvester rates for most of my sites.
> Google alone is a very hyperactive visitor accounting for over 5% of
> It would be VERY interesting if someone were to actually pick apart
> logs carefully and study this. I can imagine that blog-class sites are
> getting a huge fraction of their hits from search engines.
> More hits more ads, more ads more money, more money more crawling...
> wait a second...
You never base your evaluations on raw hits! Ever, ever, ever.
IME bots account for up to 50%.
An interesting data point about bots: I changed my A record recently,
and found that the bots kept hitting the old IP for 2-3 days after the
desktops had switched over. This could be a good heuristic to
distinguish bots from live users, which is pretty damn hard otherwise.
A way you might take advantage of this is to continuously cycle through
IP addresses, with a cycle latency of 3 days. The bots would filter
themselves out of general traffic, allowing better performance for the
Another way to take advantage of it is to continuously cycle, but use
hits on the stale IP as indicators that a client is a bot. You record
characteristics of the bot like IP and User-Agent, then look those up
over on the fresh IP when you need to figure out whether a client is a
bot. This would be useful for a tool to compute human traffic scores;
those human traffic scores would be good for presenting audited numbers
to potential advertisers.
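
A rough sketch of that second scheme, in Python. This is only an illustration under assumptions, not a tested tool: the Hit record, the function names, and the sample addresses (taken from the reserved documentation ranges) are all made up here, and it assumes your logs tell you which server address each request arrived on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One log entry (hypothetical layout, not any real log format)."""
    server_ip: str   # address the request arrived on
    client_ip: str
    user_agent: str


def bot_fingerprints(hits, stale_ips):
    """Collect (client IP, User-Agent) pairs seen on retired addresses."""
    return {(h.client_ip, h.user_agent) for h in hits if h.server_ip in stale_ips}


def human_traffic_score(hits, fresh_ip, fingerprints):
    """Fraction of hits on the fresh address whose fingerprint never
    showed up on a stale address. Returns None if there are no such hits."""
    fresh = [h for h in hits if h.server_ip == fresh_ip]
    if not fresh:
        return None
    humans = sum((h.client_ip, h.user_agent) not in fingerprints for h in fresh)
    return humans / len(fresh)


# Made-up sample data: 192.0.2.10 is the retired address, 192.0.2.20 the live one.
STALE_IPS = {"192.0.2.10"}
FRESH_IP = "192.0.2.20"
HITS = [
    Hit("192.0.2.10", "198.51.100.7", "ExampleBot/1.0"),  # still on the old IP
    Hit("192.0.2.20", "198.51.100.7", "ExampleBot/1.0"),
    Hit("192.0.2.20", "203.0.113.5", "Mozilla/5.0"),
    Hit("192.0.2.20", "203.0.113.9", "Mozilla/5.0"),
    Hit("192.0.2.20", "198.51.100.7", "ExampleBot/1.0"),
]

fingerprints = bot_fingerprints(HITS, STALE_IPS)
score = human_traffic_score(HITS, FRESH_IP, fingerprints)
print(f"human traffic score: {score:.2f}")
```

Matching on the (IP, User-Agent) pair rather than IP alone is a judgment call: it avoids flagging every desktop behind the same NAT as a bot, at the cost of missing bots that rotate their User-Agent.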