[FoRK] Should we add some ads to FoRK-archive? At 60K hits/day...

Lucas Gonze lgonze
Wed Oct 12 14:39:50 PDT 2005


Adam L Beberg wrote:

> 30-40% is really low on the bots/harvester rates for most of my sites.
> Google alone is a very hyperactive visitor accounting for over 5% of 
> traffic.
>
> It would be VERY interesting if someone were to actually pick apart 
> logs carefully and study this. I can imagine that blog-class sites are 
> getting a huge fraction of their hits from search engines.
>
> More hits more ads, more ads more money, more money more crawling... 
> wait a second...

You never base your evaluations on raw hits!  Ever, ever, ever.

IME bots account for up to 50% of hits.

An interesting data point about bots: I changed my A record recently, 
and found that the bots kept hitting the old IP for 2-3 days after the 
desktops had switched over.  This could be a good heuristic to 
distinguish bots from live users, which is pretty damn hard otherwise.

A way you might take advantage of this is to continuously cycle through 
IP addresses, moving the A record to the next address every 3 days or 
so.  The bots would lag behind on the stale address and filter 
themselves out of general traffic, allowing better performance for the 
humans.
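
A rough sketch of the rotation schedule, in Python.  Everything named 
here is an assumption for illustration: the pool uses documentation 
addresses, the epoch is arbitrary, and actually pushing the new A record 
to a DNS server (with a suitably short TTL) is left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pool of addresses (RFC 5737 documentation range).
IP_POOL = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
# Rotation period, matching the 2-3 day bot lag observed above.
CYCLE = timedelta(days=3)
# Arbitrary reference point from which rotation slots are counted.
EPOCH = datetime(2005, 1, 1, tzinfo=timezone.utc)


def active_ip(now, pool=IP_POOL, cycle=CYCLE, epoch=EPOCH):
    """Return the address the A record should point at for time `now`."""
    slot = (now - epoch) // cycle  # whole rotation periods since epoch
    return pool[slot % len(pool)]


def stale_ip(now, pool=IP_POOL, cycle=CYCLE, epoch=EPOCH):
    """Return the address that was active one rotation period earlier."""
    return active_ip(now - cycle, pool, cycle, epoch)
```

With three addresses and a 3-day period, an address only comes back 
into service after 9 days, so a client still on the previous address is 
one that has not re-resolved the name recently.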

Another way to take advantage of it is to continuously cycle, but use 
hits on the stale IP as indicators that a client is a bot.  You record 
characteristics of the bot like IP and User-Agent, then look those up 
over on the fresh IP when you need to figure out whether a client is a 
bot.  This would be useful for a tool to compute human traffic scores; 
those human traffic scores would be good for presenting audited numbers 
to potential advertisers.
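
Here is a minimal sketch of that second approach in Python, assuming 
the access logs have already been parsed into records.  The `Hit` record 
and both function names are made up for illustration, and the "score" is 
just one plausible definition: the fraction of hits on the fresh IP that 
don't match a fingerprint seen on a stale IP.

```python
from collections import namedtuple

# Hypothetical parsed log record: which of our addresses was hit,
# and who hit it.
Hit = namedtuple("Hit", "server_ip client_ip user_agent")


def bot_fingerprints(hits, stale_ips):
    """Collect (client IP, User-Agent) pairs seen on any stale address."""
    return {
        (h.client_ip, h.user_agent)
        for h in hits
        if h.server_ip in stale_ips
    }


def human_traffic_score(hits, fresh_ip, fingerprints):
    """Fraction of hits on the fresh address not matching a known bot."""
    fresh = [h for h in hits if h.server_ip == fresh_ip]
    if not fresh:
        return 0.0
    human = sum(
        1 for h in fresh
        if (h.client_ip, h.user_agent) not in fingerprints
    )
    return human / len(fresh)
```

Matching on the exact (IP, User-Agent) pair is the conservative choice; 
matching on either field alone would catch more bots but would also 
misclassify humans who share a proxy or a common browser string.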





