Re: distributed spiders

From: Eugene Leitl
Date: Tue Feb 08 2000 - 10:20:08 PST

Sandor Spruit writes:

> Why would you submit the information to a central engine ? I thought
> we'd given up on "central-location-I-control-it-all" and were moving
> into the distributed area. Why not distribute the search engine and
> gradually escalate a search attempt from "local" to "far away" ?
You're absolutely right. In fact, I earlier proposed mating Apache
with a local indexing engine like htdig or webglimpse (covering the
local site plus a mirror of a few neighbour nodes), turning the whole
net into one giant distributed search engine. (There are surely quite
a few difficult issues, like query response times and misuse, since
queries get amplified big time, after all, but they can probably be
licked.)
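
A minimal sketch of the "escalate from local to far away" idea, under
a made-up topology where each node is a web server holding its own
small inverted index (a stand-in for htdig or webglimpse; their real
interfaces are not used here). `Node` and `escalating_search` are
hypothetical names. The search is an expanding-ring flood: ask the
local node first, then widen the hop radius one step per round, with a
`seen` set so no node is asked twice in a round and a `max_hops` cap
to bound the query amplification.

```python
from collections import defaultdict, deque


class Node:
    """Hypothetical search node: a web server with its own local full-text index."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []           # directly connected Node objects
        self.index = defaultdict(set)  # term -> set of URLs

    def add_document(self, url, text):
        # Index each lowercase, whitespace-separated term of the document.
        for term in text.lower().split():
            self.index[term].add(url)

    def local_search(self, term):
        # .get avoids inserting empty entries into the defaultdict.
        return set(self.index.get(term.lower(), set()))


def escalating_search(start, term, max_hops=3, wanted=1):
    """Expanding-ring search from `start`.

    Round r floods the query to every node within r hops (breadth-first,
    each node at most once per round). Stops as soon as a round yields at
    least `wanted` hits, or after radius `max_hops`.
    Returns (hits, total number of node queries over all rounds).
    """
    total_queries = 0
    hits = set()
    for radius in range(max_hops + 1):
        hits = set()
        seen = {start.name}
        frontier = deque([(start, 0)])
        while frontier:
            node, dist = frontier.popleft()
            hits |= node.local_search(term)
            total_queries += 1
            if dist < radius:
                for nb in node.neighbours:
                    if nb.name not in seen:
                        seen.add(nb.name)
                        frontier.append((nb, dist + 1))
        if len(hits) >= wanted:
            break
    return hits, total_queries


if __name__ == "__main__":
    # Three servers in a chain: a - b - c. Only c holds a matching page.
    a, b, c = Node("a"), Node("b"), Node("c")
    a.neighbours = [b]
    b.neighbours = [a, c]
    c.neighbours = [b]
    c.add_document("http://c.example/htdig.html", "htdig indexes Apache sites")
    print(escalating_search(a, "apache"))
```

Re-flooding the inner rings every round costs extra queries (6 node
queries for the three-node chain above, where a single flood to radius
2 would take 3), which is the price of not broadcasting to the whole
net up front; the hop cap is the crude knob against misuse.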

Can't somebody with some free time on their hands (what a sad joke)
whip up something sexy-sounding at SourceForge and mobilize a few
developers with a clue? (I'm not a web person, so I couldn't do this
even if I had time.)

However, changes can only be made slowly. Also, when looking for
support from big search engine operators, it helps not to piss them
off by making them realise that the work they support will make them
obsolete.

> I've been wondering for a long time now why this isn't already
> happening in some structured way. No one can keep up with the growth
> of the Net. The results of global search engines are getting ever
> more useless. On the other hand, local search engines often prove to
> be very up-to-date.

Yes, and one can probably even translate a database-backed site into
a (virtual) document tree, turning lookups in the local full-text
index into database queries, or something like that.
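
One way to read the database-backed idea, sketched with Python's
built-in `sqlite3` module and an invented `articles` table: each row
gets a virtual path such as `/articles/1.html`, so a local indexer
sees an ordinary document tree, and a one-term lookup is answered by a
SQL query instead of a prebuilt full-text index. The schema, the path
scheme and the function names are all hypothetical, and `LIKE`
matching is only a crude stand-in for real full-text search.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema: a site that serves its articles out of a database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO articles (title, body) VALUES (?, ?)",
        [("Distributed spiders", "Index locally, query globally"),
         ("Apache modules", "Extending the Apache web server")],
    )
    return db


def virtual_tree(db):
    """Expose every row as a virtual document path, e.g. /articles/1.html."""
    rows = db.execute("SELECT id, title FROM articles ORDER BY id")
    return {f"/articles/{row_id}.html": title for row_id, title in rows}


def full_text_query(db, term):
    """Translate a one-term lookup into SQL and map matching rows back
    onto their virtual paths. SQLite's LIKE is case-insensitive for ASCII;
    '%' or '_' inside `term` are not escaped and act as wildcards."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT id FROM articles WHERE title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [f"/articles/{row_id}.html" for (row_id,) in rows]


if __name__ == "__main__":
    db = make_demo_db()
    print(virtual_tree(db))
    print(full_text_query(db, "apache"))
```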

> Our "local" and, incidentally, very clueful Dutch research network
> SURFnet, for example, has a private engine that indexes all the
> websites of Dutch institutions connected to it. The obvious result:
> very fast, up-to-date and helpful responses.
> Eugene> I mean motivation other than paying them and/or giving them priority
> Eugene> in using the search engine.
> My response would be: the search engines would once again be useful -
> back to the good old days when the Web just started to take off.

It's obvious, but this doesn't provide any immediately tangible
benefit to the user other than the warm fuzzies (as with running
SETI@home or ECDL). It only reaches a breakthrough once a few tens of
percent of all sites are running the modified Apache.

