Google censorship (period)
Thu, 4 Apr 2002 13:15:32 +0200 (CEST)
On Thu, 4 Apr 2002, Kragen Sitaker wrote:
> Do you mean because you would pay those 10^3..10^4 boxes for their
> localindex.bz2 files?
No, picking up a few (100 to 1000) compressed full-text indexes is cheap.
I'd like to see a digicash-based load-levelling scheme for queries,
though, since amplifying query packets would temporarily put a demand spike
on my local line (DSL or cable; dialup users wouldn't be able to amplify
much). Whoever pays the most for their queries would get higher-priority
treatment. Of course, once I need the bandwidth for myself, I should get
the highest priority, since those are my resources, after all, paid for in
real money. The more mojo I've earned that way, the more I can afford to
spend on my own queries.
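A minimal sketch of that priority rule, assuming a single node that schedules its own upstream bandwidth. The names (QueryScheduler, submit_remote, submit_own, spend) are hypothetical, and "payment" is just a number standing in for whatever digicash token would actually be attached. Remote queries are served highest bid first, the owner's traffic preempts all of them, and credit banked from serving peers pays for the owner's own queries.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Owner traffic outranks any payment a remote peer could attach.
OWNER_PRIORITY = float("inf")


@dataclass(order=True)
class Job:
    sort_key: tuple                         # (negated bid, arrival order)
    query: str = field(compare=False)
    payment: float = field(compare=False)


class QueryScheduler:
    """Hypothetical scheduler for one node's outgoing query bandwidth."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # FIFO tie-break for equal bids
        self.mojo = 0.0                     # credit earned by serving peers

    def submit_remote(self, query, payment):
        # heapq is a min-heap, so negate the bid to pop the highest first.
        key = (-payment, next(self._arrival))
        heapq.heappush(self._heap, Job(key, query, payment))

    def submit_own(self, query):
        # The owner's queries jump ahead of every paid remote query.
        key = (-OWNER_PRIORITY, next(self._arrival))
        heapq.heappush(self._heap, Job(key, query, 0.0))

    def next_job(self):
        """Pop the highest-priority job and bank its payment, if any."""
        job = heapq.heappop(self._heap)
        self.mojo += job.payment
        return job.query

    def spend(self, amount):
        """Pay a peer to carry one of our own queries, if we can afford it."""
        if amount > self.mojo:
            return False
        self.mojo -= amount
        return True


if __name__ == "__main__":
    s = QueryScheduler()
    s.submit_remote("alice: cats", 2.0)
    s.submit_remote("bob: dogs", 5.0)
    s.submit_own("me: fork archive")
    print([s.next_job() for _ in range(3)])  # own query first, then by bid
    print(s.mojo)
```

A heap keeps both submit and pop at O(log n), which matters if a node is relaying many amplified queries at once.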
> How do you get query fanout? Are you suggesting that my
> localindex.bz2 should contain the localindex.bz2 of everybody I've
> pulled from?
No, that would be a second index, aggregated from the nodes I spidered
(preferably with a notify mechanism, since that would eliminate polling).
There would necessarily be some overlap/redundancy.
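A minimal sketch of such a second-level index, assuming each spidered node's local index can be modelled as a dict mapping a term to a set of document URLs. The class and method names are hypothetical. Nodes push their index through on_notify() instead of being polled, and lookups take the union across nodes, so a URL reported by several nodes (the redundancy) collapses to one hit.

```python
class AggregateIndex:
    """Hypothetical second-level index built from peers' local indexes."""

    def __init__(self):
        self._per_node = {}  # node id -> that node's latest local index

    def on_notify(self, node, local_index):
        # Push-style update: replace the node's previous contribution
        # wholesale. A real system might apply deltas instead.
        self._per_node[node] = {
            term: set(urls) for term, urls in local_index.items()
        }

    def drop(self, node):
        # Forget a node that has gone away.
        self._per_node.pop(node, None)

    def lookup(self, term):
        # Union across nodes: overlapping entries are deduplicated.
        hits = set()
        for index in self._per_node.values():
            hits |= index.get(term, set())
        return hits


if __name__ == "__main__":
    agg = AggregateIndex()
    agg.on_notify("a", {"fork": ["http://x/1", "http://x/2"]})
    agg.on_notify("b", {"fork": ["http://x/2"], "udp": ["http://y/1"]})
    print(sorted(agg.lookup("fork")))
```

Keeping each node's contribution separate, rather than merging everything into one table, makes a notification a simple replace and lets stale nodes be dropped cleanly.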
> Local network neighborhood doesn't give you exponential fanout, since
> the folks in your neighborhood are mostly in each other's
I'm not particular about the neighbourhood; it's just where there are the
fewest bottlenecks, and it goes easier on the ISP (internal network
traffic is cheap, since it doesn't require peering arrangements). A fair
fraction of spidering should assume an equipopulated address space, i.e.
probe addresses spread uniformly across it rather than only nearby ones.
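One way to mix the two, sketched under assumptions: IPv4 only, a hypothetical pick_targets() helper, and an arbitrary uniform_fraction of 0.3. That share of probes is drawn uniformly from the whole 32-bit space (skipping anything Python's ipaddress module does not classify as globally routable); the rest stay inside the local network, where traffic is cheapest for the ISP.

```python
import ipaddress
import random


def pick_targets(local_net, n, uniform_fraction=0.3, rng=random):
    """Choose n IPv4 addresses to probe (hypothetical helper).

    uniform_fraction of the targets come from the whole address space,
    the remainder from local_net. The 0.3 default is illustrative only.
    """
    net = ipaddress.IPv4Network(local_net)
    n_uniform = round(n * uniform_fraction)
    targets = []
    # Uniform draws over the full space; rejection-sample until routable.
    while len(targets) < n_uniform:
        addr = ipaddress.IPv4Address(rng.getrandbits(32))
        if addr.is_global:
            targets.append(addr)
    # Remaining probes stay in the local neighbourhood.
    for _ in range(n - n_uniform):
        targets.append(net.network_address + rng.randrange(net.num_addresses))
    return targets


if __name__ == "__main__":
    for addr in pick_targets("192.168.1.0/24", 10, rng=random.Random(1)):
        print(addr)
```

Passing a seeded random.Random as rng makes a run reproducible; the module-level random functions are used by default.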
> Dave Winer's XML-RPC push-based search engine interface doesn't seem
> to have caught on in the last three years, unfortunately. Maybe this
> would help.
XML-RPC could do it, but since we're talking about low-level stuff like UDP,
such niceties are optional. But sure, it would be the most painless
way to implement it.
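To show how little code a push notification takes over XML-RPC, here is a sketch using only Python's standard xmlrpc modules. The method name index_updated and its two arguments are hypothetical, chosen for illustration; they are not Dave Winer's interface or any published one. A peer calls the method when its localindex.bz2 changes, so the aggregator never has to poll.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

received = []  # notifications seen by this node


def index_updated(node_url, index_url):
    """Hypothetical RPC: a peer announces that its index has changed."""
    received.append((node_url, index_url))
    return True


def start_listener(host="127.0.0.1", port=0):
    # Port 0 lets the OS pick a free port; read it from server_address.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(index_updated, "index_updated")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_listener()
    port = server.server_address[1]
    # A peer pushes a notification instead of waiting to be polled.
    with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
        proxy.index_updated("http://peer.example/",
                            "http://peer.example/localindex.bz2")
    print(received)
    server.shutdown()
    server.server_close()
```

A bare UDP datagram would be lighter on the wire, but XML-RPC gives marshalling and error reporting for free.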
> Ranking is especially important for unpopular search topics.
Yes, but with few hits you can simply go through them exhaustively. Also,
algorithms that improve ranking for less popular items could be added later.
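A small sketch of that split, assuming a hypothetical present_hits() helper and an arbitrary threshold of 50. Small result sets come back unranked for exhaustive browsing, and ranking is a pluggable score callable, so a better algorithm for unpopular topics can be dropped in later without touching callers.

```python
def present_hits(hits, threshold=50, score=None):
    """Return hits in display order (hypothetical helper).

    If there are few enough hits, or no scorer is configured, return them
    as-is for exhaustive browsing; otherwise sort by score, best first.
    """
    if score is None or len(hits) <= threshold:
        return list(hits)
    return sorted(hits, key=score, reverse=True)


if __name__ == "__main__":
    hits = ["x", "xxx", "xx"]
    print(present_hits(hits, threshold=2, score=len))
```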
> Sorry --- I wasn't trying to get into a dick size war. No doubt there
> are people on FoRK who have been thinking about it since 1980.
No doubt. Too bad worse-is-better wins by default, but just being there