Re: Real-time distributed Web search (Gnutella knockoff?)


From: Meltsner, Kenneth (Kenneth.Meltsner@ca.com)
Date: Fri Jun 30 2000 - 18:49:45 PDT


There was some good work on this by the Hyperwave folks in Austria.
Basically, to ensure complete propagation of update messages without
requiring nasty full-blown broadcasts, they send each update message to a
couple of servers, which send it to a couple of servers each, and so on.
Gross simplification, of course, but they showed, with both the math and
practical experiments in the n=thousands range, that this works well. I
don't know whether their ideas scale to hundreds of thousands or millions
of computers, but the scheme did handle the "natural" hierarchies imposed
by network-to-network connectivity as well.

http://www.iicm.edu/jucs_1_2/a_scalable_architecture_for/html/paper.html

"This paper presents a scalable architecture for automatic maintenance of
referential integrity in large (thousands of servers) distributed
information systems. A central feature of the proposed architecture is the
p-flood algorithm, which is a scalable, robust, prioritizable, probabilistic
server-server protocol for efficient distribution of update information to a
large collection of servers."
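
For flavor, here's a minimal push-gossip sketch in Python -- not the
p-flood algorithm itself (the paper adds priorities and robustness on
top), just the generic epidemic idea it builds on; the fanout and server
count are made-up, illustrative numbers:

    import random

    def push_gossip(servers, origin, fanout=3):
        """Count rounds until every server has heard an update pushed epidemically."""
        informed = {origin}
        rounds = 0
        while len(informed) < len(servers):
            newly = set()
            for server in informed:
                # each server that already has the update pushes it to a few random peers
                for peer in random.sample(servers, fanout):
                    if peer not in informed:
                        newly.add(peer)
            informed |= newly
            rounds += 1
        return rounds

    servers = list(range(5000))
    print(push_gossip(servers, origin=0))   # typically ~log(N) rounds, not N broadcasts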

Ken Meltsner

"Adam L. Beberg" wrote:

> On Fri, 30 Jun 2000, Kragen Sitaker wrote:
>
> > Well, you definitely have more experience with building large-scale
> > distributed systems than I do. :) More, actually, than almost anyone
> > does. Could you take the time to explain, in short words so we can
> > understand, how familiar rules of hierarchy and specialization don't
> > apply?
>
> I'll take that as a compliment, I think. Hmmm, I can't take some time,
> but I'll do it anyway, tho I'll try not to spoil my TWIST topics for all
> the FoRKies that will be there. Won't use small words tho ;)
>
> Hierarchy and specialization are how humans cope with the universe, so
> that's how they design things. You report to your boss, and go to a
> doctor when you're about to die.
>
> In a hierarchy with N systems you basically have N-1 connections going,
> but since they are localized, the load on the "big net" is more on the
> order of sqrt(N). Now, the web is not quite a hierarchy, but in the old
> days of proxy caches, before Akamai and friends broke them, before
> dynamic content was everywhere it didn't belong, it was reasonably
> close. This of course is the bandwidth-conserving, efficient, fast
> zoom-zoom case. The Internet itself is organized this way - as are all
> infrastructure grids - backbone, regional hubs, ISPs, you. The MBone used
> to work this way too, a dynamic hierarchy, which could have handled that
> Victoria's Secret show without even a net blip.
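
(One way to read the sqrt(N) figure: picture a two-level hierarchy of
roughly sqrt(N) clusters of sqrt(N) nodes each, so only the cluster heads
touch the big net. A back-of-the-envelope sketch in Python -- the
two-level split is an illustrative assumption, not a measured topology:

    import math

    def hierarchy_stats(n):
        """Toy two-level hierarchy: ~sqrt(n) clusters of ~sqrt(n) nodes each."""
        clusters = round(math.sqrt(n))
        total_links = n - 1            # any tree over n nodes has exactly n-1 edges
        backbone_links = clusters - 1  # only cluster heads sit on the "big net"
        return total_links, backbone_links

    for n in (1000, 100000, 10000000):
        total, backbone = hierarchy_stats(n)
        print(f"N={n:>10}: links={total:>10}, backbone links={backbone:>6}")

so total links grow with N, but the top-level load grows much more slowly.)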
>
> In a specialized system like Google, you have a central specialized
> system handling all N connections with low load per node. It's a
> bandwidth hog up to a point, but it's still very efficient since you don't
> do any caching anyway. This is where the web is now, since every page I
> visit refreshes every time I move the mouse. This is probably where
> "real world" things will stay basically forever; it's not perfect or even
> decent, but it is optimal for the advertisers.
>
> Then there are the so-called "distributed" (in the buzzword sense) systems
> like Gnutella and Freenet, which are actually _broadcast_ systems, where N
> nodes give you more on the order of N^2 connections. Fine for small N, but
> it quickly explodes into a molten mess of bits. This is probably best known
> as the "well, it worked in the lab" case, but usually by more obscene
> names. USENET works this way: everything gets copied 50,000+ times whether
> anyone wants it or not. You really only need to do this if you're
> pretending to hide from people to break laws; otherwise this method is
> just too stupid and wasteful. Hopefully datahavens will let people stop
> thinking that this is a good idea at all.
>
> Guess which category a "distributed search engine" is in :)
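
(Rough message counts for the three cases above, assuming each of N nodes
issues one query -- a toy model with a made-up tree depth, not
measurements:

    def traffic(n, tree_depth=10):
        """Total messages when each of n nodes issues one query (toy model)."""
        hierarchy = 2 * n * tree_depth   # up and back down a tree of fixed depth: O(N)
        central = 2 * n                  # one request + one reply per node: O(N)
        flood = n * (n - 1)              # each query copied to every other node: O(N^2)
        return hierarchy, central, flood

    for n in (100, 10000, 1000000):
        h, c, f = traffic(n)
        print(f"N={n:>8}: hierarchy={h:>9}, central={c:>8}, flood={f:>14}")

The flood column is the one that melts first.)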
>
> Now, in a true distributed system (non-buzzword), everything is dynamic
> and general purpose. The ideas of "here" and "there" no longer even
> apply, as the ideas of "me" and "not-relevant" emerge. You basically have
> network goo, with no rules, no hierarchy, no specialization.
>
> The system has to be intelligent enough to form internal hierarchies,
> specializations, and other global optimizations on its own, and on the
> fly, and do it all well enough that the thing doesn't melt. Since
> people can't even teach stoplights to coordinate traffic intelligently
> (take the bus!), distributed systems are still considered somewhat
> tricky to do right.
>
> Almost everything in Cosm is in the first category, but I'll eventually
> find a way to fix the stuff still in the second too, if it's possible
> before I have to give up and get a day job.
>
> As a related mini-bit, even Oracle only kinda-sorta-cross-your-fingers
> has distributed databases working after trying for decades. It's
> non-trivial, if even possible at all. And since a distributed search
> engine is really just a distributed database...
>
> - Adam L. Beberg
> Mithral Communications & Design, Inc.
> The Cosm Project - http://cosm.mithral.com/
> beberg@mithral.com - http://www.iit.edu/~beberg/


