[FoRK] Re: [p2p-hackers] Simple lightweight DHT (fwd from bryan.turner@pobox.com)

Eugen Leitl eugen at leitl.org
Fri Jan 14 00:05:57 PST 2005

----- Forwarded message from Bryan Turner <bryan.turner at pobox.com> -----

From: "Bryan Turner" <bryan.turner at pobox.com>
Date: Thu, 13 Jan 2005 21:31:37 -0500
To: "Peer-to-peer development." <p2p-hackers at zgp.org>
Subject: Re: [p2p-hackers] Simple lightweight DHT
X-Mailer: Microsoft Outlook Express 6.00.2800.1437
Reply-To: "Peer-to-peer development." <p2p-hackers at zgp.org>


    I've implemented four DHTs to date (Chord, Kademlia,
Continuous-Discrete, and a proprietary one).  I'm not sure there's much of a
'story' to it, but here are some excerpts of my trials..

Chord [1]

    Conceptually, Chord is very simple and that is one of the factors
motivating our decision to build it.  We had a basic Chord running between
three nodes in about a month.  This included a 256-bit keyspace (SHA-256),
recursive lookup, standard finger & successor tables, and a very simplistic
stabilize protocol.

    After scaling to a grid of 72 machines, we had some difficulty
maintaining the stability of the ring.  Our node loss event was updating
only the predecessor and successor (to save broadcasting to all fingers,
etc).  This, and the basic stabilization protocol (trade successor lists
every 10 sec with predecessor) did not keep the ring stable.  We adjusted it
to broadcast the node loss and fixed the problem - I still hold that this
should not be necessary for stability.

    We also had routing problems during churn.  It seems that some nodes
would 'skip over' the destination node, sending the lookup further around
the ring than necessary.  This would cause lookups to run around the ring
multiple times before landing at the right spot.  I think we finally tracked
this down to new nodes joining, and somewhere in the process of updating the
finger tables there was a period where routes were allowed to jump too far
in the ring.  We eventually fixed this too.

Kademlia [2]

    By this time I was recognizing some of the same problems that the
Kademlia group had noted with Chord; non bidirectional links,
uni-directional routing, static stabilization rate, and a host of problems
if you threw a knife switch between two halves of the network.

    The switchover to Kademlia from Chord was almost trivial, about two
weeks of work.  All of the structures are essentially the same, requiring
only an extension of the finger table to hold multiple entries and
timestamps.  Lookup, routing, etc.. was already written to be flexible so
our metric was the only thing that changed.  We did not implement the alpha
parameter, nor any of the system-level features like 24-hour lifespans, etc.
These did not fit the project.

    Kademlia introduced no additional problems, although it was much more
difficult to explain how data got grouped together & replicated when
tech-savvy customers started probing.  Kademlia worked fine for quite some
time, and it was the protocol for which our UDP stack was built (see a
previous posts for that design).

Voronoi Diagrams [3,4,5]

    At some point, I realized Kademlia still had not solved the knife-switch
problem.  This is very important to our project, as the installations are
known to be geographically diverse and the WAN connections to remote sites
are flaky at best.  Also, there are large differences in node's
computational and memory resources which were not being considered.

    After reading all the papers under the Continuous-Discrete chain
[3,4,5], I must say it blew my mind.  Unfortunately we were not in a
position to use or implement all of their ideas.  Also, these papers are
very math heavy and took awhile to digest.

    The switch from Kademlia to 1D Continuous-Discrete is essentially the
same as from Chord to Kademlia.  The protocols are almost exactly the same,
but Continuous-Discrete is significantly more flexible.  For instance, there
is no restriction to 2^k link table, the algorithm works just fine with 2,
or any number, or a different number per node, so long as you always
maintain the neighbor links.  Fault-tolerance is a trivial extension to the
protocol where nodes 'overlap' each other in the ID space, and this can be
an arbitrary overlap unlike Kademlia/Chord.  Kademlia's alpha parameter is
also mapped to an arbitrarily wide path through the ID space.  It is also
stable without broadcasting node loss events using a simple suspicion event
(lookups are not routed to suspected nodes).

    Basically take any fixed, rigid features of Chord/Kademlia and make them
flexible, you get Continuous-Discrete.

    I've been happy with this design for about a year, generalizing some
parts of the system and finally fixing the knife-switch problem using a
proprietary protocol built into the now-upgraded 2D Voronoi DHT system.

    Since our application only uses the DHT as a session set-up, we are not
too worried about the lookup times and stabilization times, as long as each
lookup does eventually succeed (or properly notify with a failure).  To
date, our system has been tested with 50 nodes on a dedicated grid at full
speed, saturating 100 Mbps links for over 36 hours.  This was a fairly small
test, with only 50 keys per node, to test collision and caching.  Total
development time has been about 2 years, although less than 6 months of that
is in tuning/hacking the DHT protocols.

bryan.turner at pobox.com

[1] http://www.pdos.lcs.mit.edu/papers/chord:sigcomm01/chord_sigcomm.pdf
[2] http://citeseer.ist.psu.edu/529075.html
[3] http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf
[4] http://iptps03.cs.berkeley.edu/final-papers/simple_fault_tolerant.pdf
[5] http://citeseer.ist.psu.edu/562059.html

----- Original Message ----- 
From: "Michael Parker" <mgp at ucla.edu>
To: "Peer-to-peer development." <p2p-hackers at zgp.org>
Sent: Tuesday, January 11, 2005 12:32 PM
Subject: Re: [p2p-hackers] Simple lightweight DHT

> I was about to ask a similar question... You're a seasoned peer-to-peer
> developer -- for those of us who are developers just starting out, and
> perhaps trying to invent our own new, novel topologies and systems, what
> are the hardest things to 'get right' in a peer-to-peer system?
> I've read the paper "Designing a DHT for Low Latency and High
> Throughput" [1] by the Chord group at MIT. It seems to sum up pretty
> well what their challenges were, and what practices they found were best.
> I know there are some other people out on this mailing list who could
> answer this too (Clarke, Freedman... if you feel that your name should
> be on this list, just respond). Sorry to try and drag you into the
> spotlight, but you definitely have a captive audience ;)
> - Michael Parker
> [1] http://citeseer.ist.psu.edu/dabek04designing.html
> Zooko O'Whielacronx wrote:
> > On 2005, Jan 10, at 20:00, Sean C. Rhea wrote:
> >
> >> I wrote the Bamboo router and got it working in about a week,
> >> although I had the experience and code base from writing Tapestry
> >> (another DHT) before that, so I had a bit of a head start.
> >>
> >> It took me another year to get Bamboo to perform as well as it does
> >> today.  I believe the Chord people had a similar experience.
> >
> >
> > What a fascinating story!  What things did you have to learn and
> > invent during the course of that year to improve the performance of
> > Bamboo?
> >
> > Regards,
> >
> > Zooko
> >
> > _______________________________________________
> > p2p-hackers mailing list
> > p2p-hackers at zgp.org
> > http://zgp.org/mailman/listinfo/p2p-hackers
> > _______________________________________________
> > Here is a web page listing P2P Conferences:
> > http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences
> >
> _______________________________________________
> p2p-hackers mailing list
> p2p-hackers at zgp.org
> http://zgp.org/mailman/listinfo/p2p-hackers
> _______________________________________________
> Here is a web page listing P2P Conferences:
> http://www.neurogrid.net/twiki/bin/view/Main/PeerToPeerConferences

p2p-hackers mailing list
p2p-hackers at zgp.org
Here is a web page listing P2P Conferences:

----- End forwarded message -----
Eugen* Leitl <a href="http://leitl.org">leitl</a>
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net

More information about the FoRK mailing list