From: Kragen Sitaker
Date: Thu Apr 26 2001 - 00:25:33 PDT

Ben and I were talking about network protocols on the way home, and we
came up with an idea that seemed so obvious and useful that someone has
surely thought of it long ago, but I don't recall hearing of it.

(Normally I'd post this to kragen-tol, but the machine is down.)

TCP is nice, except that it has the unfortunate property that a single
lost packet can stall the entire stream behind it until that packet is
retransmitted. This is particularly bad for IP-over-TCP tunneling.

So suppose you're transmitting a big stream of data from point A to
point B. You'd like that data to be received fairly promptly the vast
majority of the time, but 5% packet loss means that your stream stalls
for an extra RTT about every 20 full-sized packets --- typically that's
every 29200 bytes or so (20 segments of 1460 bytes each).
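
As a rough check on that figure, here is a small sketch of the
arithmetic. It assumes every packet is lost independently with the same
probability and carries 1460 bytes of payload (a 1500-byte Ethernet MTU
minus 40 bytes of IP and TCP headers); the function name is made up for
illustration.

```python
def bytes_between_stalls(loss_rate, payload_bytes=1460):
    """Average payload delivered between lost packets, assuming independent loss."""
    packets_between_losses = 1 / loss_rate
    return packets_between_losses * payload_bytes

print(round(bytes_between_stalls(0.05)))  # 29200
```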

If you're transmitting streaming video or audio, it's OK if some
packets get lost. Your users will be irritated, but they'll still use
it; the Bellheads will still laugh at you and point out that they can
offer better QoS, but you're still providing voice service at roughly
1/20 what they charge for it, so you don't care. So you just use UDP
and forget the damn dropped packets.

Suppose you take a different tack. Instead of transmitting the data
over UDP, you transmit two copies of it over two parallel TCP
connections. You're using twice the bandwidth you were before, but the
other end now only has to deal with stalls on 0.25% of the packets
(0.05 * 0.05) --- assuming independent 5% packet loss on both
connections. (An erroneous assumption, but I'll get to that later if I
have time.) The recipient just uses whichever data they receive first.

So now your users only see stalls every half-megabyte or so; the rest
of the time, they're getting smooth isochronous streaming chocolaty
goodness. But they, and you, are paying double for your bandwidth.
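
On the receiving side, "whichever data they receive first" could look
something like the sketch below. It assumes both connections carry the
identical byte stream starting from offset zero, so the receiver only
has to track how many bytes it has already handed to the application.
The class and method names are hypothetical, and real code would read
from sockets rather than be fed chunks by hand.

```python
class FirstArrivalMerger:
    """Merge identical byte streams, delivering each byte once,
    taken from whichever connection supplies it first."""

    def __init__(self, n_streams=2):
        self.received = [0] * n_streams  # bytes seen so far on each connection
        self.delivered = 0               # bytes already passed to the application

    def feed(self, stream_id, chunk):
        """Record a chunk read from one connection; return only the new bytes."""
        start = self.received[stream_id]
        self.received[stream_id] += len(chunk)
        end = self.received[stream_id]
        if end <= self.delivered:
            return b""  # another connection already supplied all of this
        fresh = chunk[max(self.delivered - start, 0):]
        self.delivered = end
        return fresh


merger = FirstArrivalMerger()
out = merger.feed(0, b"hello ")     # connection 0 is ahead: all of it is new
out += merger.feed(1, b"hel")       # duplicate, dropped
out += merger.feed(1, b"lo world")  # connection 1 overtakes: only b"world" is new
out += merger.feed(0, b"world")     # duplicate, dropped
print(out)                          # b'hello world'
```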

Suppose instead that you split your data into blocks, encode each block
with an error-correcting code, and stripe the encoded chunks across N
TCP connections, such that the data from any K < N connections is
enough to reconstruct the original data. Whenever your recipient
receives enough chunks of a
block that they can decode the original block, they do so. It would
seem that you could decrease the probability of delay to an arbitrarily
low level by increasing the number of ECC bits and correspondingly the
number of connections.
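
The simplest instance of such a code is a single XOR parity chunk,
which gives N = K + 1: any K of the N chunks recover the block. The
sketch below covers only that special case, with made-up function
names; tolerating more than one stalled connection would need a
stronger erasure code such as Reed-Solomon, which is not shown here.

```python
from functools import reduce

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block, k):
    """Split block into k equal-sized data chunks and append one XOR parity chunk."""
    size = -(-len(block) // k)             # ceiling division
    padded = block.ljust(size * k, b"\0")  # zero-pad so all chunks match in length
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]

def decode(chunks, length):
    """Rebuild the block from k+1 chunk slots, at most one of which is None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one missing chunk")
    if missing:
        present = [c for c in chunks if c is not None]
        chunks = list(chunks)
        # XOR of all the other chunks reproduces the missing one
        chunks[missing[0]] = reduce(xor_bytes, present)
    return b"".join(chunks[:-1])[:length]  # drop parity and padding
```

Each of the K + 1 chunks would go out on its own connection, and the
receiver would call decode() as soon as any K of them have arrived,
passing None for the one still outstanding.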

There are several potential problems here:
- increasing the number of connections increases the probability that
  at least one, or two, or three connections will simultaneously suffer
  a delay and retransmission. So you need to add more connections to
  compensate. I think you can make this work out nicely, but I'm not
  yet sure.
- you'll be transmitting N packets at once on N connections; this leads
  to a problem in the middle of the transmission that's similar to the
  one slow-start was introduced to fix at the beginning: you may be
  able to send N packets at once on your Fast Ethernet connection, but
  that doesn't mean the cable modem at the other end can handle them.
  So after some number M of connections, you will end up retransmitting
  every segment on connections M+1, M+2, etc. (I've seen this happen
  in the past.)
- if you're transmitting N times the MTU at once, you don't lose
  anything extra to packet header overhead --- you send N full-sized
  segments over N different TCP connections instead of sending them
  sequentially over the same TCP connection. But if you send data in
  smaller pieces, each connection ends up sending smaller segments,
  which means you spend more of your bandwidth on worthless packet

This scheme could be implemented as a more or less transparent library,
rather like SSL; applications wouldn't have to change. They'd just
work better.

You get all the benefits of TCP --- congestion control, reliable
delivery, etc. --- you just don't have to pay with unpredictable
delays.

In environments where you have some control over routing, you can
attempt to get your various TCP connections routed differently ---
e.g. by sending them out different interfaces, routing their first hop
through different routers if you're lucky enough to have access to
multiple routers, or setting different IP flags on the packets in hopes
that routers further down the line will route the packets differently.
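
For the last of those tricks, here is a minimal sketch of marking each
connection with a different IP type-of-service byte. It assumes a
platform whose socket module exposes IP_TOS (Linux and most Unixes do);
the helper name is hypothetical, and whether any router along the path
actually treats the marked packets differently is entirely up to the
network.

```python
import socket

# Classic TOS values; the low two bits are left clear because they are
# used for ECN and some kernels will not let you set them on a TCP socket.
TOS_LOW_DELAY = 0x10
TOS_THROUGHPUT = 0x08
TOS_RELIABILITY = 0x04

def make_marked_socket(tos):
    """Create an unconnected TCP socket whose outgoing packets carry this TOS byte."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return s

socks = [make_marked_socket(t)
         for t in (TOS_LOW_DELAY, TOS_THROUGHPUT, TOS_RELIABILITY)]
# Each socket would then be connected to the same destination, e.g.
#   for s in socks: s.connect((host, port))
# where host and port are whatever the striping library is talking to.
for s in socks:
    s.close()
```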

So who's thought of this before?

<>       Kragen Sitaker     <>
Perilous to all of us are the devices of an art deeper than we possess
       -- Gandalf the White [J.R.R. Tolkien, "The Two Towers", Bk 3, Ch. XI]
