[FoRK] low probability bits

Rob Harley robert.harley at gmail.com
Tue Feb 24 15:20:23 PST 2009

>You guys are all smart. You figure it out.

Instead of generating a uniform r in [0,1) and comparing with a threshold p,
start generating bits of r from most significant and stop as soon as
you can decide.
This needs at least 1 random bit per output bit (and close to 2 on average),
but it should be possible to get close to the entropy e.g., 0.92
random bits per output bit with p = 1/3.
This is a similar problem to Huffman encoding a stream with rare letters.
A standard solution is to group the letters into tuples giving a
bigger alphabet.
Then do the same trick of locating a random uniform but this time in a
partition of the unit interval.
This will effectively give an infinite Knuth-Yao tree.
If p is rational, some blank letters can be added with the right
probabilities to make the partition points dyadic,
and then the tree can be truncated to finite giving a Huffman code.

Example: (p=1/3) write a Huffman code for the 32 5-bit words with
probabilities from 32/243 down to 1/243.
Generate up to 8 random bits until you recognize the code word (unique
prefix oblige) or a blank (skip it).

000      => 00000
0010     => 00001
0011     => 00010
0100     => 00100
0101     => 01000
1111000  => 11110
11110010 => 11111
11110011 => blank
111101   => blank
11111    => blank

Total overkill unless you really get off on saving random bits.

- R

More information about the FoRK mailing list