# [FoRK] low probability bits

Stephen D. Williams sdw at lig.net
Tue Feb 24 17:58:48 PST 2009

```Interesting.

How about the following, a refinement of my earlier algorithm:

Parameters:
N desired 1 bits, M desired 0 bits

Zero an array of size N+M bits.
Out of stream of random bits, for each N bits, set bit Array[log2(N+M))
% (N+M))] to 1.

If M < N, reverse sense.
If N is close to M, start with random array and randomly flip the
appropriate number of 1 or 0 bits.  For 49/51, flip one 1 bit to 0 in an
array of 100 bits.  Randomly read Array[x] until a 1 bit is found, then
set it to 0.

A noise factor is needed, definitely for low values of N+M (1+1 for
instance), but probably to some extent in most situations.  Setting a
floor for N+M by scaling partially meets this requirement.  Flipping a
random number of bits according to a distribution might be desired for
some applications.

Assuming no complex need for noise, this results in needing approximately:
N*log2(N+M) bits for every N+M output bits.  For 1% 1 bits, this
requires 7 bits for every 100 output bits.

I was just thinking that compressed data that might be at hand, with a
little cheap decimation, would be a good source for roughly random data,
boosted with a little real random data.  Looks like adding compression
to pseudorandom streams has been looked at:

An interesting problem would be to take pseudorandom data with some
wrong distribution and making it fit another distribution.  Given a
stream of bits, keep running totals and, using more imperfect bits,
arrive at a stream that has a desired distribution along with desired
noise.  All with a minimum of storage, CPU, and spent bits.  More tough,
do so with local detection of "bad bits" that aren't even pseudorandom
based on some unlikeliness measure.

sdw

Rob Harley wrote:
>> You guys are all smart. You figure it out.
>>
>
> Instead of generating a uniform r in [0,1) and comparing with a threshold p,
> start generating bits of r from most significant and stop as soon as
> you can decide.
> This needs at least 1 random bit per output bit (and close to 2 on average),
> but it should be possible to get close to the entropy e.g., 0.92
> random bits per output bit with p = 1/3.
> This is a similar problem to Huffman encoding a stream with rare letters.
> A standard solution is to group the letters into tuples giving a
> bigger alphabet.
> Then do the same trick of locating a random uniform but this time in a
> partition of the unit interval.
> This will effectively give an infinite Knuth-Yao tree.
> If p is rational, some blank letters can be added with the right
> probabilities to make the partition points dyadic,
> and then the tree can be truncated to finite giving a Huffman code.
>
> Example: (p=1/3) write a Huffman code for the 32 5-bit words with
> probabilities from 32/243 down to 1/243.
> Generate up to 8 random bits until you recognize the code word (unique
> prefix oblige) or a blank (skip it).
>
> 000      => 00000
> 0010     => 00001
> 0011     => 00010
> 0100     => 00100
> 0101     => 01000
> [...]
> 1111000  => 11110
> 11110010 => 11111
> 11110011 => blank
> 111101   => blank
> 11111    => blank
>
> Total overkill unless you really get off on saving random bits.
>
> - R
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork
>
```