Editorial on Ellison, McNealy, and national ID card

Robert S. Thau rst@ai.mit.edu
Thu, 25 Oct 2001 17:08:59 -0400 (EDT)

Clay Shirky writes:
 > I now think the answer is 2**34 < GUID < 2**40, which is to say,
 > enough to hold 10 billion GUIDs comfortably. (2**34 is 17 billion, a
 > bit of a squeak, while 2**40 is a trillion and change.) This would be
 > 5 bytes, in other words.

If hashing is involved, choosing too small a GUID nearly guarantees
problems with collisions (two sets of DNA indicia hashing to the
same code by sheer dumb luck, which is likely bad luck for at least
one of the individuals concerned).

If we assume ten billion individuals and 2^40 possible GUIDs, about one
code in a hundred must be in use.  I'll leave it to someone else to say how
good a hash function must be to avoid collisions in a space this
packed, but it certainly doesn't give me the warm fuzzies.  2^80 (ten
bytes) would be a lot more comfortable, and I can't think of any
applications in which the extra expense would be prohibitive.
(Including big-brother databases, BTW; name, date and place of birth
alone take up substantially more than the extra five bytes.)
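
To put rough numbers on the above, here is a minimal Python sketch.  It
assumes an ideal hash that spreads individuals uniformly and independently
over the GUID space (my assumption, not something established in this
thread; a real hash can only do worse), and it uses the standard birthday
approximation.  The function names are made up for illustration.

```python
import math

def occupancy(n_people: int, bits: int) -> float:
    """Fraction of the GUID space in use if every person gets a distinct code."""
    return n_people / 2**bits

def expected_colliding_pairs(n_people: int, bits: int) -> float:
    """Expected number of pairs of people sharing a code under a uniform hash."""
    pairs = n_people * (n_people - 1) // 2   # number of distinct pairs
    return pairs / 2**bits                   # each pair collides with prob 2^-bits

def prob_any_collision(n_people: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) ~ 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_people, bits))

PEOPLE = 10**10  # ten billion individuals, as assumed in the post

for bits in (40, 80):
    print(f"{bits}-bit GUID: "
          f"occupancy {occupancy(PEOPLE, bits):.3g}, "
          f"expected colliding pairs {expected_colliding_pairs(PEOPLE, bits):.3g}, "
          f"P(any collision) {prob_any_collision(PEOPLE, bits):.3g}")
```

Under those assumptions, 40 bits gives an occupancy of about 0.009 (one
code in 110) and tens of millions of expected colliding pairs, so a
collision somewhere is a practical certainty.  At 80 bits the chance of
even one collision among ten billion people comes out at a few in a
hundred thousand.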