[FoRK] PersonalWeb patents "unique" hashes of content

J. Andrew Rogers andrew at jarbox.org
Wed Sep 19 18:42:36 PDT 2012

On Sep 19, 2012, at 4:42 PM, Stephen D. Williams <sdw at lig.net> wrote:
>> A content-addressable store uses some subset of the content to generate a key that can be used to address its storage location. The only reason hashing is ever done at all, particularly given the disadvantages, is to flatten key distributions.
> If you use a subset of content to generate a key, it is a just a database.  If you use all of the content to generate a highly unique key, it is a content-addressable storage database.  This is part of the definition of "content-addressable storage". [1]  Note that the semantics attached to that term are different from the term that would be the equivalent for RAM, where Content-addressible Memory [2] indicates a search-like "contains" relationship.

I think you are conflating "definition" with "typical implementation".  

A 128-bit key has the same information with respect to the underlying content regardless of the algorithm used to generate it if it was constructed using at least 128-bits of content. The transform used to construct that 128-bit key is not material except to the extent the choice of transform impacts efficiency of the implementation. The only requirement of the transform is that it produces the same key every time it is given some content as input.

Strong hash functions are one type of transform capable of generating effective, collision-resistant keys from content with the required bits of information. They are not the only such transform. 

> The point of a Hilbert curve based index is to tend to put points that are close together in N-dimensional space, close together in 1 dimensional Hilbert space.

Space-filling curves were largely never used for indexing for a reason. Hilbert-based indexes were among the best but still not great.

But since we are talking about content-addressable storage rather than indexing, Hilbert-like models are poor relative to some other curve-based schemes you could choose to use. The computation is only vaguely related to the kinds of computation you would use for an index; what makes Hilbert good for indexing has no bearing for content-addressing.

More information about the FoRK mailing list