[FoRK] PersonalWeb patents "unique" hashes of content
Stephen D. Williams
sdw at lig.net
Thu Sep 20 00:52:32 PDT 2012
On 9/19/12 11:12 PM, J. Andrew Rogers wrote:
> On Sep 19, 2012, at 10:38 PM, "Stephen D. Williams" <sdw at lig.net> wrote:
>>> Space-filling curves were largely never used for indexing for a reason. Hilbert-based indexes were among the best but still not great.
>> I understood that their most successful use was probabilistic clustering of record location for databases. Apparently someone implemented a color space mapper in hardware using the Hilbert somehow.
> Sure, but that was like 20 years ago. The computer science would not support such an argument today. Hilbert curves haven't made sense in years. Even in their prime for indexing, the benefit was on the order of 10-15%.
I wasn't arguing for them per se. They blow up in size far too quickly for any interesting number of dimensions. They're more of a solution that points you toward the real solutions. I like the interleaved nature of Z curves better.
> My point, long lost, was that I can content address arbitrary polygon data in a 12-dimensional space based on a hyper-rectangle intersection -- not simple equality -- relationship for about the computational cost of a fancy crypto hash while using a thousand node cluster.
> We can call it not true "content-addressing" but if that is not content-addressing then I don't want content-addressing. I want whatever the guys doing non-hash fancy content-addressing are doing. What do you suggest we call it, ignoring that it is literally content addressing?
Take N dimensions, normalize them so each has a roughly even membership density across its range, then interleave them along a Z-order curve to the desired key length, up to the full values. Since a 2D Z curve is equivalent to a quadtree, an N-dimensional one is equivalent to a 2^N-ary tree. If you index the keys in order, you can find all of the keys in the same subspace by matching a common prefix to the desired level. Each "level" is one power-of-two step of resolution in each dimension at its scaled value. So a single key range gives you a rough set, or you can build a set of ranges for a precise definition of the subspace boundaries.
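A minimal sketch of the interleaving step in Python (the name z_interleave and the fixed bit width are my own illustration; it assumes the dimensions have already been normalized to integers in [0, 2**bits)):

```python
def z_interleave(coords, bits=8):
    """Interleave the bits of N integer coordinates into one Z-order key.

    coords: sequence of N ints, each already scaled into [0, 2**bits).
    Returns an int key of N * bits bits, most significant level first.
    """
    key = 0
    for level in range(bits - 1, -1, -1):  # walk from the top bit down
        for c in coords:
            key = (key << 1) | ((c >> level) & 1)
    return key

# Two nearby 2D points land in the same quadrant, so their keys
# share a prefix down to that level:
a = z_interleave([3, 5], bits=3)   # -> 0b011011 == 27
b = z_interleave([2, 4], bits=3)   # -> 0b011000 == 24
# Dropping the last level (2 interleaved bits) exposes the common prefix:
assert (a >> 2) == (b >> 2)
```

The nested loop makes the quadtree correspondence explicit: each pass of the outer loop emits one "level" of the tree, one bit per dimension.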
It is a generalization of the quadtree idea mapped into a single-dimension key space, where nearby keys are in nearby subspaces with only predictable spatial / bitwise relationships. Nice when you always want to search on all of the dimensions in the key, and easily mappable to a relational database. What happens when you want to wildcard one or more of the dimensions? It seems like you would have to scan through all of the possible ranges for that dimension.
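A sketch of the range-query side, again in Python (prefix_range is a hypothetical helper; the keys below are precomputed Z-order keys of 2D points at 3 bits per dimension): because all keys in one subspace share a prefix, the subspace is a single contiguous key range, so a sorted index answers it with two binary searches. Wildcarding a dimension breaks this: you would have to union one such range per possible value of the wildcarded bits at that level, as the comment notes.

```python
import bisect

def prefix_range(key, level, ndims, bits):
    """Inclusive key range of the subspace sharing `key`'s first
    `level` interleaved rounds (level * ndims bits)."""
    shift = (bits - level) * ndims
    lo = (key >> shift) << shift
    return lo, lo + (1 << shift) - 1

# Sorted Z-order keys, e.g. produced by interleaving 2D points:
keys = sorted([24, 27, 5, 63, 26, 12])

# All keys in the same level-2 subspace as key 27:
lo, hi = prefix_range(27, level=2, ndims=2, bits=3)   # -> (24, 27)
i = bisect.bisect_left(keys, lo)
j = bisect.bisect_right(keys, hi)
hits = keys[i:j]                                      # -> [24, 26, 27]

# Wildcarding one dimension at this level would instead require
# querying 2**level such ranges, one per value of the wildcarded bits.
```

The two bisect calls are the whole query cost on a sorted index, which is what makes the scheme map so easily onto a relational database's ordinary B-tree index.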
I would just call it a compound key table method where the N-part keys are normalized and interleaved to get a particular ordering
effect. Unless the location is all that is in the actual tuple/record/object, none of this is affected by the content.