[FoRK] PersonalWeb patents "unique" hashes of content

Stephen D. Williams sdw at lig.net
Tue Sep 18 23:16:13 PDT 2012

On 9/18/12 7:39 PM, J. Andrew Rogers wrote:
> On Sep 18, 2012, at 6:41 PM, Stephen Williams <sdw at lig.net> wrote:
>> Content addressable storage == hashtable.
> Not true, hashing is just one way of doing it for simple cases.

I haven't seen anything referred to as "content addressable storage" that didn't mean "run some hash over the whole file and use 
that as the unique ID for that data".  You could add the length of the file to increase resilience to collisions.  And there are 
sampling methods that should be good as likely, though not certain, lightweight alternatives.  (Have to run an empirical test on 
that soon.)  You could also run a filter or transformation to find equivalent files with different bytes, but that's just the same 
thing at a different granularity.
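A minimal sketch of that scheme, for the record: hash the whole payload, append the length as a cheap extra collision check, and use the result as the storage key. (SHA-256 and all names here are my own illustrative choices, not anything from a particular CAS product.)

```python
import hashlib

def content_address(data: bytes) -> str:
    """Whole-payload hash plus length: the length term adds a little
    resilience against the (already tiny) chance of a hash collision."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{digest}-{len(data)}"

class CAStore:
    """Toy in-memory content-addressable store keyed by content_address."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_address(data)
        self._blobs[key] = data   # identical content dedupes for free
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

store = CAStore()
k1 = store.put(b"hello world")
k2 = store.put(b"hello world")
assert k1 == k2                        # same bytes -> same address
assert store.get(k1) == b"hello world"
```

Note that deduplication falls out automatically: storing the same bytes twice yields the same key, which is exactly the "hashtable" view of CAS being argued over above.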

What are you thinking of as an alternative to whole-file hash?

> I have discovered that many patents like this are filed by people with real computer science educations who are completely ignorant of the computer science literature. It is like the recent spate of patent filings around so-called "multidimensional hashing" that were "invented" by people straight out of college. As I pointed out to one such inventor, he did not find it on Google because it isn't called "hashing" and it has been around so long that they stopped publishing papers related to it since before he was born. Of course, the patent office will likely still issue it.
> It is not malicious, it is the Dunning-Kruger effect.
> As a remarkable example from my own experience, most claimed "state-of-the-art" spatial indexing systems designed by companies with good reputations are not only not state-of-the-art currently, they would not have been state-of-the-art 20 years ago. But they truly believe that they are. Most software designers never read the literature, and only a tiny minority read literature that was written well before the Internet age.
> Computer science progress is so slow because most computer scientists are busy reinventing things we knew how to do in the 1970s.


More information about the FoRK mailing list