Re: Linkology proposal for SPKI/SDSI

Carl Ellison (cme@cybercash.com)
Mon, 21 Apr 1997 15:55:03 -0400


-----BEGIN PGP SIGNED MESSAGE-----

At 12:57 PM 4/20/97 EDT, Ron Rivest wrote:
>
>This note gives some thoughts on links, with a specific proposal for a
>better way of handling them in SPKI/SDSI. I was motivated in part by
>my reading of the XML document
> http://www.w3.org/pub/WWW/TR/WD-xml-link-970406.html
>on linking, although what is given here is much simpler and directed
>specifically at SPKI/SDSI. Comments and discussion invited.
>
>At the moment we have s-expressions containing subexpressions like
> (hash md5 {...})
> (hash md5 {...} <url>)
> <url>
> (ref <key> name)
>this is ad-hoc, and also fails to make a clear distinction between hash-values
>and links, between the pointers and the thing pointed to.

[etc.]

Ron,

I spent lunch today going over your full earlier message and have a mixed
reaction. I can understand that there is a real problem which both you and
W3C are trying to solve. On the other hand, I believe our corner of the
world is significantly simpler and that our data structures can and should
reflect that relative simplicity.

As I said in my earlier reply, every time we use a hash, it is a link.
Let me formalize that remark here.

The links you are concerned about tell you how to get to some data. That
data might be a file on the web, or it might be some portion of a file. [I
remember proposals in W3C DSig which were concerned with identifying
portions of the text of one file.] Your link construct also addresses the
quoting problem. E.g., when you speak about "http://www.clark.net/" are
you referring to the page at that address or to the 21-byte character string
itself? A link might refer to a cluster of files (e.g., an HTML page and
all the inline images). It might want to refer to a file and files that
file points to, down some depth. This is the issue we decided not to
address when I brought it up in Memphis.

These are important issues to resolve. Finding a good way to refer
unambiguously to such things would doubtless be a Good Thing. Of course, I
am a little suspicious of academic work which isn't in response to some
crying need by the user community, and I haven't heard cries of pain in this
area, but I'm a big fan of pure research and if the specification and
quoting problems can be solved in a simple mechanism, then I think the world
will have gained.

For the purpose of digitally signed things, like certificates, there is an
additional issue. Our links to objects need to be cryptographically secure.
Therefore, there must be a secure hash of the intended object within the
body being signed.

The hash has an interesting property. It's necessary for our use of links,
but it's also sufficient to some extent. That is, there is a procedure for
finding which contiguous range of bytes inside which existing object is
referred to by the hash. The problem with that procedure is that it is a
little inefficient to scan the entire web and compute hashes of every
range of bytes. :)
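
[Illustration: the exhaustive procedure described above can be sketched in
a few lines of Python. This is only a sketch under stated assumptions:
SHA-256 stands in for whatever hash is actually used, and the names
digest, find_range and objects are hypothetical. The nested loops make the
cost obvious even for a handful of small objects, let alone the web.]

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex SHA-256 digest (the choice of hash is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def find_range(objects, target):
    """Try every contiguous byte range of every object until one hashes to target.

    Returns (name, start, end) for the first match, or None if nothing matches.
    The number of ranges per object grows quadratically with its length.
    """
    for name, data in objects.items():
        for start in range(len(data)):
            for end in range(start + 1, len(data) + 1):
                if digest(data[start:end]) == target:
                    return name, start, end
    return None

# Hypothetical stand-ins for "existing objects".
objects = {"a": b"hello world", "b": b"other text"}
match = find_range(objects, digest(b"lo wo"))
```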

The question, to me, is what needs to go inside the signed body of a
certificate. I believe the answer is clear. By policy, we expect a requester
of access to deliver to the server all the information that server would
need to evaluate the request. If a requester has delivered all necessary
objects to the server, then the set of objects to hash and locate by hash
is much smaller -- small enough that indexing by hash is probably
the most efficient lookup. This makes the hash of an object the preferred
link, not just for security where it must be used, but also for
performance. Therefore, I expect a requester to send along objects,
some of which are certificates, some keys and some other things.
All would be hashed and hung off a hash table. If an object is big enough,
the requester might send along instructions for accessing the object instead
(letting the server operate with the same network traffic, without involving
the requester). However, none of those things -- objects or little programs
to drive the server in accessing objects [which is what I believe these link
proposals will end up being] -- needs to be inside the certificate. The
certificate must contain hashes and doesn't need to contain anything else.
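
[Illustration: a minimal Python sketch of the hash-table idea above. It
assumes SHA-256 as the hash and uses hypothetical names (digest,
index_objects, resolve); the byte strings are placeholders, not real
certificates or keys. The server hashes each object the requester
delivered, hangs it off a table keyed by that hash, and resolves a hash
found in a certificate with a single lookup.]

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex SHA-256 digest (the choice of hash is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def index_objects(objects):
    """Map each delivered object's hash to the object's bytes."""
    return {digest(obj): obj for obj in objects}

def resolve(table, link):
    """Return the object a hash link refers to, or None if it was not delivered."""
    return table.get(link)

# Placeholder objects sent along with a request (hypothetical contents).
delivered = [b"(cert ...)", b"(public-key ...)", b"some other object"]
table = index_objects(delivered)

# A hash appearing inside a certificate acts as the link.
link = digest(b"(public-key ...)")
found = resolve(table, link)
missing = resolve(table, digest(b"never delivered"))
```

[If resolve returns None, the object was not among those delivered; what
the server does next is outside the scope of this sketch.]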

I am especially opposed to the idea of a #include for certificates.
If by (grab) you meant that a byte string being handed to a hash function
should be interrupted at that link while the linked object's bytes are
funneled to the hash, then I oppose the idea of (grab). It is a possibly
less than fully rational prejudice of mine that the only thing which should
be handed to a hash function for signature verification is a contiguous
set of bytes which arrived from the signer. I do not believe in "some
assembly required", re-canonicalization or "batteries not included".
The signer had all the bytes together to hash and then sign, and should
send those same bytes to anyone who wants to verify the signature. This
is why I keep insisting that the real transport mechanism be the
canonical form. [PEM certificate verification required re-canonicalization
and I got bit by it, since PEM used X.509/ASN.1/DER and expected DER to be
true to its promise that there would be only one encoding of a given
thing -- but that wasn't true. They had overlooked one little thing
and that was enough to keep PEM from accepting some valid certificates.]
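
[Illustration: a small Python sketch of why hashing the bytes exactly as
they arrived is the safe path. Everything here is an assumption made for
the example: SHA-256 as the hash, the sample s-expression, and a
deliberately naive "re-encoder" that only changes whitespace. No actual
signature is computed; the comparison of hashes stands in for that step.]

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex SHA-256 digest (the choice of hash is an assumption)."""
    return hashlib.sha256(data).hexdigest()

# Bytes the signer had in hand when hashing and signing (hypothetical content).
signed_bytes = b"(cert (issuer alice) (subject bob))"
signed_hash = digest(signed_bytes)

# Verifier A: hashes the contiguous bytes exactly as received.
received = signed_bytes
ok_direct = digest(received) == signed_hash

# Verifier B: re-encodes before hashing. This hypothetical re-encoder puts
# each sub-expression on its own line, so the meaning is the same but the
# bytes, and therefore the hash, are not.
reencoded = received.replace(b" (", b"\n  (")
ok_reencoded = digest(reencoded) == signed_hash
```

[Verifier A matches and verifier B does not, even though both hold the
"same" certificate in the logical sense.]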

I see the specification of links (eventually satisfying the desire to
identify individual sentences within a document, perhaps) as similar to the
design of a programming language. I see this being as drawn-out and
emotionally charged as all new programming language designs have been over
the years. Perhaps my great grandchildren will see consensus reached -- or
maybe not.

I do believe the work is valuable and that someone needs to address it.
However, I believe it is not relevant to the structure of a certificate
itself. For our purposes, a cryptographic hash provides both unambiguous
links and efficiency (given our assumptions).

It also solves the quoting problem. That is, the hash of the 21-character
string "http://www.clark.net/" is different from the hash of the page at
that location. In general web reference terms, where no hashes are
involved, there is a difference between a URL itself and the page to which
it points and one might want to refer to the latter even without knowing
what is at the latter (therefore, without being able to compute a hash over
it). However, that situation is not one we face. If we are making secure
references to the content of that location, we need to know what that
content is and hash it. Having hashed it, we have eliminated the possibility
of saying "whatever you might find at this location is what I'm saying
(tag ...) about".
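
[Illustration: the quoting distinction above, sketched in Python. SHA-256
is an assumed hash, and the page content is a made-up stand-in rather than
anything fetched from that address. Hashing the name and hashing the
content give two different links, so there is no ambiguity about which one
is meant.]

```python
import hashlib

# The 21-byte character string itself.
url = b"http://www.clark.net/"

# Hypothetical stand-in for the page found at that address (not fetched).
page = b"<html>hypothetical page content</html>"

hash_of_name = hashlib.sha256(url).hexdigest()   # link to the string
hash_of_page = hashlib.sha256(page).hexdigest()  # link to the content

same_link = hash_of_name == hash_of_page
```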

If someone does solve the link problem, then I see those solutions finding
their way into the Oort cloud of related objects which accompanies a
certificate -- e.g., public keys, other certificates, xrls, .... I
don't see them entering the body of a certificate. We already have all the
links we need and they are sufficient for all our purposes.

It is an entirely different issue that a cryptographic hash, because it
implicitly solves the quoting problem and because it is fixed length, may be
a superior naming mechanism which the web itself might want to adopt.... I
don't propose to take the SPKI list into that discussion, however (except
that I just did :) .

- Carl

-----BEGIN PGP SIGNATURE-----
Version: 2.6.2

iQCVAwUBM1vDyFQXJENzYr45AQFrDgP6Ax32MU0yrRCPJJgSKbRG/9+J+r5pwe70
Q+5IyvKcZ8xpLXZp/Z8yjs+a4mCpZj87jcivGJ51ZQHFJfjZJv4X9LLdz2Lfh1MK
I4oKwj94fbeDL9bqNTElB5nI9W+/Qibx3se42UPb+omxZTxDx8cseLRpLKN2BLLV
4sSd50gvIrY=
=HS52
-----END PGP SIGNATURE-----

+------------------------------------------------------------------+
|Carl M. Ellison cme@cybercash.com http://www.clark.net/pub/cme |
|CyberCash, Inc. http://www.cybercash.com/ |
|207 Grindall Street PGP 2.6.2: 61E2DE7FCB9D7984E9C8048BA63221A2 |
|Baltimore MD 21230-4103 T:(410) 727-4288 F:(410)727-4293 |
+------------------------------------------------------------------+