Cool URIs & MIME Types

Dan Kohn (dan@teledesic.com)
Tue, 28 Dec 1999 09:49:46 -0800


Tim, I was extremely impressed with your essay
<http://www.w3.org/Provider/Style/URI> about architecting permanent URIs.
However, I think you need to make clear that by removing the file extension,
you are implicitly suggesting that MIME types be segregated by directory. If
you think every URI should be its own directory (not a completely crazy
suggestion), it would be worth making that suggestion explicit.

>File name extension. This is a very common one.
>"cgi", even ".html" is something which will change.
>You may not be using HTML for that page in 20 years
>time, but you might want today's links to it to still
>be valid. The canonical way of making links to the
>W3C site doesn't use the extension.

If you want to serve HTML, PDF, and PostScript from the same directory,
you currently have to use file extensions. For example, although
<http://www.w3.org/TR/xhtml1> gives a 301 Moved Permanently to
<http://www.w3.org/TR/xhtml1/> which is then served as text/html, you still
need file extensions to serve <http://www.w3.org/TR/xhtml1/xhtml1.pdf> as
application/pdf and <http://www.w3.org/TR/xhtml1/xhtml1.tgz> as
application/gnutar.
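A minimal sketch, in Python, of the extension-based lookup a server typically performs. The table is hypothetical, loosely modeled on a mime.types file. The text/plain fallback matches what the Concentric hosting mentioned later in this message does for extensionless files.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical extension table, loosely modeled on a server's mime.types file.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".ps": "application/postscript",
    ".tgz": "application/gnutar",  # the unregistered type the W3C server sends
}

# Fallback for files with no recognized extension.
DEFAULT_TYPE = "text/plain"


def type_from_extension(url: str) -> str:
    """Pick a Content-Type from the final path segment's extension alone."""
    path = urlsplit(url).path
    # splitext only looks past the last "/", so "/Provider/Style/URI" has no extension.
    _, extension = posixpath.splitext(path)
    return EXTENSION_TYPES.get(extension.lower(), DEFAULT_TYPE)
```

With this rule alone, the extensionless <http://www.w3.org/Provider/Style/URI> falls through to text/plain. A server needs some other mechanism, such as a directory index, to serve it as text/html.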

What will happen in 20 years when you decide to rewrite
<http://www.w3.org/Provider/Style/Etiquette> as application/foobar while
wanting to leave <http://www.w3.org/Provider/Style/URI> as text/html?

One answer is that if every document is its own directory, then it's easy
to assign the right MIME type with existing server software (this also
supports multiple MIME instantiations of the document in that directory).
This appears to be how the W3C is handling its technical recommendations at
<http://www.w3.org/TR/>. The W3C is not applying one directory per document
everywhere, though: you have multiple HTML pages inside
<http://www.w3.org/Provider/Style/>.
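A minimal sketch, in Python, of the one-directory-per-document idea. The type is keyed by the document's directory, so switching one document to a new format is a one-line change that never touches its URI. The table is hypothetical: Provider/Style/URI and Provider/Style/Etiquette are not actually directories today. A real server would express this in its own configuration rather than in code.

```python
# Hypothetical per-directory table: each document is its own directory, and the
# server is told the type of the file it serves for that directory.
DIRECTORY_TYPES = {
    "/TR/xhtml1/": "text/html",
    "/Provider/Style/URI/": "text/html",
    "/Provider/Style/Etiquette/": "application/foobar",  # the rewrite 20 years on
}


def type_for_document(path: str, default: str = "text/plain") -> str:
    """Look up a document's type by its directory, not by a file extension."""
    # Treat ".../URI" and ".../URI/" alike, as a server that answers the first
    # with a 301 to the second effectively does.
    if not path.endswith("/"):
        path += "/"
    return DIRECTORY_TYPES.get(path, default)
```

Here /Provider/Style/Etiquette becomes application/foobar while /Provider/Style/URI stays text/html, and neither URI carries an extension.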

Another answer would be new config files that allow MIME types to be
assigned document by document, although these would likely be difficult to
keep up to date. Yet another would be the ubiquitous use of META http-equiv
tags to have each document declare its own MIME type without using file
extensions, although this clearly works better for some document types (HTML
and XML) than for others (PDF and PNG).
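A rough sketch, in Python, of the self-describing approach. It reads the type an HTML document declares in a META http-equiv tag, using the standard library's html.parser. It only works because HTML has a place to put such a tag; a PDF or PNG offers nothing equivalent for this code to find.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaContentTypeFinder(HTMLParser):
    """Records the content of the first <meta http-equiv="Content-Type"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.content_type: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.content_type is not None:
            return
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").lower()
        if http_equiv == "content-type":
            self.content_type = attributes.get("content")


def declared_mime_type(html_text: str) -> Optional[str]:
    """Return the MIME type an HTML document declares for itself, or None."""
    finder = MetaContentTypeFinder()
    finder.feed(html_text)
    finder.close()
    if finder.content_type is None:
        return None
    # Drop parameters such as "; charset=iso-8859-1".
    return finder.content_type.split(";", 1)[0].strip().lower()
```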

However, in any case, I think removing file extensions is a little more
complicated than you make it out to be. Some additional guidance would be
appreciated. I pay a monthly fee to host <http://www.dankohn.com> through
Concentric, and the only control I have over MIME types is through file
extensions. (Extensionless files are served as text/plain.) Although one
might argue that setting MIME types by directory is a feature a competitive
hosting company should offer, it won't happen unless we can be clear on why
it's needed.

BTW, according to
<http://www.isi.edu/in-notes/iana/assignments/media-types/application/>,
application/gnutar is not registered.

Separately, when you next update the page, it would be great to explicitly
mention the use of ISO 8601 dates <http://www.w3.org/TR/NOTE-datetime> if
you need to use a date in a URI. As summarized at
<http://www.saqqara.demon.co.uk/datefmt.htm>, these have real advantages in
sortability, language independence, Y2K safety, etc. It might also be worth
noting that splitting year, month, and day into separate directories may make
it easier to deal with huge archives spanning multiple disks.
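A small sketch, in Python, of both points. Zero-padded ISO 8601 dates sort chronologically as plain strings, and the same fields map naturally onto year/month/day directories. The path layout and the slug "cool-uris" are hypothetical.

```python
from datetime import date


def iso_path(d: date, slug: str) -> str:
    """Build a URI path with year, month and day as separate directories."""
    # Zero padding keeps every component fixed-width, so plain string
    # comparison agrees with chronological order.
    return f"/{d.year:04d}/{d.month:02d}/{d.day:02d}/{slug}"


def us_style(d: date) -> str:
    """A month-first rendering, for contrast."""
    return f"{d.month:02d}-{d.day:02d}-{d.year:04d}"


if __name__ == "__main__":
    days = [date(2000, 1, 3), date(1999, 12, 28), date(1999, 2, 1)]
    # Sorting ISO 8601 strings as text yields chronological order.
    print(sorted(d.isoformat() for d in days))
    # Sorting month-first strings as text puts January 2000 ahead of 1999.
    print(sorted(us_style(d) for d in days))
    print(iso_path(date(1999, 12, 28), "cool-uris"))
```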

Finally, note the irony that both Pathfinder URIs you mention in your piece
are now broken.

- dan

--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-519-7968  fax:+1-425-602-6223
http://www.dankohn.com