Re: <URL: yada yada

Dan Connolly (connolly@w3.org)
Thu, 26 Jun 1997 04:35:48 -0500


Sigh... why did you have to remind me of all the brokenness
in the world, Rohit?

I happen to be up late on account of jet-lag from Tokyo, so I
really have nothing better to do than join this thread.

First, some errors in fact:

Rohit Khare wrote:
>
> > in the newsletter RFC-1738 is my story and I'm stickin' to it. See
> > the notes to <http://www.tbtf.com/archive/03-01-97.html>.
>
> Keith is referring to the obsolescent December 1994 URL RFC (not stds-track)
> which quoth as below.

RFC-1738 is standards track (as is 1808):

==================
http://ds.internic.net/rfc/rfc2200.txt
Network Working Group                   Internet Architecture Board
Request for Comments: 2200                        J. Postel, Editor
Obsoletes: 2000, 1920, 1880, 1800, 1780,                  June 1997
1720, 1610, 1600, 1540, 1500, 1410, 1360,
1280, 1250, 1200, 1140, 1130, 1100, 1083
STD: 1
Category: Standards Track

INTERNET OFFICIAL PROTOCOL STANDARDS

6.5. Proposed Standard Protocols

Protocol  Name                                   Status          RFC
========  =====================================  ==============  =====
URL       Uniform Resource Locators              Elective        1738
URL       Relative Uniform Resource Locators     Elective        1808
==================

Now on to matters of opinion and style:

> Roy Fielding did the community a service with his definitive rewrite of a
> correct URL grammar in RFC 1808, which IS standards track, but introduced
> the ugliness we are now fighting :-)

Yes, Roy did a service. Note that the service was not so much
to invent as to act as scribe for the URI WG, though the careful
rewrite of the details of the URI grammar and parsing rules
is largely Roy's contribution alone.

He certainly shouldn't be blamed for the <URL...> nonsense.

Dan@teledesic:

>MS Outlook isn't smart enough to recognize URLs that begin URL:http://
>so Keith's usage annoys me, whether it's correct or not.

Remember that URL parsing in free text is an art, and not a science.

In my experience and opinion, the best way to reliably write URLs
in free text is to (1) make sure they DON'T have any internal
whitespace, not even linebreaks, and (2) separate them by
lots of whitespace. e.g.:

http://www.w3.org/
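
To show why that style is friendly to software, here is a minimal
sketch of a free-text URL finder that only splits on whitespace. The
names find_urls and URL_TOKEN and the pattern itself are my own
assumptions for illustration, not anything taken from the RFCs:

```python
import re

# Assumption: a URL is a scheme name, a colon, and then a run of
# non-whitespace characters. This is a loose heuristic, not the
# grammar from RFC 1738 or RFC 1808.
URL_TOKEN = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:\S+$")

def find_urls(text):
    """Return the whitespace-delimited tokens that look like URLs."""
    return [tok for tok in text.split() if URL_TOKEN.match(tok)]

sample = "The W3C home page is at\n\n    http://www.w3.org/\n\nand that is all."
print(find_urls(sample))  # ['http://www.w3.org/']
```

Because the URL has no internal whitespace and is surrounded by
whitespace, splitting is all the parsing that is needed. A URL jammed
up against punctuation would come out with the punctuation attached.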

The conventions in RFC1808 were dreamed up by some URI WG folks
who thought that software could undo linebreaks and such, recovering
the above URL from stuff like:

<URL:http://www.
w3.org>

For that reason, URL: is a somewhat reasonable thing to do because
it increases redundancy, which increases the ability to do
heuristic stuff. But I've never seen these heuristics implemented
anyway, so why bother? Besides:

Rohit:
>So besides
>being ugly and clogging UR* parsers, it encourages a false dichotomy.

Yes, anybody who really thinks that the URI namespace has subdivisions
that need labels (other than labels like http: and ftp:) needs
all the cluons, demerits, head-whacking, etc. that we can dish out.
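
For what it's worth, the line-break heuristic that the <URL:...>
wrapper was meant to enable is not hard to sketch. This is only an
illustration under stated assumptions: the delimiters are exactly
"<URL:" and ">", and any whitespace between them came from line
wrapping. The names WRAPPED and recover_wrapped_urls are made up here:

```python
import re

# Assumption: everything between "<URL:" and the next ">" is one URL,
# and any whitespace inside it was introduced by line wrapping.
WRAPPED = re.compile(r"<URL:([^<>]*)>", re.IGNORECASE)

def recover_wrapped_urls(text):
    """Return URLs found inside <URL:...>, with internal whitespace removed."""
    return ["".join(m.group(1).split()) for m in WRAPPED.finditer(text)]

sample = "See <URL:http://www.\nw3.org> for details."
print(recover_wrapped_urls(sample))  # ['http://www.w3.org']
```

The redundancy is doing the work: without the explicit delimiters, the
code could not tell whether "w3.org" on the second line belongs to the
URL or to the surrounding prose.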

While the MS Outlook parser may be lacking in its ability to do
heuristic parsing, that pales in comparison to Netscape's
inability to parse perfectly valid URLs like irc://server/channel.

For more tales of woe, see:

===========
Notes on URI Implementations
http://www.w3.org/Addressing/software

Notes on implementations and how they're described and
specified -- i.e. shared context in the developer community.
Also bugs.
===========

Roy has cataloged such tales of woe at:

http://www.ics.uci.edu/~fielding/url/

Now about that archive format: If the text-to-html filter is
smart enough to recognize the [N] idiom (originally from
www -listrefs, but I believe popularized by yours truly)
and make links out of the [N] gizmos, why doesn't it
elide the actual URLs at the bottom of the message?

The SGML cop hereby cites you in violation of
the "Avoid talking about mechanics" principle of
hypertext style[1].

[1] http://www.w3.org/Provider/Style/NoMechanics

(oh for handy tools to make digitally signed PICS labels!)
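
A rough sketch of what such a filter could do with the [N] idiom:
link the markers and drop the reference lines themselves. I don't know
how the archive's filter is actually written; the function name
link_references and the assumed reference-line shape ("[N] URL" alone
on a line) are mine:

```python
import html
import re

# Assumption: a reference line is "[label] URL" on a line by itself.
REF_LINE = re.compile(r"^\[(\w+)\]\s+(\S+)\s*$", re.MULTILINE)
MARKER = re.compile(r"\[(\w+)\]")

def link_references(text):
    """Turn [N] markers into links and elide the "[N] URL" lines."""
    refs = dict(REF_LINE.findall(text))
    # Remove the reference lines, then escape what is left for HTML.
    body = html.escape(REF_LINE.sub("", text), quote=False)

    def to_link(match):
        url = refs.get(match.group(1))
        if url is None:
            return match.group(0)  # no URL known: leave the marker alone
        return '<a href="%s">%s</a>' % (html.escape(url), match.group(0))

    return MARKER.sub(to_link, body).rstrip() + "\n"

sample = "hypertext style[1].\n\n[1] http://www.w3.org/Provider/Style/NoMechanics\n"
print(link_references(sample))
```

Markers with no matching URL line, such as a bibliographic entry that
is a title rather than an address, are left as plain text.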

More examples:

It's OK to write:

==========
the lips will reside at
<http://www.tbtf.com/resource/the-lips.html>
==========

but (IMHO -- gotta get this one written up) it's bad style to write:

==========
See <http://www.matterform.com/>
==========

Write instead:

==========
See "Email Database & Web Site Management from Matterform
Media" at <http://www.matterform.com/>.
==========

Now those folks confused the TITLE with the ADDRESS. If they
had moved the author/date info out of TITLE and into ADDRESS
(per the "Sign it!" principle at
http://www.w3.org/Provider/Style/SignIt) then you could write:

==========
See "Email Database & Web Site Management," 1996 by
Matterform Media at <http://www.matterform.com/>.
==========

and you could probably get the machine to make citations of
that form.
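
As a sketch of how the machine might do it, assuming a page that
really does keep the title in TITLE and the author/date in ADDRESS.
The sample page below is hypothetical markup, not the actual
Matterform page, and element_text and cite are names invented for
this example. Regular expressions are a shortcut here, not a real
HTML parser:

```python
import html
import re

def element_text(page_html, tag):
    """Return the whitespace-normalized text of the first <tag> element."""
    pattern = r"<%s\b[^>]*>(.*?)</%s\s*>" % (tag, tag)
    m = re.search(pattern, page_html, re.IGNORECASE | re.DOTALL)
    if m is None:
        return ""
    return " ".join(html.unescape(m.group(1)).split())

def cite(page_html, url):
    """Build a 'See "TITLE," ADDRESS at <URL>.' citation."""
    title = element_text(page_html, "title")
    address = element_text(page_html, "address")
    return 'See "%s," %s at <%s>.' % (title, address, url)

# Hypothetical page that follows the "Sign it!" principle.
page = """<html><head>
<title>Email Database &amp; Web Site Management</title></head>
<body><address>1996 by Matterform Media</address></body></html>"""
print(cite(page, "http://www.matterform.com/"))
```

The point is only that once TITLE and ADDRESS each hold what they are
supposed to hold, the citation is a matter of string formatting.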

More musings on the sad state of the art:

This message was EXTREMELY painful to compose: The bane of
my existence is doing things that I know the computer could
do for me. I should have a decent hypertext email composition
tool on my desktop, and I should be able to quote and excerpt
with full audit trail with a few simple UI gestures.

This sad state of implementations w.r.t. the potential represented
by the web architecture subjects the architecture itself to attack.

Ted Nelson's whole new book[Nel97] is full of criticism of
web architecture based on experience with the current implementations.
Blech. I gotta get my IOH paper[3] updated and published.

[Nel97] "The Future of Information: Ideas, Connections,
and the Gods of Electronic Literature" June 1997, Theodor Holm Nelson

[3] http://www.w3.org/Architecture/NOTE-ioh-arch
$Date: 1997/05/26 20:19:35 $

-- 
Dan Connolly, W3C Architecture Domain Lead
http://www.w3.org/People/Connolly/