Gems from Tim Bray's Annotated XML 1.0 Spec.

I Find Karma (adam@cs.caltech.edu)
Tue, 16 Jun 1998 07:10:30 -0700


Rohit mentioned in his "VML, PGML, and XML Marketing" post

http://xent.ics.uci.edu/FoRK-archive/jun98/0186.html

that I was writing about some nuggets on

http://www.xml.com/

as we speak. Actually, since April when I attended WWW7, I had been
meaning to visit Tim Bray's "Annotated XML 1.0" site at

http://www.xml.com/axml/testaxml.htm

and now that I have, I highly recommend for both entertainment and
education that the interested reader follow all of the annotations
Tim Bray has made. Not only is he insightful, but he is wonderfully
inciteful as well: this is a person who calls them as he sees them,
boldly and unapologetically. Truth be told, I find such candor to be
refreshing and fascinating.

Also, this sort of thing appeals to the "Metameta" fans among us
who smile a little whenever we consider something as self-referential
as writing the XML 1.0 spec using XML 1.0 ...

I'll include a few of my favorite annotations here. Note that these are
only a few of many, many annotations Tim Bray was kind enough to share
with us; I encourage you to go to the site above and click away for an
hour or two.

First off, Rohit, you're right: when there is an elegant design
available and the right person is in the right place at the right time,
then wonderful feats of design can happen. For example,

http://www.xml.com/axml/notes/Tim.html

> I had never really done any serious standards work, let alone written
> a specification, before this project. My main motivations in the XML
> project were:
>
> 1. To produce something that programmers could implement,
> 2. To make sure the internationalization was thorough and usable, and
> 3. To make sure that XML would provide good support for search and
> retrieval applications.

And then a little taste of politics, a field I'd desperately like to
learn more about:

http://www.xml.com/axml/notes/JeanPa.html

> Jean's contributions to the XML WG debate got careful attention not
> only because of who he represented, but because he knows what he's
> talking about. He never actually did any editorial work on the XML
> specification; the appearance of his name in the list of editors
> achieved two important political goals:
>
> 1. Helping ensure the acceptance of XML by getting Microsoft's name
> on the cover, and
> 2. Defusing a political brouhaha that blew up shortly after I, after
> having served as a WG member and co-editor on a pro bono basis for some
> 8 months, signed a consulting contract with Netscape whereby I acted as
> their eyes, ears, and voice on the XML WG. Microsoft, unable to tolerate
> their competitor being in an apparently favored position, demanded (and
> temporarily got) my dismissal as co-editor. Jean's appointment was part
> of the bargain that restored me to the co-editorship. In fairness to
> Jean, it should be pointed out that he never asked for the posting, and
> rumor has it that he actively resisted it, but it was such an obviously
> good idea that it got quick consensus support.

I never realized that they were within a single vote of XML being
called "MAGMA"...

http://www.xml.com/axml/notes/TheCorrectTitle.html

> The correct title of this specification, and the correct full name of
> XML, is "Extensible Markup Language". "eXtensible Markup Language" is
> just a spelling error. However, the abbreviation "XML" is not only
> correct but, appearing as it does in the title of the specification, an
> official name of the Extensible Markup Language.
>
> The name and abbreviation were invented by James Clark; other options
> under consideration had included MGML, for Minimal Generalized Markup
> Language. Here is an excerpt from an email from James dated August 19,
> 1996:
>
> > I agree that GM isn't very catchy. The other problem with "generalized"
> > is that I suspect many, even quite technical people, don't know what a
> > generalized markup language is. Nonetheless it seems to me that the fact
> > that our markup language is generalized is something that should be
> > tremendously appealing to users: "it's the markup language where *you*,
> > not W3C or Netscape or Microsoft, choose how to mark up your data". I
> > think what we need is a word that gets across the idea of generalized
> > markup to people who don't know what it is. Perhaps something like "unrestricted",
> > "unlimited", "extensible", "user-controlled".
>
> > I think putting "standard" in the name of the standard is a bit
> > vacuous, so I would favour a name like UML or XML.
>
> And here's a reply from Jon Bosak, dated August 20th:
>
> > In my opinion, the U-combinations won't fly, but if we allow "X" to
> > stand for "extensible", then I could live with (and even come to love)
> > XML as an acronym for "extensible markup language", and I hereby now
> > throw it into the list of current proposals.
>
> And here, finally, are the results of the committee vote:
>
> 5  XML    Extensible Markup Language
> 4  MAGMA  Minimal Architecture for Generalized Markup Applications
> 3  SLIM   Structured Language for Internet Markup
> 1  MGML   Minimal Generalized Markup Language

And some "help help I'm being repressed" humor at

http://www.xml.com/axml/notes/Recommendation.html

> The World Wide Web Consortium (W3C) is not a democracy. This sentence
> means exactly what it says: that the Director, Tim Berners-Lee, having
> reviewed this specification as well as the votes and commentary
> submitted by W3C member organizations, decided to bless this document as
> a Recommendation.

For all of you wondering how to cite the XML 1.0 spec...

> A correct bibliographic reference, for use in paper publications,
> would be:
>
> "Extensible Markup Language (XML) 1.0", Tim Bray, Jean Paoli, and C.
> M. Sperberg-McQueen, 10 February 1998. Available at
> http://www.w3.org/TR/REC-xml

And I like the pointing out of the circular definition of URIs at

http://www.xml.com/axml/notes/URI.html

> A resource is a key concept in the architecture of the Web: any
> addressable unit of information or service. In practical terms, the
> definition (although useful) is somewhat circular: a resource is
> anything that can have a URI, and a URI is a short piece of text that
> identifies a resource.

Next comes a nifty little slice of philosophy about syntax, processors,
and APIs at

http://www.xml.com/axml/notes/DocsAndProcs.html

which indicates that the FoRK discussion we engaged in a month or two ago
about a small unifying event notification API

http://www.cs.caltech.edu/~adam/phd/generic-event-api.html

may be, as Rohit has suggested, trying to solve a problem that cannot be
solved: the complexity of interoperability across an Internet-scale
distributed system makes it intractable and impractical, as Tim Bray
pontificates:

> The specification does not define an Application Programming Interface
> which an application can use for interaction with an XML Processor. In
> fact, one of the specification's weak points is the discussion of
> exactly what information an application can expect to receive from an
> XML document.
>
> While this shortcoming is acknowledged, it has not caused me any great
> loss of sleep. I have seen immense amounts of work invested by very
> smart people in an attempt to create truly interoperable APIs; examples
> would include SQL, Posix, and the X Window System. Achieving real
> interoperability at the API level has proved so difficult in practice as
> to be open to question as a design goal. Furthermore, it is hard to be
> sure of coming up with an API that is of equal utility for all aspects
> of interacting with an XML document; the needs of an authoring system,
> of a browser, and of a full-text indexer are dramatically different.
>
> The real saving grace is the syntax. It is commonplace, even
> fashionable, to belittle the importance of syntax. But a document format
> which can unambiguously express complex hierarchical data structures,
> and which can reliably be parsed in a variety of computing environments,
> is in itself something of a rare and special achievement. If this is the
> only level of interoperability that XML ever achieves, it will still
> prove to have been a significant step forward in the history of
> distributed computing.

Jon Bosak sounds like the kind of leader we would want to drive any
given standards effort, with the right mix of skill, intelligence,
tenacity, and political prowess:

http://www.xml.com/axml/notes/Bosak.html

> Jon's stewardship of the XML process has been marked by a combination
> of deft political maneuvering with steadfast insistence on the
> principle of doing things based on principle, not expediency.
>
> One of the advantages of Jon's approach (going ahead even though there
> was little enthusiasm evident at the time in the W3C community) was that
> he was able to recruit members based on what he thought they had to
> offer, not what their employers wanted to get accomplished.

Actually, the SGML Working Group sounds like the "ideal world" kind of
group dynamic we would want to develop a standard, with everyone
watching everyone else's back:

http://www.xml.com/axml/notes/WG-SIG.html

> The important thing about the WG is how well the process worked. Its
> discussions and votes, which are a matter of public record, reveal that
> while the WG often failed to achieve unanimity, it did achieve consensus
> in the important meaning of the word, as evidenced by the fact that
> members of the WG are often willing to defend aspects of XML which they
> personally voted against.

And I've often wondered to myself -- as an outside person never having
been to an IETF or W3C working group meeting -- why WebDAV isn't an
activity area of the W3C. Well, the answer may reside in the history of
XML: Tim Bray suggests that resources are extremely scarce at the W3C in

http://www.xml.com/axml/notes/DanC.html

> The W3C was, paradoxically, something of a late arrival to the XML
> party. While they authorized Jon Bosak to found and run the activity,
> providing that he made no call on W3C resources, the W3C staff did not
> perceive that XML had the potential for really high impact.
>
> The concrete effect of this was that the staff in general and Dan
> Connolly in particular essentially ignored the progress of XML until it
> suddenly started to gain wide industry acceptance in the spring of 1997.
> This had an upside in that the XML process was relatively untroubled by
> the kind of time-wasting industry politics and bureaucratic infighting
> that are inevitable in an organization such as W3C. It had a downside in
> that the XML process was deprived of Dan Connolly's considerable
> expertise and experience. In particular, after Dan became interested and
> involved, he made several suggestions which would, if implemented,
> probably have constituted real improvements in this specification.
> Unfortunately, they would have required major document re-engineering,
> and the required time and editorial cycles were simply not present at
> that late stage of the process.
>
> Once XML became visible and public, Dan became an invaluable resource
> and deserves considerable credit for its eventual arrival at
> "Recommendation" status.

And, in an inspirational note we might want to take to heart regarding
the deployment of Rohit's *TP or W3C's HTTP-NG or IETF's HTTP-2.X or
(mimicking the name of XML) XTP or (adding some stuff to HTTP) HTTP++ or
(removing some stuff from HTTP) HTTP-- or (modularizing HTTP) HTTP-MOD
or (HTTP plus notifications, woo hoo) HTTP-NOT or whatever the heck
comes next is gonna be called...

http://www.xml.com/axml/notes/Goal7.html

> Quick Design Process
>
> This goal was motivated largely by fear. We perceived that many of the
> Net's powers-that-be did not share our desire for widespread use of
> open, nonproprietary, textual data formats. We believed that if we
> didn't toss XML's hat into the ring soon, the Web's obvious need for
> extensibility would be met by some combination of binary gibberish and
> proprietary kludges.

Just because XML exists doesn't make the Web free from some combination
of proprietary gibberish and binary kludges, by the way.

Which brings up another design goal near and dear to my heart as an
academic:

http://www.xml.com/axml/notes/Goal8.html

> Formal and Concise Design
>
> A data format is programmer-friendly if programmers can read and use
> the defining documents; otherwise not. Too many other standards and
> specifications have relied too heavily on prose and not enough on
> formalisms.
>
> This is one area where some, in particular Dan Connolly, have argued
> that the actual XML spec falls short of the goal; that a version could
> have been created which was substantially more formal and concise than
> the current document.

And a nice little contradiction with which no one disagrees (cute
double negative) at

http://www.xml.com/axml/notes/OtherSpex.html

> This paragraph, which makes it clear that XML outsources some of its
> problems to other specifications (a good idea, and one with which no-one
> disagrees), is in fairly stark contrast to the claim in the abstract
> that XML is "completely described in this document". Oops.

Speaking of which, I just love his conversational tone. For example,
check out

http://www.xml.com/axml/notes/Draconian.html

> This innocent-looking definition embodies one of the most important
> and unprecedented aspects of XML: "Draconian" error-handling. Dracon
> (c.659-c.601 B.C.E.) introduced the first written legislation to Athens.
> His code was consistent in that it decreed the death penalty for crimes
> both low and high. Similarly, a conforming XML processor must "not
> continue normal processing" once it detects a fatal error. Phrases used
> to amplify this wording have included "halt and catch fire", "barf",
> "flush the document down the toilet", and "penalize innocent end-users".
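
To see what "Draconian" means in practice, here is a minimal sketch in
Python using the standard library's xml.etree.ElementTree. The helper
name parse_or_reject and the sample documents are my own inventions for
illustration, not anything from the spec or from Tim Bray's notes. The
point is only that a well-formedness error ends parsing outright, with
no attempt at HTML-style recovery:

```python
import xml.etree.ElementTree as ET


def parse_or_reject(text):
    """Return the root element's tag if text is well-formed XML, else None.

    Hypothetical helper: the parser raises ParseError on the first
    well-formedness error instead of guessing what the author meant.
    """
    try:
        return ET.fromstring(text).tag
    except ET.ParseError:
        return None


good = "<note><to>Rohit</to></note>"
bad = "<note><to>Rohit</note>"  # <to> is never closed: a fatal error

print(parse_or_reject(good))
print(parse_or_reject(bad))
```

The first call yields the root tag; the second yields None, because the
parser gives up on the mismatched tag rather than handing back a
partial tree.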

And

http://www.xml.com/axml/notes/Mathemagics.html

> Some reviewers of the XML spec grumbled darkly about this
> un-called-for (they said) side trip into mathe-magic. Well, while one of
> your co-editors will confess to a math degree, we do bandy the terms
> "parent element" and "child element" around an awful lot, and it's good
> to have it written down somewhere, with great precision, exactly what
> that means. Anyhow, that's our story, and we're sticking to it.
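
For the curious, the parent/child relationship is easy to make concrete.
The following is a rough sketch, again assuming Python's standard
xml.etree.ElementTree; the sample document and the name parent_of are
made up for illustration. ElementTree elements know their children but
not their parents, so the sketch builds the reverse map by walking the
tree:

```python
import xml.etree.ElementTree as ET

# Made-up sample document: <spec> has two <div1> children, and the
# first <div1> has one <p> child.
root = ET.fromstring("<spec><div1><p>text</p></div1><div1/></spec>")

# Map every child element to its parent by visiting each element
# and iterating over its direct children.
parent_of = {child: parent for parent in root.iter() for child in parent}

p = root.find("div1/p")
print(parent_of[p].tag)             # the parent of <p>
print(parent_of[parent_of[p]].tag)  # the grandparent of <p>
print(root in parent_of)            # the root has no parent
```

Every element except the root appears exactly once as a key, which is
the "exactly one parent" property that the precise definition pins down.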

And

http://www.xml.com/axml/notes/ProcComments.html

> It's important to note that processors are allowed, if they wish,
> simply to ignore the comments in a file. This means that if you're
> building an XML application, you should never rely on anything that
> shows up in a comment (this sleazy trick is far too common in HTML).
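
A quick illustration of why relying on comments is unsafe: the sketch
below assumes Python's standard xml.etree.ElementTree with its default
parser settings, which discard comments while building the tree. The
sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented example: a setting smuggled into a comment, next to a
# real <timeout> element.
doc = "<config><!-- timeout=30 --><timeout>10</timeout></config>"

root = ET.fromstring(doc)

# Only real elements survive parsing; the comment is gone.
children = [child.tag for child in root]
serialized = ET.tostring(root, encoding="unicode")

print(children)
print(serialized)
```

An application that expected to find "timeout=30" downstream of this
parser would never see it, which is exactly the behavior the spec
permits.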

And

http://www.xml.com/axml/notes/BuddhaNature.html

> The Buddha-Nature of Element Types
>
> One can have an immensely amusing argument as to whether the Name that
> appears in start- and end-tags (and also empty-element tags, as the spec
> (tsk, tsk) doesn't say) is the type of the element, or whether the
> element's type is an abstract metaphysical what-not which is named by
> the type. This is reminiscent of the debate in Lewis Carroll about a
> song, its name, what it's called, what its name is called, and so on ad
> infinitum.
>
> I am pleased by the fact that the wording in the spec ("The name ...
> gives the element's type") can be used to support either interpretation.

And

http://www.xml.com/axml/notes/AttrsBoring.html

> Attribute Declarations and Proust
>
> I have taught half-day and whole-day courses in XML on many occasions.
> One of the big problems with the full-day course is that after you've
> done some history and introduction and goal-setting, then talked about
> elements, you get to this point (attribute declarations) around lunch
> time.
>
> The consequence of this is that when the class comes back after lunch,
> to listen to the discussion of attribute types and their declarations, most
> of them go to sleep. The fact of the matter is that there are a lot of
> attribute types (I voted against a few of them), there are lots of
> relevant details, and it is pretty tedious. Bear in mind also that since
> all this stuff lives in the DTD, it is really of interest only in the
> context of validation, and if you're working with downstream
> non-validating applications, you can pretty well ignore all this
> attribute type stuff.

You know, one day I think it might be fun to visit James Clark

http://www.xml.com/axml/notes/JClark.html

> James Clark is an Englishman who has been lucky in life to the extent
> that he can turn his formidable talents to whatever most interests him.
> Fortunately for the worlds of publishing technology, what most interests
> him is software, typesetting, and structured documents. He is the author
> of groff, SP, and Jade; SP has come, in a practical way, to serve as the
> real-world definition of what SGML is and is not. He has already, in
> XML's short life, made some major technical contributions and, as usual,
> made them freely available for everyone.
>
> His position in the XML activity as "technical lead" meant, in
> practice, that whenever he said something, the rest of the group took it
> very seriously. His contributions to the design of XML included its
> name, the empty-element tag syntax, and many other crucial aspects.

And, what of the future of XML?

http://www.xml.com/axml/notes/FutureVersions.html

> Will there be future versions of XML? Maybe, maybe not. Toward the end
> of the development of XML 1.0, the XML WG tossed out a lot of pretty
> reasonable requests for improvements and enhancements on the grounds
> that we had to get this job done, and that they could be dealt with in
> version 1.1.
>
> On the other hand, immediately after the birth of XML 1.0, the parties
> involved had a massive attack of conservatism and fear. Since the
> industry acceptance of XML 1.0 had been astoundingly broad and fast, it
> seemed unreasonable to do any more fiddling with the spec. First of all,
> it would create confusion and uncertainty among those who were betting
> on this technology. Second, it seems foolish to charge ahead "making
> improvements" and "fixing problems", when in a year or so, we are going
> to have an immense amount of industry experience under our belts, and
> really know where the improvements need to be made and the problems need
> to be fixed.
>
> On the other hand, it is absolutely 100% certain that there will be
> other specifications that are layered on top of XML 1.0. These will
> definitely include specifications for namespaces, for hyperlinks, and
> for stylesheeting. It is also a pretty safe bet that some others will
> exist that we haven't yet begun to dream of.
>
> For the moment, it's safe to base implementations on XML 1.0, and
> highly unsound to put off developments waiting for some future version.

I finish up my "best of Tim Bray's annotations" post with his remarks
about semantics:

http://www.xml.com/axml/notes/NoElSemantics.html

> Markup Has No Semantics
>
> Writers discussing XML and its parent SGML have called upon a wide
> range of rhetoric and a menagerie of metaphors to try to explain what
> elements and attributes mean.
>
> Elements and attributes don't mean anything. All they do is break up
> documents into cleanly identified chunks, and give those chunks names.
> This is a useful and good thing to do, but trying to figure out what it
> all means is best reserved for barroom discussions at conferences (in
> which context it is an extremely worthwhile pursuit.)

I'm all for barroom discussions at conferences (in which context
ANYTHING is an extremely worthwhile pursuit).

Speaking of which, I'm really excited: I get to attend my first IETF
ever in Chicago in August! Rohit's been helping me this weekend to
practice staying up all night geeking out (my goodness, is it 7:12am
already?!)... Anyone out there have any tips for an IETF newbie? (And
don't say "bring vaseline" because that's NOT a very *helpful* tip... :)

----
adam@cs.caltech.edu

Yaron Goland: As sophisticated as I get is when I don't actually say anything.
Lisa Dusseault: How often do you have that kind of restraint?