Fwd: Re: A Plea for Schemas

Mark Baker (distobj@acm.org)
Mon, 01 Nov 1999 22:22:14 -0500

Wow. The most entertaining xml-dev post of all time, IMHO. Well worth the

It was in response to this, http://www.praxisxml.com/praxis_xml.html ,
another interesting diatribe on the need for schemas.


>Date: Mon, 01 Nov 1999 20:26:15 -0600
>From: Len Bullard <cbullard@hiwaay.net>
>Subject: Re: A Plea for Schemas
>Sender: owner-xml-dev@ic.ac.uk
>To: Matthew Gertner <matthew@praxis.cz>
>Cc: xml-dev@ic.ac.uk
>Reply-to: Len Bullard <cbullard@hiwaay.net>
>X-Mailer: Mozilla 3.04 (Win95; I)
>Matthew Gertner wrote:
>> I have written a short "XML Rant"
>Enjoyable. It is good to see some reasonable passion from a
>reasonable mind. Here is some rant for the rant.
>o "the 1980s, Charles Goldfarb invented SGML". Ok for a
>rant, but ISO created SGML. If any man can be said to
>have led that work, it is Dr. Charles Goldfarb at IBM Almaden.
>He was a member of the IBM team (Goldfarb, Mosher, Lorie) that designed,
>To the idea of GenCodes, GML added among other things,
>type-defined namespaces for markup. GML and research were combined to
>propose and ratify ISO 8879. Invention like that is a community
>process. Dr. Goldfarb leads that community.
>In the late 1960s, publishers needed a means to
>exchange working files. A solution proposed at that time,
>GenCodes, was supported. The limited power of sharing
>the same single namespace (the GenCodes) did not evolve.
>The reasons are not complex and are the same as HTML:
>the namespace represents a local application context.
>When shared for all types, it limits the expressiveness
>needed to document multi-context real time events.
>o "..thousands loved it." Conceded. SGML was an expensive
>system deployed on then mostly mainframe and mini environments.
>Who had it? Aerospace tech writers, some artists, and lawyers.
>Why? They had a use for it and the costs were justified
>relative to the cost of the lifecycle of the information
>in its topical context. Manuals. Expensive ones.
>SGML lends itself to interpreted means and interpreted
>means are inefficient. That is relative to resources.
>As soon as SGML was moved to PC-based systems,
>it became cost-effective. There are and were examples of
>SGML-based systems working well for hypertext client
>applications in those environments. Except for
>lowly IADS, mostly expensive ones. Systems like
>IADS proved SGML, if deFanged a bit, could be
>deployed cheaply. Free even.
>IADS did not use a DTD. It used a stylesheet (circa 1990).
>It had a DTD, and the tags within it were modifiable and
>extensible via the stylesheet processor. Its tags (file, frame,
>were the equivalent of the ThenMalignedAndDespised PROCESSING
>but they looked like tags, so DTDs written for the system
>incorporated them and went on about their business. Framing worked.
>In 1989:
>1. Software was expensive
>2. Hardware was expensive
>3. The dominant application of SGML (1000dpi print) was hard.
>SGML emerged into more general use when more power
>was on more desks. Complexity coupled to complexity
>produces emergence. TCO. The critical innovation
>to enable the emergence of SGML came from Intel, et al.
>The unification of a significantly sized software base by a dominant
>operating system company did the rest. Kick MS as much
>as people want to, without them, the Web today would
>still be something university students surfed and
>researchers occasionally mastered, IMNSHO.
>HTML emerged when:
>o The Internet was opened to commercial use
>o The power of the processor could support the
> lowest-common denominator application of SGML
>o Governments paid to implement and give away
> a means and process to share the namespace in that
> application
>o A person to lead the effort emerged with a plan
> that would work: Tim Berners-Lee, HTTP and HTML.
>These convergent events, all in the same five years, gave you the
>o HTML is a subset of SGML: NYET. Get out the ruler
>and rap the knuckles. XML is a subset of SGML. HTML
>is an *application* of SGML. It is obnoxious, and I
>apologize in advance, but getting others to understand
>**that** critical difference in thinking about markup is
>very hard sometimes. Where I put "application", some
>say, "vocabulary". Que bueno, but as Charles said,
>"conserve names" and that is all.
>Systems are invented or specified. Vocabularies are spoken.
>HTML was not hobbled. It was distilled like other vocabularies
>from agreements made among organizations to share information.
>CERN, Univ of Ill, DARPA agree to make such agreements and
>vocabularies are the result of that agreement. What the organizations
>share are namespaces and the implementations of processors for
>creating, adding, deleting, or modifying statements in those
>namespaces. HTML was GenCode: partDeux. TimBL gets the credit,
>but there were those who helped him and if you ask, I'm sure he
>will tell you names. Names are what is shared.
>It's all about names. Read the XML 1.0 and, IMHO, that
>is the conceptual breakthrough to understand markup. In essence,
>SGML has always been principally a lexical standard. That
>structural integrity is important, and specifying that
>provides the necessary freedom from implementation
>to enable an inexhaustible range of expression.
>It makes the agreement needed to implement a
>system to use it very expensive. XML locks
>down the SGML Declaration. Most of the biggest
>changes from SGML start there. To keep the original
>expressive power, the means for making beyondLex agreements
>are still needed.
>A DTD is not about lexical validation only. It
>is about validating a hierarchical namespace to
>determine conformance. Whether you use DTDs,
>MS Schemas, XML Schemas (someday), or just use
>the table design window for Access or Oracle,
>validating a vocabulary requires you to declare
>one or derive it. IMHO, of the two means, declaration
>is usually cheaper, but it is always political.
>Politics are human means to declare namespaces.
>BizTalk and OASIS both exist because of the names
>and interest of those named in the shared politics
>of creating their shared namespaces. That is all.
>XML does not care.
>Syntax unification is not enough. Using markup systems
>requires you to accept the idea that the namespace is
>primary. What does that mean? Just as SQL systems
>must disambiguate aggregate naming, so must markup systems.
>A name means what you need it to. It must be unique and persistent
>to be a name and you require a means to discover if it is
>meeting that need. Trust but verify.
>Schemas are just one of the tools for discovering if
>that is the case. You can do more with schema information
>in the same way the relational system does it. Names
>are associated to create processable unique names.
>You can do a lot with the DTDs and schemas, really.
>They are just metainformation by which
>you agree to organize the screen and the objects on it,
>or the messages among objects, or whatever you want
>to talk about. The reason to use them
>is to validate or as a source for initialization. In
>effect, they really are just another database of
>names and values. That is what makes using XML
>Schemas (in deference to DTDs) attractive. Applications
>outside very specialized ISO 8879-conforming processors
>for DTDs are also useful for managing the namespace
>of that metainformation.
>DTDs do not aggregate; so, if instances do, they
>are not validatable. That does not keep them from
>being useful. The names in the space are unique.
>Their persistence is questionable, yet if you treat
>them as a relational designer treats a view, they
>are very useful. Well-formed is what you need for
>any lifecycle of the information. Valid is what
>you need to ensure correct processes among systems
>that use the information at particular times. When
>a formal means to persist these better is provided,
>then we have a very good system for maintaining
>namespace communities.
>Schemas organize a namespace; not doing that is
>relaxing a design constraint on the namespace. Relaxing
>that constraint is efficient particularly at this
>time when database systems are so cheap and ubiquitous,
>using them for serving strings is optimal. Correct-
>by-construction from a trusted source is faster,
>more compact, and less-restricting on system evolution.
>Badly-formed HTML? It was a trade-off. It cleans
>up over time. Better tools, better hunts, better times.
>All XML says is, you don't have to use the DTD.
>It doesn't say it isn't useful. Enlightened XMLers
>write them and use them and even throw them away.
>A DTD is a snapshot of the organization of a namespace
>in time. Time moves on. Information does too.
>The DTD might not. Some part of it probably
>will and will influence the next version. The
>reason to use or not use a DTD or any other
>schema is determined by the namespace evolution:
>and evolution of agreements, so cooperation.
>Cooperation among large human communities is
>always furthered when agreements about what
>to name the names are simple and easy to verify.
>When the means to communicate among companies
>became the Web, the need to verify these agreements
>by simple means became an ecological imperative.
>So, patience. But don't quit pleading. Namespaces
>are gardens. To grow usefully, they have to be tended.
>It takes tools, lots of them, for particular
>purposes, to do that. Most of us have sheds full of
>tools we only use occasionally next to ones we use
>every day.
>That golden 10% of XML is the distilled essence of
>SGML and the years of practice and competing, sometimes
>awkward specifications and standards written there
>by all of the people I met in those years. Even
>those HyTime guys worked on creating XML. HyTime,
>DSSSL, TEI, but before them, Dexter, FRESS, Engelbart,
>all feed the single stream that is now XML and as
>with SGML, all the competing, sometimes awkward
>specifications being written by many of the same people.
>If you want to plead for schemas, I plead with you. Schemas are a
>tool for validating agreements among overlapping namespace
>communities. Ecom-ecologies (keiretsu) emerge because
>the tools they use to make agreements, their namespaces,
>become efficient. S = k log W - Boltzmann. To control
>the temperature, control the value of W. DTDs help
>you control the rate at which entropy consumes referents.
>The trick to fix the web is to fix the web's indexes.
>To do that, ensure the agreements by which the indexes
>are made enable validation of the namespaces indexed.
>Well-formed, and valid by agreement are the keys to creating
>semantic space, overlapping vocabularies, if that is what
>you want.
>DTDs are a tool to make agreements. Beyond the agreement are the
>names that agree. XML Doesn't Care. You do. You write:
> Dilution of the basic principles of generic markup, and
> misunderstanding of their purpose, will then give rise to
> disappointment, and hence rejection: "We switched our whole
> company over to XML and we still can't interchange data
> So this means that XML doesn't work, right?"
>How many 'MLers here want a dollar for every time you've heard that?
>Tell 'em, "ahh, XML Works. We just don't agree on how."
>len bullard
>xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ and on
>CD-ROM/ISBN 981-02-3594-1
>To unsubscribe, mailto:majordomo@ic.ac.uk the following message;
>unsubscribe xml-dev
>To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
>subscribe xml-dev-digest
>List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)