The Metadata Saga

Rohit Khare (khare@w3.org)
Wed, 30 Apr 1997 01:34:44 -0400 (EDT)


[I prepared this in response to a nameless questioner from the Net]
[JimW, RS: any comments?]

> The MetaData project?
>
> The idea of having more content in HTML pages. So they can be read by
> programs, rather than just people.

Here's the scoop: several different metadata initiatives are colliding
messily in real time. This is the kind of godawful mess of convergent
evolution that the W3C *just might* be the right answer to.

Several sources:

1) PICS (of course). The new twist is that I finally got through to them
that URLs are not enough for secure pointers, so they had to figure out
how to distinguish the different versions, media types, languages, etc.
that could be behind one location. Led to...

2) PICS-NG. Collided with a separate intra-PICS movement to have more
structured rating values (strings, structs, pathnames, set
inclusion/exclusion, etc). Ora Lassila from Nokia is working on that
draft at W3C.

3) WebDAV. Jim Whitehead's team took a detour (IMHO) into storing and
manipulating 'small' metadata chunks with versioned documents. Immediately
ran afoul of PEP, HTTP purists (what's with GETMETA?), and heavier
schemes like...

4) Dublin Core, et al. Actual, concrete metadata schemas were banging on
our door for acceptance too. Who defines "author", "publisher", etc?
The usual suspects from the digital library community.

5) SiteMap. Microsoft originally proposed a stylized use of HTML to
outline a site, for use in collapsible 'remote controls' and printing.
Used nested ULs to indicate hierarchy, etc -- too much tacit knowledge.

6) Digital signature manifests. It became evident almost instantly that
one needs to sign packages, not atomic blobs, so we needed a DSIG Common
Manifest Format for enumerating bills of materials.

7) Email to HTML. Qualcomm wanted to use HTML as the native UI format
for mail, but needed a way to structurally mark up quoted regions, etc.
Drove an <ABOUT> tag proposal Dave Raggett made earlier, in order to
associate metadata about one quotee in several quotes.

8) XML. Of course, at the same time as wars are being fought between
()s (PICS) and {}s (PEP), <> has emerged as the industry standard for
'open dust' (e.g. Open Financial Exchange, most amusingly; HDML, most
corrosively). So SiteMaps morphed into XML-syntax-based proposals. Hence
the CDF submission, metadata about push channels rendered in XML.

The whole schmear has fallen in Ralph Swick's lap at W3C. He has the
helm for coordinating strategy on these issues. I identify this crucible
as an argument *for* W3C: it may be only because we have staff in all
these areas, and real technical people rather than project managers,
that we are able to restabilize this whole knot of problems with a dose
of 'true gospel' as dispensed by Dan Connolly and Tim Berners-Lee.