Draft for Review: Namespaces II, postmodernism

Rohit Khare (rohit@uci.edu)
Mon, 25 Oct 1999 00:39:17 -0700


In a small miracle, I actually got up at 8AM this morning, stayed
awake, sat down this afternoon, and in 341 minutes, at an average of
11 words per minute, just wrote my column in one go. This is
extremely unusual, and thus pleasing. However, I think the column may
be a bit more... florid... than may be called for. I would
appreciate any and all comments, public or private.

Rohit Khare

PS. I'm sorry it doesn't have the formatting down, but I think y'all
are familiar enough with my ranting to correct for that :-)

================================================================

Seventh Heaven

What's in a Name? Trust.

Internet-Scale Namespaces, Part II
Rohit Khare * 4K Associates * October 25, 1999

[@@ Ed: I tried my best with the voice of I/We; feel free to conform with house
stylebook. Also, this draft doesn't actually use the words 'six degrees of
separation', much to my surprise. ]

A renowned programming pioneer quipped "Any problem in computer science can be
solved with another layer of indirection." In fact, I just did! Indirect, that
is - not that I authored that aphorism. I deferred the problem of actually
looking up the speaker by binding it to the local symbol "a renowned programming
pioneer."

Now that you've come through the looking-glass which is my column, you're just
going to have to trust me. Like Humpty-Dumpty, "When I use a word, it means just
what I choose it to mean --- neither more nor less." So when you challenge me to
resolve that reference into an actual person, you'll have to trust my sources in
turn.

Would you trust the score of Web pages I can search up which simply cite it as
Anon., lost to the mists of ancient computing history? Or an MIT professor who
pointed me to the legendary wit of Alan Perlis, founder of Carnegie Mellon's
computer science program? Or, as I long assumed, the professor who taught it to
me in the first place, Butler Lampson?

In fact, my "resolution function" for this namespace was a PDF file at Microsoft
Research's web site. Prof. Lampson's own Turing Lecture slides correctly
attributed it to David Wheeler, chief programmer for the EDSAC project in the
early '50s.

Resolving "renowned programming pioneer" to "David Wheeler" seems nothing like
resolving w3.org to 18.29.0.27; the former process weighs human relationships,
history, and judgment, while the latter mechanically queries a Domain Name
System (DNS) database. Dig deeper, though, and I believe they seem more similar
than not. Every decision to name something is a trust decision, resolvable only
in the context of some community that agrees to that namespace.

Why We Name

Perhaps it makes more sense in reverse: why do we name objects in the first
place? We use names to abstract away details: of location, of authorization, of
human-readability. Every namespace interposes a new fulcrum for administrative
leverage: to redirect the binding, extract rents, and implement other social
policies.

In the last issue's dissection of the Anatomy of a URL, I made the further claim
that namespaces can be unwrapped in layers, with each layer's address becoming
the next lower layer's name. With explicit reference to Ray & Charles Eames' film
Powers of Ten, we zoomed in from the visible surface of a Web browser to domain
names, IP addresses, Ethernet MAC station IDs, modem numbers, and so on.

To help classify namespaces, I tried to collect some figures of merit on each: the
number of entries, the density of possible entries, their lifetime, the lifetime
of the binding, its organizational authority, user presentation, and so on.
Resolution was characterized as a function from the domain of names to the range
of addresses, allowing us to note injectivity (a one-to-one mapping),
surjectivity (that every address has a name), computability, and invertibility.
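
To make those properties concrete, here is a minimal Python sketch that treats a
resolution function as a plain dictionary from names to addresses. Only the
w3.org address comes from this column; the other entries, the address space, and
the helper names are made up for illustration, and a real namespace is of course
not a static table.

```python
# Toy resolution table: names -> addresses. Only the w3.org entry comes
# from the column; the other entries are hypothetical.
bindings = {
    "w3.org": "18.29.0.27",
    "www.w3.org": "18.29.0.27",   # hypothetical alias for the same address
    "example.test": "192.0.2.1",  # hypothetical
}
address_space = {"18.29.0.27", "192.0.2.1", "192.0.2.2"}


def is_injective(table):
    """One-to-one: no two names share an address."""
    return len(set(table.values())) == len(table)


def is_surjective(table, addresses):
    """Every address in the space has at least one name."""
    return set(table.values()) == set(addresses)


def invert(table):
    """Reverse lookup: address -> list of names (single-valued only if injective)."""
    reverse = {}
    for name, addr in table.items():
        reverse.setdefault(addr, []).append(name)
    return reverse


print(is_injective(bindings))                   # False: two names, one address
print(is_surjective(bindings, address_space))   # False: 192.0.2.2 is unnamed
print(invert(bindings)["18.29.0.27"])           # ['w3.org', 'www.w3.org']
```

An unlisted phone number, in these terms, is an entry that the owner hopes
nobody can reach through either the forward table or its inverse.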

All of these mathematical properties have political consequences. Injectivity,
for example, creates scarcity, since "united.com" can only point to the airline
or the van line. Economics govern the allocation of scarce resources, leading
directly to the politics surrounding domain name system reform. Less marketable
identifiers such as Ethernet IDs are simply sold in bulk.

Surjectivity is another political problem: can any government compel every
citizen to use a unique Social Security Number? Is every citizen addressable
through the postal service? (@@a US Federal judge served the first e-mail
subpoena this year, on an overseas defendant). There are related privacy fears:
we assume unlisted phone numbers will remain uncomputable by name; and that a
phone number won't be invertible to a person.

Mobility is an example of an overall property that depends on several features.
Cellular phone roaming, for example, requires very low latency updates, while
assigning or transferring domain names can take up to three days to propagate
across the Internet - but soon, dialup users will expect to acquire a Dynamic
DNS name in seconds.

Reaching in the opposite direction, we can ease resolution at the expense of
mobility by weakening our names to function as locators. Consider a hypothetical
URN (Uniform Resource Name) such as RFC:2616 and its transformation into
http://info.internet.isi.edu/in-notes/rfc/files/rfc2616.txt, an explicit path to
one of the thousands of copies of the HTTP specification on the Internet. IP
addresses strike a similar compromise between the topologically consistent
network prefix, and the host-specific suffix.
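
Both compromises can be sketched in a few lines of Python. The URN resolver
below is hypothetical (a one-entry template table standing in for a real
resolution service), and the /16 prefix length on the w3.org address is an
assumption made purely for illustration.

```python
import ipaddress

# Hypothetical resolver table: each URN scheme maps to a URL template.
# The RFC template is the path quoted above; a real resolver would choose
# among many mirrors.
URN_TEMPLATES = {
    "RFC": "http://info.internet.isi.edu/in-notes/rfc/files/rfc{id}.txt",
}


def urn_to_url(urn):
    """Weaken a name into a locator by filling in a fixed path."""
    scheme, _, ident = urn.partition(":")
    return URN_TEMPLATES[scheme].format(id=ident)


def split_address(cidr):
    """Split an address into its network prefix and host-specific suffix."""
    iface = ipaddress.ip_interface(cidr)
    host_suffix = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network), host_suffix


print(urn_to_url("RFC:2616"))
print(split_address("18.29.0.27/16"))   # ('18.29.0.0/16', 27)
```

The name RFC:2616 survives a mirror moving; the URL does not, which is the
price paid for skipping the lookup.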

The example of IP, in turn, introduces an archetypal feature supporting
Internet-scale: explicit delegation and reservation within a namespace. Both the
political structure and protocol design of DNS parse www.united.com as a
hierarchical cascade of authority from ICANN (the Internet Corporation for
Assigned Names and Numbers) to several .com registrars, to the legal owner of
united.com and its sysadmins. Other parts of a namespace might be explicitly
excluded from such authority, such as IP network prefix 10 for private use
disconnected from the public Internet, or X- experimental message headers.
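
As a rough sketch of that cascade, the Python below lists the names at each
level of delegation for a hostname and checks whether an address falls in a
range reserved for private use. The function names are mine, and the chain is
only the naming hierarchy, not an actual DNS query.

```python
import ipaddress


def delegation_chain(hostname):
    """List the names at each level of delegation, from the root down."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]))
    return chain


def is_reserved_private(address):
    """True for addresses set aside for private use, such as network 10."""
    return ipaddress.ip_address(address).is_private


print(delegation_chain("www.united.com"))
# ['.', 'com', 'united.com', 'www.united.com']
print(is_reserved_private("10.1.2.3"))     # True
print(is_reserved_private("18.29.0.27"))   # False
```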

Just as the original film Powers of Ten surveyed everything from cosmology to
quantum mechanics, while underscoring that the same physical rules applied
everywhere, the point of our tour is to identify more of these scale-invariant
design rules for Internet-scale namespaces.

Zooming in: the HTTP transaction

We resume our journey as the browser uses its carefully constructed connection to
the Web server to send an actual HTTP transaction (see Listing 1). There are
several more namespaces at work here, such as the Method and Version number I've
boldfaced in the request. Only IETF RFCs can formally define new methods and
HTTP revisions.

GET /PICS/DSig/Overview HTTP/1.1
Host: www.w3.org

HTTP/1.1 200 OK
Date: Wed, 18 Aug 1999 21:22:41 GMT
Server: Apache/1.3.6 (Unix) PHP/3.0.11
Content-Location: Overview.html
Vary: negotiate
Last-Modified: Mon, 06 Apr 1998 20:24:44 GMT
ETag: "2def30-a2e-35293a0c;35293a2f"
Accept-Ranges: bytes
Content-Length: 2606
Content-Language: en-us
Content-Type: text/html; charset=iso-8859-1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
...<META http-equiv="PICS-Label" content='(PICS-1.1 "http://www.gcf.org/v2.5"
by "John Doe" labels for "http://www.w3.org/PICS/DSig/Overview"
extension (optional "http://www.w3.org/TR/1998/REC-DSig-label/resinfo-1_0"
("http://www.w3.org/TR/1998/REC-DSig-label/MD5-1_0" "cdc43463463="
"1997-02-05T08:15-0500"))
extension (optional "http://www.w3.org/TR/1998/REC-DSig-label/sigblock-1_0"
("AttribInfo" ("http://www.w3.org/PICS/DSig/X509-1_0" "efe64685685=")
("http://www.w3.org/PICS/DSig/X509-1_0"
"http://SomeCA/Certs/ByDN/CN=PeterLipp,O=TU-Graz,OU=IAIK")
("http://www.w3.org/PICS/DSig/pgpcert-1_0" "ghg86807807=")
("http://www.w3.org/PICS/DSig/pgpcert-1_0"
"http://pgp.com/certstore/plipp@iaik.tu-graz.ac.at"))
("Signature" "http://www.w3.org/TR/1998/REC-DSig-label/RSA-MD5-1_0"
("byKey" (("N" "aba212412412=") ("E" "3jdg93fj")))
("on" "1996-12-02T22:20-0000") ("SigCrypto" "3j9fsaJ30SD=")))
on "1994.11.05T08:15-0500"
ratings (suds 0.5 density 0 color 1))'>

Listing 1. An HTTP transaction, including an HTML response body with an embedded
PICS rating and digital signature within a <META> tag.

The response headers cite some more complex namespaces. Since interoperability
requires, at minimum, effective error notification, it's easier to merely
register a new reply code with IANA than to negotiate a full standards-track
RFC. The product token is a completely malleable, private name which is
nonetheless structured by a slash and useful for logging purposes
('technographic' data, as the jargon goes).

To expedite future cache validation, it's useful to have an absolutely unique
identifier for the entity (payload) that's included here. The entity-tag is an
opaque string in a namespace maintained exclusively by the server, which must
only guarantee its uniqueness. If the ETag hasn't changed, it's still a fresh
copy.
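
A minimal sketch of that validation step, assuming the client sends its cached
tag back in an If-None-Match header and the server compares the two as opaque
strings. The function name and return shape are hypothetical; 304 and 200 are
the standard HTTP status codes.

```python
def validate(cached_etag, current_etag):
    """Return (status, send_body) for a conditional GET using If-None-Match."""
    if cached_etag is not None and cached_etag == current_etag:
        return 304, False   # Not Modified: the cached copy is still fresh
    return 200, True        # tags differ (or nothing cached): resend the entity


# The entity-tag from Listing 1, quotes included, treated as an opaque string.
etag = '"2def30-a2e-35293a0c;35293a2f"'

print(validate(etag, etag))      # (304, False)
print(validate('"old"', etag))   # (200, True)
```

Note that the client never parses the tag; only the server knows what the
characters inside it mean.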

The entity itself is also described: its length, its last-modified date, and so
on. It also has a content-type, selected from the set of IANA-registered MIME
media types, a two-level hierarchy divided into broad capabilities (image, text,
application, etc). The character set is specified by a text string registered by
IANA, but ultimately defined by ISO, as is the content-language, a combination
of ISO-639 language abbreviations and optional ISO-3166 country codes.
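
To see how many registries hide in those two header lines, here is a small
sketch that pulls them apart with Python's standard email package. The
function name and the dictionary keys are my own labels, not anything defined
by HTTP.

```python
from email.message import Message


def describe_entity(content_type, content_language):
    """Split Content-Type and Content-Language into their component namespaces."""
    msg = Message()
    msg["Content-Type"] = content_type
    language, _, country = content_language.partition("-")
    return {
        "media_type": msg.get_content_maintype(),  # top level of the MIME hierarchy
        "subtype": msg.get_content_subtype(),      # second level
        "charset": msg.get_param("charset"),       # IANA-registered charset name
        "language": language,                      # ISO-639 abbreviation
        "country": country.upper() or None,        # optional ISO-3166 code
    }


print(describe_entity("text/html; charset=iso-8859-1", "en-us"))
```

For the values in Listing 1, that yields a media type of text, a subtype of
html, the charset iso-8859-1, the language en, and the country US.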

Zooming in: Digital Signatures

At our maximum magnification, we are finally inspecting the very object of our
desires, namespaces within the actual HTML document. There's an SGML-mandated
prologue defining the particular document type, both by an ISO-registered Formal
Public Identifier (FPI) and a URL. Within the namespace of HTML tags, META has a
hybrid ability to specify an HTTP header. We find a parenthesis-delimited
s-expression in its attributes, parseable only within its own Platform for
Internet Content Selection (PICS) syntax.

And indeed, the first thing the PICS label declares is that it delegates its
ultimate meaning to the 'Good Clean Fun' ratings scheme; that's the organization
that sets the metrics for the suds/density/color rating vector in the final
line. In between, we see several extension blocks that represent a digital
signature of the label itself.

First, there is an algorithm identifier for the hash function used to vouchsafe
the particular document text John Doe will rate. Then it specifies the signing
keys, and finally the actual signature algorithm and cryptographic result.

Zooming out: Human Identity

Ultimately, though, we need to link those prime numbers back to an actual,
legal, human. And so the whole picture falls apart... Now we zoom out, beyond the
Internet connection, beyond the browser, beyond even the PC, to take in whole
organizations and nations. We have fallen off the edge of technology and into
society.

We need larger-scale namespaces to identify the signing principal. The lifetime
of this name is much longer than an individual Web transaction. The social scope
of this name should identify him or her to a wider community than just the
immediate counterparties. And that name is typically used across multiple
applications, for multiple purposes.

Any resolution function over a domain of humans and incorporated organizations
raises critical non-technical questions. Privacy, for one: will the function be
known to all, prohibiting anonymity? Or conversely, could it be so weak that it
is overwhelmed by ephemeral pseudonyms? Will the binding be legally trustworthy,
valid enough to strike contracts? And as we mentioned before, mightn't
injectivity create a scarcity, in the rush to claim "Joe Doaks" rather than "Joe
Doaks, the short, fat one who lives in a van down by the river"? Or raise the
totalitarian scepter of surjection, compelling universal IDs and absolute
traceability? What about billion-citizen nations where enumeration seems utterly
impossible?

X.500: Trust Your Superiors

The engineer's first resort, then, is to divide and conquer, to hide behind
hierarchy. That's the rationale behind the dominant international standard, the
X.500 directory schema and X.509 signed certificate. In Listing 1, the first key
identifier uses X.500, qualifying the Common Name (CN) "PeterLipp" by his
Organization (O), the Technical University of Graz, Austria and Organizational
Unit (OU) within it. There are a few other attributes in the standard schema:
Country (C), Locality/Region (L), State/Province (ST), and Address (STREET).
Taken together, the whole record is known as a Distinguished Name (DN).
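
Pulling a DN apart is mechanical. The sketch below is a deliberately naive
parser (my own helper, not a library call) that assumes no escaped commas and
no multi-valued components, both of which real DNs may contain.

```python
def parse_dn(dn):
    """Split 'CN=..,O=..' into (attribute, value) pairs in the order written.

    Naive on purpose: assumes no escaped commas or multi-valued components.
    """
    pairs = []
    for component in dn.split(","):
        attr, _, value = component.strip().partition("=")
        pairs.append((attr.upper(), value))
    return pairs


# The DN embedded in the certificate URL in Listing 1.
print(parse_dn("CN=PeterLipp,O=TU-Graz,OU=IAIK"))
# [('CN', 'PeterLipp'), ('O', 'TU-Graz'), ('OU', 'IAIK')]
```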

We can have faith in such DNs because each component within it can be a
Certification Authority. That is, the same hierarchical structure used to narrow
down the particular individual we have in mind requires a pyramid of trusted
delegations going back up - a top-heavy structure indeed!

If I were to have a certificate issued to "cn=Rohit Khare, ou=Information and
Computer Science, l=Irvine, o=University of California, st=CA, c=US", I'd end up
with the statement "Rohit's key is 37", co-signed by me, my department chair, my
chancellor, the president of the UC system, the Governor, and the President (or
their delegees). And if any one of those were missing - say, the chancellor's - I
wouldn't be able to go sailing. There'd be no least common ancestor between the
athletic department and ICS, and I'd be booted out like the rest of the unwashed
masses.
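
The failure mode is easy to simulate. This toy model contains no cryptography
at all: the issuer table is hypothetical, and a missing entry simply stands in
for a missing signature. It is a sketch of the chain-walking logic only, not of
how X.509 validation is actually implemented.

```python
# Hypothetical issuer table: subject -> the authority that certified it.
ISSUED_BY = {
    "Rohit Khare": "ICS Department",
    "ICS Department": "UC Irvine",
    "Athletics": "UC Irvine",
    "UC Irvine": "University of California",
    "University of California": "State of California",
    "State of California": "United States",
}


def chain_to_root(subject, issued_by, root="United States"):
    """Follow issuers upward; return the chain, or None if a link is missing."""
    chain = [subject]
    while chain[-1] != root:
        issuer = issued_by.get(chain[-1])
        if issuer is None or issuer in chain:   # broken link or a loop
            return None
        chain.append(issuer)
    return chain


def common_ancestor(a, b, issued_by):
    """Least common ancestor of two principals, or None if either chain breaks."""
    chain_a = chain_to_root(a, issued_by)
    chain_b = chain_to_root(b, issued_by)
    if chain_a is None or chain_b is None:
        return None
    for node in chain_a:
        if node in chain_b:
            return node
    return None


print(common_ancestor("Rohit Khare", "Athletics", ISSUED_BY))   # UC Irvine

# Drop the signature over the department and the whole chain is useless.
broken = {k: v for k, v in ISSUED_BY.items() if k != "ICS Department"}
print(common_ancestor("Rohit Khare", "Athletics", broken))      # None
```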

When it does work, though, it's beautiful engineering. I can walk into the UCLA
library and authenticate myself against our common faith in the UC system, or
even with other California corporations. Of course, there's the little detail
that the upper reaches of this system must be commonly trusted by millions - if
not billions - of people to work at Internet-scale. Who, after all, can
authenticate US citizens abroad? The UN? Or thousands of pairwise national
cross-certifications?

PGP: Trust No One

Or, for that matter, identities that just don't happen to neatly fit into this
global political hierarchy? The FoRK mailing list is a slippery, transnational
community that still needs to authenticate its members to each other. What I'd
use here is Pretty Good Privacy (PGP) for its implementation of a 'Web of Trust'
instead.

I'd begin with a fresh keypair, self-signed by the string FoRK@XeNT.com. It is
born without meaning, without value of its own. But I like it, and I will also
sign it, as Rohit@4K-Associates.com. I'll call up my friend Ron in Brazil and
read him off a few digits of the new FoRK key, and he'll sign it too. Now anyone
who wants to join the club who happens to know me and trust me - or perhaps
knows both me and Ron and trusts us both a little bit - has reason to believe in
the FoRK key's security. Over time, the whole community looks like a crazy mesh
constructed by 'six degrees of separation' rather than any central FoRK passport
office, and perhaps that's how it should be.

Or perhaps not. When I try to go sailing again, and I happen to know the
attendant's brother, should he let me in? He may certainly believe I'm
Rohit@4K-Associates.com, but that doesn't prove I'm a student at UC Irvine and
to be entrusted with a boat. It's critical to know why you trust an assertion.
Pretty much the only thing PGP is guaranteed useful for is verifying email
addresses. [If you'd like to read much more on the philosophy of trust
management, consult Weaving a Web of Trust at
http://www.4K-Associates.com/Library/trust]

And finally, it's not clear that this model achieves Internet-Scale either. In
real life, I may know hundreds of people and PGP will suffice to secure my
personal communications. But the whole beauty of the Internet - indeed, of
public-key cryptography broadly - is the ease of spontaneous communication. When
I get e-mail from a stranger, I need to go rummage through my keyring to
construct some trusted pathway from my friends to this fellow. Ironically, for
such a decentralized trust calculus, the PGP community depends on Brian
LaMacchia's absolutely centralized Keyserver. The critical difference, though,
is that unlike an X.509 CA, nobody has to trust Dr. LaMacchia; it's just a cache
of signed keys, take it or leave it, some bogus or not.
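
Rummaging through a keyring for a trusted pathway amounts to a graph search.
The sketch below runs a breadth-first search over a made-up signature graph;
"Alice" and "Stranger" are hypothetical, no signatures are actually verified,
and PGP's real trust computation weighs partial trust in ways this ignores.

```python
from collections import deque

# Hypothetical signature graph: signer -> keys that signer has signed.
SIGNED = {
    "Rohit": ["FoRK", "Ron"],
    "Ron": ["FoRK", "Alice"],
    "Alice": ["Stranger"],
}


def trust_path(me, target, signed, max_hops=6):
    """Breadth-first search for the shortest chain of signatures to target."""
    queue = deque([[me]])
    seen = {me}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) > max_hops:    # path already spans max_hops signatures
            continue
        for key in signed.get(path[-1], []):
            if key not in seen:
                seen.add(key)
                queue.append(path + [key])
    return None


print(trust_path("Rohit", "Stranger", SIGNED))
# ['Rohit', 'Ron', 'Alice', 'Stranger']
print(trust_path("Rohit", "Nobody", SIGNED))   # None
```

A keyserver only supplies the edges of this graph; deciding which edges to
believe remains each user's own business.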

In this sort of ultimately existential universe, names truly are relative. As
far as you know, dear reader, there isn't any Dr. LaMacchia at all; it's just
"Rohit's Brian" until proven otherwise. Self-centered naming is also the key
insight of Ron Rivest and Prof. Lampson's Simple Distributed Security
Infrastructure (SDSI) proposal, and inspires the charter of the IETF Simple
Public Key Infrastructure (SPKI) working group (there's also a more mature
Public Key Infrastructure for X.509 (PKIX) working group).
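
A minimal sketch of self-centered naming, in the spirit of SDSI's linked local
names but not taken from the proposal itself: every principal keeps a private
table of nicknames, and a name like "Rohit's Brian" is resolved one hop at a
time. All the table entries and key labels below are hypothetical; a real
system would bind nicknames to public keys.

```python
# Each principal keeps its own local namespace: nickname -> principal.
LOCAL_NAMES = {
    "Reader": {"Rohit": "key:rohit"},
    "key:rohit": {"Brian": "key:brian", "Ron": "key:ron"},
    "key:ron": {"Brian": "key:another-brian"},
}


def resolve(start, path, local_names):
    """Resolve a linked name such as Reader's Rohit's Brian, hop by hop."""
    principal = start
    for nickname in path:
        principal = local_names[principal][nickname]
    return principal


print(resolve("Reader", ["Rohit", "Brian"], LOCAL_NAMES))          # key:brian
print(resolve("Reader", ["Rohit", "Ron", "Brian"], LOCAL_NAMES))   # key:another-brian
```

The two Brians never collide, because neither name claims to be global.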

Ontology in Angle Brackets

If such semiotic confusion attends merely finding each other, imagine actually
saying anything! In our original example, I was firing up my browser to purchase
an airplane ticket. The promise of XML is that I should be able to automatically
extract minutiae like the airfare from my pretty pages and plop it directly into
my expense report.

<B> Total: <FARE currency='usd' basis='R'>$6010</FARE></B>

In order to add this new tag to my vocabulary, I need to look up United's
definition. The central tenet of the XML Namespaces facility is that a tag name
can really be a URL.

<HEAD xmlns:u='http://united.com/schemas/fares'>...
<u:FARE u:currency='usd' u:basis='R'> $6010 </u:FARE>

That's all well and good if 4K Associates has a private agreement with United
for its expense reporting (and happens to win the lottery to be buying R-class
supersonic seats :-). How can we compare airlines' fares in our expense
reporting application?

The nifty Internet-scale solution is that the URIs can directly reflect the
scope of the community sharing that ontology. Over time, if other airlines adopt
the same tag, the namespace prefix could migrate to iata.int/fareschema - same
tag, wider ratification. The same approach scales down, too, to indicate very
private or very experimental features.
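
Here is a hedged sketch of how an expense application might tell the two
scopes apart, using Python's standard ElementTree parser. The United URI is the
one from the example above; the http://iata.int/fareschema URI and the second
fare value are invented to illustrate the imagined migration.

```python
import xml.etree.ElementTree as ET

UNITED = "http://united.com/schemas/fares"
# Hypothetical industry-wide namespace URI, following the migration imagined above.
IATA = "http://iata.int/fareschema"

# Made-up document mixing both vocabularies; the 5900 fare is invented.
doc = (
    "<HEAD xmlns:u='%s' xmlns:i='%s'>"
    "<u:FARE u:currency='usd'>6010</u:FARE>"
    "<i:FARE i:currency='usd'>5900</i:FARE>"
    "</HEAD>" % (UNITED, IATA)
)


def fares(xml_text, namespace):
    """Return the text of every FARE element defined by the given namespace."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}FARE" % namespace)]


print(fares(doc, UNITED))   # ['6010']
print(fares(doc, IATA))     # ['5900']
```

The prefixes u: and i: are throwaway local symbols; the parser compares the
full URIs, so two FARE tags match only if their communities do.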

Of course, there will be fundamental ontological mismatches. The airline, hotel,
and car rental industry <DAY> tags are inherently incommensurable. XML
namespaces also accurately flag such conflicts.

XML and RDF-driven metadata technology will lead to a sort of 'Cambrian
explosion' of new Internet-scale namespaces, especially as real-world
categorizations move online. Table 1 is just a selection of the kinds of
namespaces that may get extended onto the Internet. Imagine what it would be
like to plug in a new laser printer, and find that the list of printer profiles
wasn't tied to a directory listing of the 100 most popular files your OS vendor
shipped - some obscure directory where you're going to have to manually put the
PPD file you searched far and wide for - and instead was tied to a namespace
maintained by Adobe, always up to date. Or what if, instead of maintaining
separate lists of usernames for UNIX logins, NT logins, Web server logins, and
so on, you could use a single namespace you trusted yourself? Not just directory
server vendor hype, but to genuinely merge the meaning of those identities? The
UN-sponsored Electronic Data Interchange (EDIFACT) standards rely on a
little-known universal serial number for every corporation on the planet, a
registry run by Dun & Bradstreet, Inc. I don't think that's really responsive to
Internet-Scale demand.

Dublin Core               Library of Congress classifications   Yahoo! Categories
ISBN / ISSN numbers       UPC product bar codes                 GPS coordinates (?)
  (http://isbn.nu/<isbn> - try it!)
RFCs & Internet-Drafts    User & Group profiles                 Printer Descriptions (PPDs)
Video Codecs              Fonts                                 Colorspaces
Java class files          Hashes & GUIDs (globally unique IDs)  Social Security Numbers
DUNS business ID number   XML elements                          MIME Media Types

Table 1: A diverse sampling of Internet-scale namespaces, above and beyond the
common Domain Name.

Recurring Internet-Scale Issues

Namespace management at Internet scale requires more than scalable lookup
algorithms alone. Internet scale is additionally about scaling across time,
space, and organizations -- raising unique issues of longevity, latency, and
liability, respectively.

We'll need names to get a handle on phenomena of widely varying lifetimes, from
a name for the digital camera you just placed within infrared range of your
laptop, to millennia-old lyric poems. Even within the human lifetime of major
software systems, we'll need to maintain machine and human readability. If you
think IP address space is in crisis, the international air traffic transponder
standard is only now migrating from Mode C 12-bit flight IDs to digital Mode S
24-bit permanent airframe IDs. Just imagine how much complexity in the current
ATC system is sheer resynchronization of per-region flight IDs upon handoff. All
general-aviation traffic in the US, for example, is forced onto a single code,
1200.

This shift from hours-long flight numbering to decades-long airframe numbering
illustrates the sorts of reengineering more mundane internal applications will
have to adapt to as these systems are woven together with your business
partners' across the Internet. Something as simple as an employee-number ID
field can be blown away when redeployed into a joint-venture subsidiary.
Human-readable identifiers are one way to gracefully re-integrate such
ontological mismatches.

By the time you read this, Y2K will be right around the corner. Take the long
view and think about how your application might evolve over the years. How many
of those character-string part-names and color fields might be replaced by URIs?
That's a big dollop of indirection, allowing later users wide latitude to move,
merge, enlarge, translate, visualize, or replace that parameter
namespace.

At the same time, this move illustrates the importance of building upon firm
Internet standards. Longevity requires security and reliability of the
resolution function itself, not just the naming policy. Mobility and agility, on
the other hand, emphasize celerity.

Second, relying on names across space requires explicitly coping with latency,
nomadic connectivity, and geography. Nameservices requiring online resolution
and instantaneous updating will have to gracefully distribute between machines
separated by not just 30ms on the LAN or 300ms across the Internet, but by days
or weeks of disconnected use by nomadic users. How will your system trace the
consequences of an expired digital certificate? How can we resolve "Rohit's
Brian" in physically decentralized form? What about a network where united.com
will mean different businesses depending on where I am? Why not local resolution
to local ticket offices? Conversely, why the heck should a few inconsequential
miles of private highway outside UCI hog up the planet-wide domain name
tollroad.com?

Third, Internet scale demands solutions that work across organizations. The very
essence of the Internet is that it is not a large LAN under some mythical central
control, no matter how we strive to maintain a single globally consistent name
table or network map. Explicit multilaterality is critical to the success of a
namespace on the Internet. Look for explicit delegation of portions of a
namespace, as well as a separate dimension for explicit commitment (private,
experimental, public, etc).

When we really get serious about electronic commerce, liability will accrue
around these boundaries. Who authorized overnight shipping? Why doesn't this
color match the named swatch? Whose accounting rules forgot to mention the
pension liability? Conversely, anonymity and pseudonymity are also solutions for
legal liability when freedom of speech, or simply efficient auction markets, are
at stake.

Postmodernist Networking

By now, dear reader, you must imagine I've simply gone off the deep end. Let me
close by refounding my arguments on some basic human truths. If we are really
going to see a world with trillions of computers, if we are really going to
establish information and communication as fundamental human rights, we will
fully recapitulate human society on that network. David Gelernter at Yale
envisioned it thus in his book Mirror Worlds:

A Mirror World is some huge institution's moving, true-to-life mirror image
trapped inside a computer --- where you can see and grasp it whole. The thick,
dense, busy subworld that encompasses you is also, now, an object in your
hands...

It behooves us to ask how humans name the world in the first place. People, for
one, don't have globally unique names. Most people aren't even "visible" to each
other. The UN just declared WP6B day last month: World Population Six Billion.
We can't even enumerate the set of people (to say nothing of devices!). And yet,
people arguably manage stable, self-organizing, extremely trustworthy
namespaces. In the ancient days of UUCP and Fidonet, even our computers got by
in this patchwork fashion.

Someday, DNS, IP, Ethernet, and the whole lot of centrally controlled namespaces
will rot and topple over. Perhaps the new bottom turtle will be thousands of
bits of private key, infinitely messier than even IPv6 routing, because it'll be
as random and as unique as DNA. But someday, it must be possible for a new node on
the network to be born free, switched on and still make a name for itself. If
Mother Nature doesn't need a directory server...

In other words, in the computer networks of the future, there'll be a whole
lotta schmoozin' goin' on!