Data, the Reverse Salient of Software.

I Find Karma (adam@cs.caltech.edu)
Tue, 17 Feb 98 02:54:24 PST


Rohit and I have been trying to make sense of the general confusion
unleashed by the demons of Crook on the apostates of the Church of
Objectology:

http://xent.ics.uci.edu/FoRK-archive/feb98/0139.html

Doug Lea kindled the flames in

http://xent.ics.uci.edu/FoRK-archive/feb98/0184.html

with:
> Two things I (quite seriously) do not understand:
> 1. How it can be that people do or do not like objects.
> [...]
> 2. How it can be that XML and other non-computationally-complete but
> useful languages somehow serve as alternatives to objects.

On the first point, we agree that there is nothing to like or not. Crook
was correct to point out that good design transcends implementation
technology, and that technology only loosely guides good design.

Personally, we see the central novel consequence of objects as an economy of
component software. For better or worse, objects introduce a new scale
for selling, and hopefully, buying (by end-users, not as developer
components). That is what people do or do not like: the jury is still
out on whether it will encourage more or less coherent, more or less
profitable software.

But as a matter of taste, we can no more explain how people fail to love
OOP (encapsulation and inheritance) than how the XX chromosome
population can fail, en masse, to love Rohit. :)

The real case we'd like to discuss orbits Doug's Point #2. Our
hypothesis is that data is becoming the reverse salient of intelligent
systems, and hence deserves a rightful place at the side of component
software.

===========================================================================

"Reverse Salients" emerge as lagging fronts in a broad advance. In war,
a reverse salient is a line of soldiers falling behind the main front.
In chemistry, it is a limiting reagent. In technology, it is a design
regime that becomes the font of innovation as its performance or
features lag the rest of the technologies in the product.

In automobiles, the initial goal was a reliable, lightweight internal
combustion engine, which lasted up through the 1920s. Once there was
adequate power (the Ford V-8) and infrastructure (parkways, early
motels), the limits of the product became touring comfort. Hence the
1930s shift to transmission innovation and the closed steel body -- two
reverse salients.

In electrical power, distributing high-amperage electricity required so
much copper and safe infrastructure that Edison recognized the reverse
salient in lighting technology -- and decided it would be easier to make
a lower-power lamp than debug high-power distribution.

In computer/software systems design, we believe that software's reliable
half-life continues to decline. As programs become more ephemeral --
with almost annual platform changes on the desktop! -- data become
longer and longer lived. We believe that data architecture is an
emerging reverse salient in dynamic, interorganizational component
software systems.

===========================================================================

Before we dive into reverse salients some more, we address some easy (?)
answers:

Ron Resnick in

http://www.infospheres.caltech.edu/mailing_lists/dist-obj/msg00930.html

wrote:
> Here we were just discussing above how hard it is to build a useable
> framework that keeps non-sophisticated developers on the straight and
> narrow. That's hard enough to do when you have the BEST tools at your
> disposal - multicast, objects, composable protocols, all the
> goodies. Trying to reach the same goal by using crippled tools like
> http and cgi ... is like handicapping yourself

MYTH #1: The Web (as programmable platform) succeeded because of CGI.

TRUTH: Forms are relevant, not CGI. HTML FORMs are the real contribution
of the Web to software module interconnection. True, we speak of the
most primitive name-value association as arguments, but when paired with
a declarative user-interface, it is more than powerful enough to allow
use -- from any media platform, from graphical desktop to text to fax to
cellphone -- and reuse -- from automated web reporting tools. True,
forms have a long way to go to add strong typing; see our notes at:
<XXXX> . However, CGI as a posting target is truly and deeply
irrelevant. POST simply means to "add this state to the resource at U"
-- the how may be a CGI, a server API, a database, or manual monkeys; it
is completely opaque.
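
To make that opacity concrete, here is a minimal Python sketch of our own
(not taken from any server). The field names and the two handlers,
cgi_style and database_style, are hypothetical; the only point is that
the client encodes the same name-value pairs whatever sits behind U.

```python
from urllib.parse import urlencode, parse_qsl

def encode_form(fields):
    """Encode name-value pairs the way an HTML FORM submission does."""
    return urlencode(fields)

# Two hypothetical "hows" behind the same resource.
# The client sends the same body to either and cannot tell them apart.
def cgi_style(body):
    # CGI-like handler: expose each pair as an environment-style variable.
    return {"FORM_" + name.upper(): value for name, value in parse_qsl(body)}

def database_style(body, table):
    # Database-like handler: append the pairs to a table as one record.
    table.append(dict(parse_qsl(body)))
    return table

body = encode_form({"name": "Rohit", "topic": "reverse salients"})
print(body)
print(cgi_style(body))
print(database_style(body, []))
```

Swapping one handler for the other changes nothing on the client side,
which is what we mean by calling the posting target irrelevant.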

Tying back to our discussion of UI and FORMs at

http://xent.ics.uci.edu/FoRK-archive/jan98/0256.html

we consider the Web to be in an uphill struggle to add typing to untyped
arguments -- the reverse of Pedro Szekely, et al.'s MASTERMIND approach
of tagging CORBA IDLs with the generic type of UI widget to edit each
data type:

http://www.isi.edu/isd/Mastermind/mastermind.html

Ron continues:
> I've asked you, Adam, numerous times how you think http can be evolved
> into something like iBus, or Dan's UberNet, or BAST. For a while you
> were claiming the answer was PEP, but last time we had this
> discussion, I think we agreed that PEP gave you service-composition,
> not protocol composition. PEP still gave you no way to automagically
> turn http into a multicast service, right?

MYTH #2: Transport mechanisms are relevant.

TRUTH: "Service composition" like security, compression, and accounting
is as far as the application-layer designer ought to venture.
Bit-delivery channels are irrelevant. The designer says what message,
and to whom, and with what reliability guarantees, and the middleware
layer should isolate lossy UDP vs. stream TCP vs. packets (records,
frames) within TCP; also, unicast to app-layer reflectors (SMTP,
PointCast) vs. multicast to routing-layer reflectors (MBONE).

REMEMBER: **Multicast is only a performance optimization.** Except when
used on a single shared physical medium, it does not function as an
arbiter. Thus, any multicast group is strictly isomorphic to a network
of application-layer objects through coordinating directory points.
Object groups over multicast merely punt group membership calculations
to the routing layer. And there is no practical general solution to
wide-area reliable multicast; such problems devolve to pairwise reliable
transfer between local exchange points (SMTP redux).
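
A rough Python sketch of the isomorphism, under the assumption that
delivery is simulated in memory: the Reflector class and its method
names are hypothetical, standing in for an application-layer directory
point that keeps the membership list and fans out by pairwise unicast.

```python
class Reflector:
    """Hypothetical application-layer stand-in for a multicast group."""

    def __init__(self):
        self.members = {}  # member name -> delivery callback

    def join(self, name, deliver):
        self.members[name] = deliver

    def leave(self, name):
        self.members.pop(name, None)

    def send(self, sender, message):
        # Fan out by pairwise unicast. In this sketch the sender does not
        # receive its own message; that is a design choice, not a rule.
        recipients = [name for name in self.members if name != sender]
        for name in recipients:
            self.members[name](sender, message)
        return recipients

inboxes = {"ron": [], "doug": [], "adam": []}
group = Reflector()
for name, inbox in inboxes.items():
    # Bind each inbox as a default argument so every callback keeps its own.
    group.join(name, lambda sender, msg, inbox=inbox: inbox.append((sender, msg)))

group.send("adam", "data is the reverse salient")
print(inboxes)
```

Membership lives in the directory point here; routing-layer multicast
would move the same bookkeeping into the routers.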

[nb. what Ron calls "protocol" we would prefer to call bit-channel]

Ron continues:
> Well, I rather thought the philosophy was of *documents* being passed
> around a network... I don't know anymore. That whole
> documents/objects thing has me a bit lost of late. Rather like we
> spout it like the secret-password to get into some inner club of
> people who "get it". Well, maybe I don't anymore. I mean, if you want
> to define "document" to have exactly the same meaning that "software
> object" generally has, sure, they're equivalent. But that's not really
> what's going on. Behaviour isn't passed around as objects on the Web
> (with the not-very-interesting exception of things like applets and
> ActiveX). Behaviour is passed around as CGI requests, which isn't
> really native to the web, but ancillary - certainly not a "underlying
> philosophy".

MYTH #3: The Web is not an object system.

TRUTH: The Web was *intended* to be and *is* an object-oriented
communication system. HTTP provides a dynamic invocation interface, but
recall that The Classic Web Application (TCWA, from the HTTP-NG lingo)
is aimed at document objects, which can be retrieved with GET or sent
with PUT or POSTed to (& with DAV's extensions, PATCHed and LOCKed).
However, we note that there is a deeper duality at play here, too.

At the next layer of reflection, the Web is an object transport system,
too. It's just that behavior is simply not observable (in the sense
that behavior can neither be stored nor exchanged): only the input and
output entities of a resource are observable for storage and exchange.
Only the "shadow" snapshots of objects can be seen.

In the abstract, POST means "add this bucket of bits to the state
already at U." POSTing to a newsgroup adds a message; POSTing to a UNIX
executable sets some environment variables (CGI); POSTing to a server
API passes the body on whole. The bucket of bits is not necessarily a
document, HTML or otherwise, but recall, the only observable steps are
the HTTP transfer of a bitbucket -- an HTTP entity, hence a document.

This is not the same as pass-by-value *within* a distributed system all
written by the same author, where the layout of an XDR or Q message is
already known to all parties.

Imagine the philosopher of science when faced with an RPN "calculator"
URL. POSTing a computation seems to result in another artifact
corresponding to the answer, but the process doing the computing is not
visible. There might be a separate URL to reflect, and ask the server
for the calculator implementation, but intent is not inherently
observable. The philosopher can mainly look at the last million runs
(Plato's shadows on the cave wall) and suppose the next one may work too
(without ever truly knowing what is casting those shadows).
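
The calculator can be sketched in a few lines of Python. This is an
illustration under our own assumptions, not any real service: rpn and
post are hypothetical names, and post stands in for the HTTP exchange.
The observer's notebook holds only request and response entities.

```python
def rpn(expression):
    """Evaluate a space-separated RPN expression such as '3 4 + 2 *'."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split():
        if token in ops:
            b = stack.pop()  # right operand was pushed last
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]

def post(body):
    # Hypothetical stand-in for POSTing to the calculator URL:
    # an entity goes in, an entity comes out, the process stays hidden.
    return str(rpn(body))

# All the philosopher can collect: pairs of (input entity, output entity).
observed = []
for body in ["3 4 + 2 *", "10 2 /"]:
    observed.append((body, post(body)))
print(observed)
```

Nothing in observed reveals whether post is an RPN stack machine, a
lookup table, or a person with a slide rule.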

===========================================================================

Now, from Ron to Doug

http://xent.ics.uci.edu/FoRK-archive/feb98/0219.html

in which Doug wrote:
> Patrick Logan:
> > Some people view "objects" as being not much different than data
> > modeling or knowledge representation. Maybe that is related to this
> > point?
> Yes. Data-model objects are the most degenerate kinds of objects.
> They scarcely seem like machines at all. Which leads people into
> dumb-data + smart-code approaches, which historically haven't often
> been a big win. But with several notable exceptions -- sometimes it
> does pay off to treat data-model objects as glorified memory cells.

MYTH #4: Code is expensive and data is ephemeral.

TRUTH: Historically, it has been a big win to write a suite of programs
consulting a common data format -- unix admin tools and /etc/ files, 4GL
apps and databases, email processing tools and 822. Now, that is
possibly a fallacy of grain. We think Doug meant *within* programs, not
necessarily between components. Even within objects, though, we think
the next extension of atomizing machine behavior into smaller machines
(objects) is to recursively decompose them into machines and tapes. For
data in-core, the working set, this may be overkill, but as platform
half-lives collapse, externalized data lasts longer and longer by
comparison.

Software engineering has made it easier and more effective to build small
components and compositional systems. Massive database packages and Web
browsers and ERP systems become platforms in their own right. Unlike
hardware or OS generations, which have changed at a frequency
proportional to a single key frequency (chipsets, APIs), software
platforms change at the union of all their components' frequencies:
faster and faster as the number of constituents increases.
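
A back-of-the-envelope Python sketch of that claim, assuming each
component changes independently and that changes rarely coincide, so
the rates simply add. The release intervals below are made-up numbers
for illustration, not measurements.

```python
def changes_per_year(intervals_in_months):
    """Sum the change rates of independently evolving components."""
    return sum(12.0 / months for months in intervals_in_months)

# Hypothetical: a platform keyed to one chipset generation every 18 months.
single_key = changes_per_year([18])

# Hypothetical: a platform built from three components with their own cycles.
composite = changes_per_year([12, 18, 24])

print(single_key, composite)
print(12.0 / composite)  # mean months between changes to the composite
```

Adding a fourth component can only raise the sum, which is the sense in
which more constituents mean faster platform change.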

This is a reasonable approach, because one of the best ways to build a
reliable system from unreliable parts is to use more of them. Isolating
complexity -- say, RealAudio from .WAV from sound driver -- is cheaper
and easier to diagnose than writing a monolithic audio subsystem.
Runtime software evolution can replace components or merge them later.

This series of insights sprang from considering our own personal
information spaces. The generations of word-processors, email user
agents, and other applications have been getting shorter and shorter,
but the externalized data (messages, papers, dates) need to be
accessible further and further into the future. There are also synergy
benefits to diverse tools on common formats. Email, again, is an ideal
example: indexers, sorters, archivers, and so on.
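
As a small Python sketch of diverse tools on a common format: the two
messages and their example.org addresses are invented for illustration,
and index_by_sender and sort_by_date are hypothetical tools. Both read
the same 822-style headers through the standard email package.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Invented sample messages in 822 header/body layout.
RAW = [
    "From: adam@example.org\n"
    "Date: Tue, 17 Feb 1998 02:54:24 -0800\n"
    "Subject: Data, the reverse salient\n"
    "\n"
    "Body one.\n",
    "From: ron@example.org\n"
    "Date: Mon, 16 Feb 1998 09:00:00 -0500\n"
    "Subject: Re: objects\n"
    "\n"
    "Body two.\n",
]

messages = [message_from_string(text) for text in RAW]

def index_by_sender(msgs):
    """An 'indexer': map each sender to the subjects they wrote."""
    index = {}
    for m in msgs:
        index.setdefault(m["From"], []).append(m["Subject"])
    return index

def sort_by_date(msgs):
    """A 'sorter': order messages by their Date header."""
    return sorted(msgs, key=lambda m: parsedate_to_datetime(m["Date"]))

print(index_by_sender(messages))
print([m["Subject"] for m in sort_by_date(messages)])
```

Neither tool knows about the other; they agree only on the format.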

Now, we grant that documents are NOT, in themselves, objects. They are,
as discussed in the FoRKpost about compatibility standards, the creme
filling:

http://xent.ics.uci.edu/FoRK-archive/feb98/0138.html

Documents best capture the STATE of a system, not its behavior and rules
of evolution. There are no inherent conditionals or looping constructs
in XML -- XML is a declarative data container, not a programming language!

However, there are economic benefits to speaking of documents as the
objects in the long-term. Behavior may change, but the data files often
last longer. Data files also have multiple uses; viewed in different
ways by different subsystems. Rather than fragmenting relevant detail
amongst many leaf classes, there are maintainability benefits to bringing
all customer contacts to a customer record, for instance.
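
A minimal Python sketch of the customer-record idea. The element and
attribute names (customer, contact, channel, date) are hypothetical,
not drawn from any real DTD; the point is one document, several views.

```python
import xml.etree.ElementTree as ET

# Hypothetical customer record: all contacts live in one document.
RECORD = """
<customer id="c42">
  <name>Example Corp</name>
  <contact channel="email" date="1998-01-12">Asked about pricing</contact>
  <contact channel="phone" date="1998-02-03">Reported a bug</contact>
</customer>
"""

customer = ET.fromstring(RECORD.strip())

def support_view(c):
    """One subsystem's view: a dated history of what was said."""
    return [(x.get("date"), x.text) for x in c.findall("contact")]

def channel_counts(c):
    """Another subsystem's view: how often each channel was used."""
    counts = {}
    for x in c.findall("contact"):
        channel = x.get("channel")
        counts[channel] = counts.get(channel, 0) + 1
    return counts

print(support_view(customer))
print(channel_counts(customer))
```

Either view can be rewritten or retired without touching the record.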

There was another claim, made by Mark Baker, that XML was well-suited as
an IDL replacement:

http://www.infospheres.caltech.edu/mailing_lists/dist-obj/msg00731.html

While XML can be a concrete syntax for IDL, that is far from its only
use. We would like to see more use of XML to capture the thing itself,
not just the interface to the program which manipulates the thing.

===========================================================================

Platforms and software now erode rapidly.
Half-life of software is getting shorter.
Half-life of data is getting longer and they seem to have crossed.
Interorganizational coordination is easier
to arrange around data than around behavior.
Actual code used is ephemeral.

[Haiku compression of above, anyone?]

===========================================================================

Finally, a word or two on the hype behind XML. There's a lot of it. We
contribute to it as often as we can. We think it's important to get the
word out to the 97%. But inside the XML tent, we think there are some
real limitations ahead. The "engineering compromises" Doug speaks of
have yet to get really heated indeed. There's an immense amount of work
that went into just getting XML 1.0 out the door in compliance with
SGML86.

Why Markup Languages at all, though? YML? That's the subject of
another post, asking what XML is and is not best suited for as a
concrete syntax for data formats. We want to preserve the XML object
model -- its schema (document type definitions) -- while reconsidering
how files are laid out in space and time. More on YML later...

But for now at least, the Object Emperor has new clothes to wear.

-- Rohit and Adam

----
adam@cs.caltech.edu

Plato believed that everything was an imperfect copy of one true object.
The chair you're sitting in, for example, would be an approximation of
the one true Chair which existed in some ethereal plane of existence.