[FoRK] Programming languages, operating systems, despair and anger

Stephen D. Williams sdw at lig.net
Thu Nov 12 17:30:00 PST 2009

Jeff Bone wrote:
> Benjamin writes:
>> This is a key point. Many of the problems we face in writing software 
>> are integration, not simply automation.
> Actually, that is THE point, the whole point, and nothing but the point.
> Erik Meijer realized this, cf. the paper I referenced earlier.  MOST 
> programming these days is pipe-fitting between different data models, 
> communication models, and data types.  (And without any real standards 
> about those pipe fittings, but that's a different part of the 
> picture.)  MOST OF IT.  ALMOST ALL OF IT.  And most of *this* happens 
> at the lowest, gorpiest, ugliest level possible.
> Markup doesn't help, it hurts.  YAML, semi-structured text like 
> various Wiki markups, etc. are an improvement for some things.  JSON 
> is an improvement for what it does, but it's still world-of-strings 
> wrapped in maps.
> One *major* impediment to all integration tasks is this:  when you're 
> passing your data between components, you generally either (a) are 
> trapped in a very type-specific and strict implementation regime (RPC 
> stubs, CORBA IDL, and now e.g. Thrift and Google's protocol buffers 
> are attempts to resolve the problem, not very successfully IMHO as 
> they imply a kind of development cycle involving a lot of static, 
> non-interactive crap) or (b) you're marshalling and unmarshalling 
> strings, parsing files that are strings that encode data structures 
> that embed strings ad infinitum --- probably with lossy semantics.

I've been thinking about this stuff for a long time.  esXML / esDOM was 
one of my attempts to solve the problem for many integration cases.  I 
won't bore you with details, but I was right about a number of things.  
I'm thinking of reworking Google Protocol Buffers in a couple of ways to 
merge the ideas.

It would be done with libraries, but designed to be fundamental to a 
language.  Basically, everything can be done dynamically and the wire 
format is the same as the memory format: no parsing or serialization, 
with little (but some) overhead and little wasted space.  This doesn't 
solve everything; however, it would suffice for many business 
applications.  A high-profile application I helped design used this API 
over standard XML DOM.
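
A minimal sketch of the "wire format is the same as memory format" idea, 
in Python.  This is not esXML / esDOM or Protocol Buffers: the record 
layout, the CustomerView class, and its field names are all hypothetical, 
and the layout is fixed rather than dynamic.  It only shows field access 
going straight to the bytes that would travel on the wire, with no 
separate parse or serialize step.

```python
import struct

# Hypothetical fixed layout, little-endian with no padding:
# uint32 id (offset 0), float64 balance (offset 4), 16-byte name (offset 12).
LAYOUT = struct.Struct("<Id16s")


class CustomerView:
    """Reads and writes fields directly in a buffer; there is no parse step."""

    def __init__(self, buf, offset=0):
        self._buf = buf
        self._off = offset

    @property
    def id(self):
        return struct.unpack_from("<I", self._buf, self._off)[0]

    @id.setter
    def id(self, value):
        struct.pack_into("<I", self._buf, self._off, value)

    @property
    def balance(self):
        return struct.unpack_from("<d", self._buf, self._off + 4)[0]

    @balance.setter
    def balance(self, value):
        struct.pack_into("<d", self._buf, self._off + 4, value)

    @property
    def name(self):
        raw = struct.unpack_from("<16s", self._buf, self._off + 12)[0]
        return raw.rstrip(b"\x00").decode("utf-8")

    @name.setter
    def name(self, value):
        # Shorter values are NUL-padded, longer ones truncated to 16 bytes.
        struct.pack_into("<16s", self._buf, self._off + 12, value.encode("utf-8"))


# Sender: mutate the record in place; the bytearray is already the message.
wire = bytearray(LAYOUT.size)
rec = CustomerView(wire)
rec.id = 42
rec.balance = 12.5
rec.name = "Bob Smith"

# Receiver: wrap the received bytes and read fields on demand.
received = CustomerView(bytes(wire))
```

The "little overhead (but some)" shows up here as the per-access 
unpack call; the saving is that untouched fields are never decoded.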

However, as mentioned, we also need better data and interchange models.  
While XML and XML-like tree models (which are also like object hierarchy 
models) are nice, they don't compete with graph-based models in terms of 
flexibility and change / difference resiliency.  My current thinking is 
that XML-like (or microformat-like) data should be considered a view of 
graph-based data (RDF et al), so that both are unified to some extent.
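
To make "tree as a view of graph data" concrete, here is a small sketch 
under stated assumptions: the graph is a plain list of 
subject/predicate/object triples, the "ex:" names are made up for 
illustration, and tree_view is a hypothetical helper rather than part of 
any RDF library.  The same graph yields different trees depending on 
which node is chosen as the root.

```python
# Hypothetical triples; the "ex:" names are illustrative only.
TRIPLES = [
    ("ex:order1", "ex:customer", "ex:bob"),
    ("ex:order1", "ex:total", "12.50"),
    ("ex:bob", "ex:firstName", "Bob"),
    ("ex:bob", "ex:lastName", "Smith"),
    ("ex:bob", "ex:lastOrder", "ex:order1"),  # a cycle: fine in a graph
]


def tree_view(triples, root, _seen=None):
    """Project the graph reachable from `root` into nested dicts.

    A node already on the current path is emitted as {"ref": node}
    instead of being expanded, so cycles terminate.  Any object that also
    appears as a subject is treated as a node; everything else is a literal.
    """
    seen = (_seen or frozenset()) | {root}
    subjects = {s for s, _, _ in triples}
    view = {}
    for s, p, o in triples:
        if s != root:
            continue
        if o in seen:
            value = {"ref": o}
        elif o in subjects:
            value = tree_view(triples, o, seen)
        else:
            value = o
        view.setdefault(p, []).append(value)
    return view


order_tree = tree_view(TRIPLES, "ex:order1")
customer_tree = tree_view(TRIPLES, "ex:bob")
```

Rooted at the order, the customer appears as a nested child; rooted at 
the customer, the order does.  Neither tree is the data, only a view of it.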

An interesting, if old, sequence of API choices:

Google APIs use AtomPub for getting and setting all data.

> Let the programmer express the common type values they want to express 
> directly, literally, in a way that makes it unambiguous;  do so in a 
> generic way that doesn't entail the kind of massive machinery and 
> process that inhibits on-the-fly interactive programming of the style 
> we used to enjoy.  THEN and only then will we begin to move in the 
> other direction -wrt- this impedance mismatch / integration issue that 
> will otherwise increasingly KILL forward progress ENTIRELY.

We need a lot more direct expression with a lot less plumbing.  A 
library of data / application situations, each paired with one or more 
of the simplest possible pseudocode / nirvana-language code snippets, 
would be useful.  For example:

person p; p.FirstName="Bob"; p.LastName="Smith";
msg b { type="newCustomer"; p; };
status={b.put http://example.org/application; };
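
For comparison, a rough Python rendering of the same three lines.  The 
class names, the choice of JSON as the body encoding, and the build_put 
helper are assumptions made for this sketch; the pseudocode does not say 
how the message is encoded.  The request is built but not sent, so the 
example runs offline.

```python
import json
import urllib.request
from dataclasses import dataclass, asdict


@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Msg:
    type: str
    p: Person


def build_put(url, msg):
    """Build (but do not send) an HTTP PUT carrying `msg` as JSON."""
    body = json.dumps(asdict(msg)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


p = Person(first_name="Bob", last_name="Smith")
b = Msg(type="newCustomer", p=p)
req = build_put("http://example.org/application", b)

# Sending is left out on purpose; with a live server it would be roughly:
# status = urllib.request.urlopen(req).status
```

Everything beyond the last three statements is plumbing that the 
pseudocode leaves implicit.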

How about some actual examples of your ultimate data formats?

Besides general verboseness and maximum expressiveness, what are you 
optimizing for in particular?

Perhaps pluggable data representation parsing modules would allow enough 
arbitrary in-place data formatting.
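
One way such pluggable modules could look, as a sketch only: a registry 
maps a format name to a parser function, and an embedded literal is 
handed to whichever parser claims its format.  The names PARSERS, 
parser, and literal are hypothetical; JSON and CSV are used simply 
because Python ships parsers for them.

```python
import csv
import io
import json

# Registry of format name -> parser function (hypothetical design).
PARSERS = {}


def parser(fmt):
    """Decorator that registers a function as the parser for `fmt`."""
    def register(fn):
        PARSERS[fmt] = fn
        return fn
    return register


@parser("json")
def parse_json(text):
    return json.loads(text)


@parser("csv")
def parse_csv(text):
    # Each row becomes a mapping keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))


def literal(fmt, text):
    """Parse an embedded literal using whichever module handles `fmt`."""
    if fmt not in PARSERS:
        raise ValueError(f"no parser registered for {fmt!r}")
    return PARSERS[fmt](text)


person = literal("json", '{"FirstName": "Bob", "LastName": "Smith"}')
rows = literal("csv", "first,last\nBob,Smith\n")
```

A new representation is added by registering one more function; callers 
of literal do not change.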

In addition to the things-that-should-be-straightforward list you gave, 
we need to be able to represent at least:
hierarchical data members (instances of object hierarchies, XML, JSON, 
various other examples)
graphs, including all degenerate forms such as lists, and a superset of 
RDF-like knowledge graphs
tables, sparse tables
arbitrary text, including full Unicode
spatial / temporal (and similar complex) data
numbers and rationals
images, diagrams, 3D models, sounds, and other multimedia constructs
efficient arbitrary data formats (genome, protein)

We need this to be clean and clear, to have concise in-memory and 
on-the-wire formatting, and to support all needed operations.  This 
especially includes minimal-agreement interchange between subroutines, 
processes, communications links, web services, database storage, queues, 
and other kinds of integration.  Versioning, transactions, and similar 
should be supported in memory, between modules or programs, and 
remotely, preferably in some unified way.

> It starts with a data language.  Lua started life as a configuration 
> file / data language, but is too impoverished to serve the real 
> purpose --- and it's still not a particularly good language for 
> interactive use.
> Rebol's got the data literals, but...  baggage.  And not good on the 
> interactivity front either.
> Sigh...
> jb

