[FoRK] Programming lang etc. (details for Stephen, comment for JAR)

Jeff Bone jbone at place.org
Fri Nov 13 12:44:11 PST 2009

Re:  JAR:  exactly.  Understood, agreed.

Re: Tom:

> Could it be our best course would be to build the things that build  
> the code, or even build the things that will then assemble  
> themselves to build the things to build the code?


And that's a big part of the motivation for better data languages,  
too.  W/o them, it's hard to get the recursion going and sustained ---  
any given technology that makes some tradeoffs (usually false, cf.  
below;  most recent example being e.g. XML) gets so baked into things  
that humans need this giant exoskeleton of tools and crap to deal with  
the mess --- which is so gorpy anyway that even the machines have  
various problems w/ it as well.  Everybody loses.

Basically there are three general use cases for such things:  human-to- 
human (either different humans or same-human, over either space or  
time), human-machine (config files, output files for human  
consumption, etc.) and machine-to-machine (most markup scenarios,  
realistically speaking;  OTW protocols and serialization formats,  
etc.).  I contend that a big part of the problem is the baked-in  
assumption that you have to optimize on one or at most two of these.   
OGDL, YAML, various wiki markups, UNIX cookie jars and record jars,  
and other examples abound to the contrary.  And the biggest problem  
faced in any of these scenarios today, IMHO, is the lack of type- 
safety in representation coupled with tenability in the reading and  
writing dimensions.  Common wisdom would have it that you can't have  
your lunch and eat it too, particularly w/ tradeoffs in parser  
complexity (as in, inherent computational complexity) --- but I think  
we've got far better potential state-of-the-art at present than we're  
seeing used anywhere...

Regarding your "Multiarity(tm)" etc...  loved it!  Thanks, Tom. :-)   
You're spot on.

Dr. Ernie writes:

> where everything is a string

Just to be clear, that's the *opposite* of what I'm talking about.   
I'd prefer an environment where *nothing* is a string (except *actual*  
strings.)  Everything's a well-typed value and *very* few if any  
interesting data types have to be "tunneled" inside strings.  But  
those well-typed values can be explicitly constructed and  
unambiguously inferred from the lexical syntax involved.
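To make "inferred from the lexical syntax" concrete, here's a rough Python sketch.  The token rules and the name read_value are made up for illustration and aren't taken from any existing data language; the point is only that the lexical form alone picks the type, and strings are just one type among several.

```python
import datetime
import re
from typing import Union

Value = Union[bool, int, float, datetime.date, str]


def read_value(token: str) -> Value:
    """Infer a typed value from a token's lexical form (hypothetical rules)."""
    if token in ("true", "false"):
        return token == "true"
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    if re.fullmatch(r"[+-]?\d+\.\d+", token):
        return float(token)
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
        return datetime.date.fromisoformat(token)
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]  # an *actual* string, explicitly quoted
    # Nothing is silently tunneled inside a string.
    raise ValueError(f"no type can be inferred for {token!r}")
```

With these assumed rules, 42 and "42" (with the quotes) read back as an integer and a string respectively, and a bare word that matches no rule is an error rather than a string by default.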

> which implies using sigils for variables

In general this isn't really a problem just with shell languages, it's  
a problem with any language that admits symbols-as-value-types.  In  
such languages you appear to have a strict choice (with a few  
exceptions, to be discussed below) --- either symbols are unquoted and  
unevaluated by default, and must be explicitly dereferenced somehow to  
get the value (if any) they might be bound to in some context, OR you  
have to quote them in order to use them as values in themselves.  (Or,  
you can punt and just have strings, which is what all but a very few  
languages do.)  For the most part you can't have it both ways.
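The two choices can be sketched side by side in Python.  Everything here (Symbol, eval_a, eval_b, the tuple encoding of "$" and "quote") is hypothetical and only meant to show the tradeoff, not how any particular language implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """A symbol that is a value in its own right."""
    name: str


def eval_a(term, env):
    """Choice A: symbols are inert by default; ("$", sym) dereferences."""
    if isinstance(term, tuple) and term[0] == "$":
        return env[term[1].name]
    return term  # a bare symbol (or any literal) evaluates to itself


def eval_b(term, env):
    """Choice B: symbols are dereferenced by default; ("quote", sym) protects."""
    if isinstance(term, tuple) and term[0] == "quote":
        return term[1]
    if isinstance(term, Symbol):
        return env[term.name]
    return term


env = {"x": 42}
x = Symbol("x")
```

Under choice A you pay a "$" every time you want the value; under choice B you pay a quote every time you want the symbol.  Either way somebody pays.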

You can get away from that in some limited context by having some  
special evaluation rules.  Schemes and all UNIX shells have a useful  
convention:  the first symbol or subexpression in an expression is  
taken to be a variable referencing (function returning, etc.) a  
function, and is implicitly dereferenced and applied to the arguments  
(which in Scheme are just expressions that are eagerly evaluated,  
while in the shell they are only lightly parsed and flat and  
dereferencing is explicit.)  But such evaluation semantics along with  
the syntactic ability to distinguish expressions / commands / whatever  
the bigger-than-word language unit is, gives you a tool to use as a  
language designer.  (N.B.: Scheme achieves something  
interesting by allowing either a symbol or a functor in first  
position;  this leads, with a little thought, to a really interesting  
gestalt:  the semantics of programming language-style variables (named  
slots, as opposed to e.g. mathematical or logical variables, etc.) in  
general can be understood in terms of functors.)
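A toy Python version of the two head-position conventions, as a sketch only.  The function names and the encoding (lists for expressions, Python strings standing in for symbols, a leading "$" for explicit dereference in the shell case) are assumptions made for the example.

```python
import operator


def eval_scheme(expr, env):
    """Scheme-like: head and arguments are all evaluated eagerly."""
    if isinstance(expr, str):      # a bare name: implicitly dereferenced
        return env[expr]
    if isinstance(expr, list):     # (head arg ...): evaluate head, then apply
        fn = eval_scheme(expr[0], env)
        return fn(*(eval_scheme(arg, env) for arg in expr[1:]))
    return expr                    # anything else is a literal


def eval_shell(words, env):
    """Shell-like: only the head is implicitly dereferenced; the rest is
    flat words, and dereferencing an argument takes an explicit '$'."""
    fn = env[words[0]]
    args = [env[w[1:]] if w.startswith("$") else w for w in words[1:]]
    return fn(*args)


env = {
    "add": operator.add,
    "pick_adder": lambda: operator.add,   # a function returning a function
    "echo": lambda *args: " ".join(str(a) for a in args),
    "x": 2,
}
```

Here eval_scheme(["add", "x", 3], env) looks up both add and x without any sigil, and the head may itself be a subexpression such as ["pick_adder"].  In eval_shell(["echo", "x", "$x"], env) only echo is looked up implicitly; the word x stays a word and $x is the explicit dereference.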

The generalization of symbols to hierarchical constructs that are  
still simultaneously both names and first-class objects --- let's call  
them "path expressions" --- is a pretty interesting thing.  Consider  
the following in a typical object language of some kind:

    a.b.c


This should generally be understood as "look up the value of the name  
c in the namespace obtained by looking up the value of the name b in  
the namespace obtained by looking up the value of the name a in the  
(global, local, depending on context) namespace."  Dereference,  
typically, is implicit.  Consider the similarity to the familiar

    /a/b/c


What's the difference?  Well, for one thing, in shell-like languages  
we can construct the latter, pass it around, etc. w/o assuming that  
it's going to be dereferenced at any given point and / or yield  
anything in particular.  To be fair it's because the shell only  
treats it as an opaque string (modulo things like dirname and its path- 
munging shell shortcut friends) but there's no reason why we can't  
think about such things as objects in their own right.
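One way to picture a path expression as an object in its own right, sketched in Python.  The Path class, its methods, and the dotted and slash-separated spellings are all assumptions for illustration; nested dicts stand in for namespaces.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class Path:
    """A hierarchical name that is a first-class value (hypothetical)."""
    parts: Tuple[str, ...]

    @classmethod
    def parse(cls, text: str, sep: str = ".") -> "Path":
        # Empty segments (e.g. from a leading separator) are dropped.
        return cls(tuple(p for p in text.split(sep) if p))

    @property
    def parent(self) -> "Path":
        # Roughly what dirname does, but on a structured value.
        return Path(self.parts[:-1])

    def __truediv__(self, name: str) -> "Path":
        return Path(self.parts + (name,))

    def resolve(self, namespace: Mapping[str, Any]) -> Any:
        # Dereference happens only when asked for, one name at a time.
        value: Any = namespace
        for name in self.parts:
            value = value[name]
        return value


p = Path.parse("a.b.c")
ns = {"a": {"b": {"c": 7}}}
```

The path can be built, compared, and munged (p.parent, p.parent / "d") without ever being looked up; p.resolve(ns) is the explicit dereference, and it fails only at that point if the names aren't bound.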

This leads to a really interesting set of potential evaluation rules  
that minimize (but don't entirely eliminate) the kind of dollar-itis  
that you find in most shell languages.  And FWIW, the first characters  
of each of these:


Can all be understood as special dereference operators that name a  
unique context in which the symbol foo is to be dereferenced.


So to be clear:  I'm *not* a fan of the sigils and crap syntactic line  
noise that you find all over the place in e.g. most shells and in Perl  
etc.  That's actually *exactly* what I'd like to minimize!  But in an  
interactive context, and with first-class symbols and other value  
types, it's unlikely that you can eliminate (at least) the use of e.g.  
"$" as a prefix dereference operator when you want to get the value  
that's "bound to" or implied by certain value-holding types that are  
values in themselves.


