[FoRK] Programming lang etc. (details for Stephen, comment for JAR)

Jeff Bone jbone at place.org
Fri Nov 13 06:42:14 PST 2009


Stephen asks:

> one or more of the simplest possible pseudocode / nirvana language  
> code snippets would be useful.

I've been toying with this for several years now; progress has
stalled somewhat this year for various reasons.  I do have some toy
code working, but I'm not happy enough with it yet to let it loose
into the wild.  The biggest problem with the endeavor, really, is
deciding what it wants to be when it grows up:  my ultimate aim has
always been a "new" shell language, something akin to PowerShell née
Monad (with little bits of es, fish, and the Inferno shell thrown in),
something that would cover the 80% of use cases that I tend to run
into in my everyday programming.  (I.e., you aren't going to want to
write machine-learning code in this, but data schlepping?  You bet.)
Cf. previous rants about the need for a new UNIX shell.

I started down this road using a Kamin-derived Python language
toolkit I built called "polyglot" and, as usual for these things, the
language was a kind of surface syntax over a Scheme.  But there are
obvious problems with writing an interpreter in a meta-interpreter in,
e.g., CPython, so that was really just a means of playing around with
syntax and ideas about scope, state, evaluation, and so on.  Somewhere
along the way I decided that if you really wanted this thing to be a
language for computation rather than just expression of data, you
needed to make the shell itself a kind of meta-operating system in its
own right, rather than just an interactive wrapper for the now-stale
standard UNIX system calls and abstractions.  At that point I stalled,
because it's not clear what the underlying implementation language and
runtime *ought* to be to provide enough "lift" for the desired level
of abstraction.  (Erlang is a contender; the JVM was out in my mind
for various reasons.  The other possibilities I examined included D,
direct-to-LLVM, and, dare I say it, even Go, which looks like it might
be useful *for this purpose.*)

And there I'm stuck at present, absent a few bits of toy code that I'm  
not really happy with.

On the "data language" front, though, let me toss out a few hopefully
evocative thoughts.

One approach:  take JSON, add first-class (i.e., non-string) "symbol"
or "word" syntax plus a handful of the most useful Rebol literal types
(e-mail, URI, generic "path" expressions, files, and so on), and run
with that.  That gets you something far more useful for "integration"
purposes.  E.g., the data bits of your example, slightly extended,
might look like this (I hope the formatting survives intact):

Person {
   %111-222-333-OID  # note, an ordinal rather than named value
   messaging/type NewCustomer
   firstName "Bob"
   lastName "Smith"
   email bob@foo.com
   age 23 years  # a "quantity"
   height 5 feet 11 inches  # ditto, a complex quantity expression
   homepage http://foo.com/~bob
   homedir /Users/home/bob
   shell /bin/mosh
   nodes/computing
     a.foo.com, b.foo.com
     c.foo.com
   nodes/files
     d.foo.com
   lastLogin 2009-11-12-09:02:33-CST  # ~ISO, whatever
}
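To make the idea of unquoted values whose types come from their lexical form a bit more concrete, here is a minimal Python sketch of a token classifier covering a few of the literal kinds above.  The type names (`Word`, `Email`, `Url`, `File`), the regular expressions, and `classify` are assumptions for illustration, not Rebol's actual lexical rules; multi-token quantities like `5 feet 11 inches`, the timestamp, and the `%...` ordinal are deliberately left out.

```python
import re
from dataclasses import dataclass
from datetime import date


# Hypothetical literal types; the names are illustrative only.
@dataclass(frozen=True)
class Word:
    name: str


@dataclass(frozen=True)
class Email:
    address: str


@dataclass(frozen=True)
class Url:
    href: str


@dataclass(frozen=True)
class File:
    path: str


# (pattern, builder) pairs, tried in order; the first full match wins.
RULES = [
    (re.compile(r'"(.*)"'), lambda m: m.group(1)),                   # quoted string
    (re.compile(r"-?\d+"), lambda m: int(m.group(0))),               # integer
    (re.compile(r"\d{4}-\d{2}-\d{2}"), lambda m: date.fromisoformat(m.group(0))),
    (re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://\S+"), lambda m: Url(m.group(0))),
    (re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"), lambda m: Email(m.group(0))),
    (re.compile(r"/\S*"), lambda m: File(m.group(0))),               # absolute path
    (re.compile(r"[A-Za-z_][\w/.-]*"), lambda m: Word(m.group(0))),  # bare word
]


def classify(token: str):
    """Infer a bare token's type purely from its lexical form."""
    for pattern, build in RULES:
        m = pattern.fullmatch(token)
        if m:
            return build(m)
    raise ValueError(f"unrecognized token: {token!r}")


for tok in ['"Bob"', "23", "bob@foo.com", "http://foo.com/~bob", "/bin/mosh", "NewCustomer"]:
    print(tok, "->", classify(tok))
```

A real reader would first split the chunk into tokens and would need additional rules for quantities and timestamps; keeping the rules as an ordered table makes it easy to put a more specific form ahead of a more general one.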

A few things to note.  Binding or assignment operators like '=' or ':'
can be avoided in most cases, given some "lifting" of evaluation
semantics from whatever eventual embedding you might want, and all of
this works just fine in a data-only declarative scenario.  The
unquoted values have their types inferred from their lexical syntax.
You can even lose the curly braces around the entire chunk; cf. OGDL
(Ordered Graph Data Language) for a very nice, generalized
implementation of a lightweight markup-like language capturing some
of these ideas.  (OGDL proved to me that *most* of the nesting and
graph constructions you want can be obtained without the usual dizzying
confusion of [{(<>)}] brackets and .,; terminators.  You can build
fully general ordinal data frames that encode general graphs using
nothing more than labels, whitespace, and very little punctuation.)  Cf.

   http://ogdl.sourceforge.net
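A minimal sketch of the claim that labels, whitespace, and a comma are enough to carry nesting.  The rules here are simplified assumptions (spaces only, commas separate siblings on a line, deeper indentation nests under the previous line's last label), not the OGDL grammar, and `Node` / `parse_outline` are hypothetical names.  The input mirrors the `nodes/...` part of the Person example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def parse_outline(text: str) -> Node:
    """Build a tree from indentation alone (simplified, assumed rules)."""
    root = Node("<root>")
    # Stack of (indent, node); the root sits at indent -1 so it is never popped.
    stack: list[tuple[int, Node]] = [(-1, root)]
    for raw in text.splitlines():
        labels = [part.strip() for part in raw.split(",") if part.strip()]
        if not labels:
            continue  # skip blank or punctuation-only lines
        indent = len(raw) - len(raw.lstrip(" "))
        # Pop until the top of the stack is strictly shallower: that's the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        nodes = [Node(label) for label in labels]
        parent.children.extend(nodes)
        # A deeper line that follows will nest under the last label on this line.
        stack.append((indent, nodes[-1]))
    return root


tree = parse_outline("""\
nodes/computing
  a.foo.com, b.foo.com
  c.foo.com
nodes/files
  d.foo.com
""")
print([(n.label, [c.label for c in n.children]) for n in tree.children])
# [('nodes/computing', ['a.foo.com', 'b.foo.com', 'c.foo.com']), ('nodes/files', ['d.foo.com'])]
```

This only yields trees; encoding general graphs, as OGDL can, would additionally need some way to refer back to an already-defined node.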

You can punch things up a bit by making the generic container a
Lua-like table.  Lua also has a nice idiom that is *actually* an
alternative function-application syntax (implying you no longer have a
pure data language) but which, when you squint, looks like Haskell's
labeled-record constructor syntax.  That yields the syntax above,
where it appears that you're either labeling a record type or
constructing an object by passing a Person constructor a data frame or
table full of key-value pairs.  But you don't have to define it as
application; a straightforward, fully declarative semantics is clearly
possible.  (You just want to think about such things carefully if the
goal is to make the data language first-class embeddable in an
interactive or other programming language, without some byzantine
quoting-and-escaping convention.)
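A rough Python sketch of the two readings of `Person { ... }` just described.  `Table` models a Lua-style table with an ordinal part and a keyed part; `apply_frame` either calls a constructor bound to the tag (much as Lua treats `f{...}` as the call `f({...})`) or, with no such binding, keeps the frame as inert tagged data.  All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Table:
    """Lua-style table: an ordinal (array) part plus a keyed part."""
    items: list = field(default_factory=list)
    fields: dict = field(default_factory=dict)


@dataclass
class Tagged:
    """Declarative reading: `Person { ... }` is just a labeled table."""
    tag: str
    body: Table


def apply_frame(tag: str, body: Table,
                constructors: dict[str, Callable[[Table], Any]]) -> Any:
    """Evaluative reading if a constructor named `tag` is in scope;
    otherwise the frame stays inert, tagged data."""
    ctor: Optional[Callable[[Table], Any]] = constructors.get(tag)
    if ctor is not None:
        return ctor(body)
    return Tagged(tag, body)


frame = Table(items=["111-222-333-OID"], fields={"firstName": "Bob", "age": 23})
print(apply_frame("Person", frame, {}).tag)   # Person
```

Defaulting to the inert reading is one way to keep the data language safely embeddable: nothing executes unless the embedding explicitly supplies a constructor.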

Ultimately the data model has to be fully general graphs, à la RDF or
its generalizations (particularly cf. FluidDB, but think
non-centralized), but none of the existing implementations is
satisfactory from a surface-syntax perspective, and the strict 3-tuple
suffers from certain known limitations exacerbated by each of the
various proposed serializations.  You really need a bit more sugar up
top to make it practical for certain real-world things (like
attribution, or attaching metadata to things; FluidDB puts a stake in
the ground about how to do that sort of thing).
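One common workaround for the bare 3-tuple's limits is to give each statement an identity so that other statements can point at it (the same motivation behind RDF reification and quads / named graphs).  A minimal Python sketch under that assumption; `Graph`, `annotate`, and the `#<id>` naming convention are hypothetical, and this is not how FluidDB models metadata.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    sid: int          # identity of the statement itself
    subject: str
    predicate: str
    obj: object


class Graph:
    """Triples that carry an id, so statements can be talked about."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.statements: list[Statement] = []

    def add(self, subject: str, predicate: str, obj: object) -> int:
        st = Statement(next(self._ids), subject, predicate, obj)
        self.statements.append(st)
        return st.sid

    def annotate(self, sid: int, predicate: str, value: object) -> int:
        # Metadata about a statement is just another statement whose
        # subject names the first one ("#<id>").
        return self.add(f"#{sid}", predicate, value)

    def about(self, subject: str) -> list[Statement]:
        return [s for s in self.statements if s.subject == subject]


g = Graph()
sid = g.add("bob", "age", 23)
g.annotate(sid, "assertedBy", "jb")   # attribution attached to the fact
print(g.about(f"#{sid}"))
```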

Beyond the data language, the features I'd really like in the full
shell language and runtime environment are pretty straightforward:

   - first-class functions (a rarity in shells) with real lambdas /
     closures
   - structure-preserving (but *not* strongly typed) pipelines (cf.
     PowerShell)
   - built-in pico-weight processes (à la Erlang)
   - built-in "transparent" asynchronous distributed messaging
   - possibly first-class transactional environments, à la Alan Kay et
     al.'s "Worlds" idea
   - potentially process migration, particularly when integrated with
     the previous item
   - potentially first-class continuations (interesting in the above
     two contexts)
   - selective namespace sharing (a generalization of Plan 9-style
     per-process namespaces)
   - must have in-place code replacement, a CRAN / CEAN-style package
     repository and distribution, etc.
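The "structure-preserving but not strongly typed" pipeline item can be sketched with plain dicts flowing between generator stages instead of lines of text.  The stage names below loosely echo PowerShell's Where-Object, Sort-Object, and Select-Object, but `where`, `sort_by`, `select`, `pipe`, and the process table are hypothetical Python stand-ins.

```python
from typing import Any, Callable, Iterable, Iterator

# A record is any dict: structure survives between stages, no schema is enforced.
Record = dict[str, Any]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def where(pred: Callable[[Record], bool]) -> Stage:
    """Keep only records satisfying `pred`; streams lazily."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if pred(r))
    return stage


def sort_by(field_name: str) -> Stage:
    """Sort on one field; unlike the others, this stage must buffer."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        return iter(sorted(records, key=lambda r: r[field_name]))
    return stage


def select(*keys: str) -> Stage:
    """Project records onto `keys`; missing keys become None, not errors."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        return ({k: r.get(k) for k in keys} for r in records)
    return stage


def pipe(source: Iterable[Record], *stages: Stage) -> list[Record]:
    """Thread `source` through each stage in order, like `a | b | c`."""
    stream: Iterable[Record] = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)


procs = [  # hypothetical process table
    {"pid": 1, "name": "init", "rss": 4},
    {"pid": 42, "name": "mosh", "rss": 120},
    {"pid": 7, "name": "sshd", "rss": 30},
]
print(pipe(procs, where(lambda r: r["rss"] > 10), sort_by("rss"), select("name", "rss")))
# [{'name': 'sshd', 'rss': 30}, {'name': 'mosh', 'rss': 120}]
```

No stage needs to re-parse text produced by the previous one, and a stage that doesn't care about a field simply passes it through or ignores it.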

The Worlds idea has the potential to be an *extremely* powerful
operating-system structuring concept, especially if coupled /
integrated with first-class and portable continuations.  Consider the
possibilities if such a thing were extended beyond the variable /
object-graph namespaces to, e.g., the file system.  (To do that right,
and in a dispersed-friendly, portable (in the sense of mobility) way,
you have to decouple the namespace from the storage mechanism, and you
have to make stored objects immutable and replicable.  But hey, there
are only about two dozen "p2p"-type systems out there that have done
that.)  ;-)
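A toy Python sketch of a first-class transactional environment, borrowing the sprout / commit vocabulary of the Worlds work.  The simplifications are assumptions, not the original design: state is a flat key-value map (keys could just as well be file paths), and commit is refused if any key this world read from its parent has changed there in the meantime.

```python
_MISSING = object()  # sentinel: no binding anywhere up the chain


class World:
    """Toy transactional scope: writes stay local until commit()."""

    def __init__(self, parent: "World | None" = None) -> None:
        self.parent = parent
        self.writes: dict[str, object] = {}
        self.reads: dict[str, object] = {}  # value first seen in the parent

    def sprout(self) -> "World":
        return World(self)

    def _lookup(self, key: str) -> object:
        if key in self.writes:
            return self.writes[key]
        return self.parent._lookup(key) if self.parent else _MISSING

    def get(self, key: str, default: object = None) -> object:
        if key in self.writes:
            return self.writes[key]
        value = self.parent._lookup(key) if self.parent else _MISSING
        if self.parent:
            self.reads.setdefault(key, value)  # remember what we depended on
        return default if value is _MISSING else value

    def set(self, key: str, value: object) -> None:
        self.writes[key] = value

    def commit(self) -> None:
        if self.parent is None:
            raise RuntimeError("a top-level world has nothing to commit into")
        for key, seen in self.reads.items():
            if self.parent._lookup(key) != seen:
                raise RuntimeError(f"conflict on {key!r}")
        # Read sets are not propagated to grandparents in this toy version.
        self.parent.writes.update(self.writes)
        self.writes.clear()
        self.reads.clear()


top = World()
top.set("cwd", "/Users/home/bob")
child = top.sprout()
child.set("cwd", "/tmp")
print(top.get("cwd"), child.get("cwd"))  # /Users/home/bob /tmp
child.commit()
print(top.get("cwd"))                    # /tmp
```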

> Perhaps pluggable data representation parsing modules would allow  
> enough arbitrary in-place data formatting.

Just FYI, Rebol has *exactly* that.  It calls the concept "dialects":
unparsed blocks of words and values can be processed however you
like.  This is its basic facility for DSL construction, and it's used
to great effect by several of its toolkits, such as its GUI toolkit.
(Building GUIs in Rebol is a largely declarative matter of writing
configurations in this dialect and binding them to behaviors defined
in the full language.)  Cf.

   http://rebol.com/r3/docs/concepts/parsing-dialects.html
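Rebol does this with its own PARSE machinery; the Python sketch below only mimics the core idea that a block stays uninterpreted data until some dialect function gives it meaning.  The tokenizer, the `text` / `button ... on ...` grammar, and the function names are hypothetical and are not Rebol's actual GUI dialect.

```python
import re
from typing import Callable

# Quoted strings stay whole; everything else splits on whitespace.
TOKEN = re.compile(r'"[^"]*"|\S+')


def tokenize(source: str) -> list[str]:
    """Turn source text into an uninterpreted block of tokens."""
    return TOKEN.findall(source)


def layout_dialect(block: list[str],
                   actions: dict[str, Callable[[], str]]) -> list[tuple]:
    """Interpret a block as a small, hypothetical GUI-layout dialect.

    Assumed grammar:
        text   <string>
        button <string> on <action-word>
    Another dialect function could read the same block quite differently.
    """
    widgets: list[tuple] = []
    i = 0
    while i < len(block):
        word = block[i]
        if word == "text":
            widgets.append(("label", block[i + 1].strip('"')))
            i += 2
        elif word == "button":
            label, on, action = block[i + 1:i + 4]
            if on != "on":
                raise SyntaxError(f"expected 'on' after button label, got {on!r}")
            # Bind the action word to behavior defined outside the dialect.
            widgets.append(("button", label.strip('"'), actions[action]))
            i += 4
        else:
            raise SyntaxError(f"unknown word in layout dialect: {word!r}")
    return widgets


ui = layout_dialect(tokenize('text "Hello, Bob" button "Quit" on quit'),
                    {"quit": lambda: "bye"})
print(ui[0])        # ('label', 'Hello, Bob')
print(ui[1][2]())   # bye
```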

JAR says:

> synchronization is free

I have generally been agreeing 100% with everything JAR's had to say  
for quite a long time, but it's worth throwing this slight objection  
out there on this one...

I'm guessing I have a pretty good idea of which environments you're
working in that make this a practical assumption, and it's worth
pointing out that this is a special case;  in general / in the limit,
synchronization cannot be free; in fact it is *impossible.*  Cf. the
previous thread re dispersed vs. distributed:  parallel is a special
case of distributed in exactly the same sense that distributed is a
special case of dispersed.  With each specialization come various
constraints, and if your operating environment meets those
constraints, you can make some assumptions as you note.  Otherwise,
special relativity will get you every time (there is no
observer-independent "at the same time" for spatially separated
events); and note that this also has hard implications for any
general theory of intelligence or computability, not to mention
specific models of computing.

I.e. "shared-nothing" isn't just a good idea --- it's the *law.* ;-) :-)
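In software terms, shared-nothing means each process owns its state and everyone else can only send it messages, which is also the model behind the Erlang-style pico-weight processes on the wish list above.  A minimal asyncio sketch with hypothetical names; note that Python tasks still share one address space, so here the isolation is a coding discipline rather than something the runtime enforces the way Erlang's does.

```python
import asyncio


class Process:
    """Erlang-flavored process: private state, reachable only via its mailbox."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self._count = 0  # private: only run() touches it

    async def run(self) -> None:
        while True:
            msg, reply_to = await self.mailbox.get()
            if msg == "incr":
                self._count += 1
            elif msg == "get":
                # Replies carry a value, never a reference to our state.
                await reply_to.put(self._count)
            elif msg == "stop":
                return


async def main() -> int:
    counter = Process()
    task = asyncio.create_task(counter.run())
    replies: asyncio.Queue = asyncio.Queue()
    for _ in range(3):
        await counter.mailbox.put(("incr", None))
    await counter.mailbox.put(("get", replies))  # mailbox is FIFO: after 3 incrs
    value = await replies.get()
    await counter.mailbox.put(("stop", None))
    await task
    return value


print(asyncio.run(main()))  # 3
```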

There may be a more general notion of synchronization (let's call it
"rendezvous") that is physically tenable.  I haven't seen that pursued
very far outside of early mobile-agent stuff (*that* trope / red
herring again?!?  Well, maybe it's got legs after all...) and, at that
time, it wasn't being looked at as an essential element of any
generalized theory with a connection to real-world, physical reality /
constraints.


jb




