[FoRK] the fix is in? <rant> lang, os, etc. continued </rant>

Jeff Bone jbone at place.org
Fri Dec 11 19:13:39 PST 2009


Mr. Sean Conner(y) suggests:

> Lua



Absolutely a contender for my own pet shell and related automation
projects.  Very nice, very small and tight, properly tail recursive,
has good coroutine support at the VM level (though it lacks
first-class continuations, perhaps not a problem given the coroutine
implementation, cf. the HOPL '07 paper etc.), has a nice packaging
system, gets a lot of things right, fast-ish...  and it has the
purtiest FFI since Elk.  IMHO, of course.  Downside:  minimalism
leads to lack of expressivity, which leads to higher LoC, which leads
to lower readability and unnecessary complexity;  take a look at some
of the wiki implementations in it, e.g. Nanoki.  Hardly nano...
Suffers a bit from the lack of standards for modules, object system,
etc. --- like lisps in that respect, or Tcl --- a little too meta for
its own good;  sometimes constraints can be empowering.  Has some
momentum by virtue of its de facto standard status in the gaming
community;  obviously less so than js, but still fairly reasonable.
Spent a good several months playing with it intensively;  prior to
5.0 it posed some difficulties that have since been addressed, but I
haven't picked it back up seriously lately.

For the record, here are the substrates / targets and approaches I've  
actually done more with (i.e. in the "new shell / user environment"  
space being discussed) than just considered at this point, over the  
last 10+ years:

(1)  roll your own in straight ANSI C.  Loses because of lack of  
"draft-ability" on other efforts.*
(2)  a Scheme, stock or roll-your-own, port Termite, build on that.   
Probably loses, too many Catch-22s.
(3)  Regular Python.  Great for prototyping, loses for GIL,  
performance, and other reasons.
(4)  Stackless.  Actually a contender, but suffers for the same
reasons as stock Python, sans the GIL.
(5)  Erlang.  Problem there is FFI...  also, very static, impl.  
unwieldy.  Needs a decent shell, though.
(6)  Lua.  Also Metalua.  Still strong possibilities...

Things I've considered but haven't really actively pursued too far,
for various reasons:

(7) various Occam-pi solutions, including TyCo and Transterpreter  
(sexy, baby!)
(8) LLVM.  Too compiler-centric, would need extensive runtime support.
(9) JVM / CLR.  GMAFB.  Bloat city, just don't like 'em.

Other things I've at least entertained conceptually for a little
while lately include Go (no kidding) --- problem there is *they
don't carry their own chosen abstractions far enough* --- i.e. if  
you're really itching to do pi-calculus then *just do it* --- having  
to jump out of the abstraction to do e.g. RPC is just silly.  These  
guys *invented* the "few high-level abstractions, uniformly applied"  
idea but apparently they lack the strength of their convictions when  
it comes to the process calculi their work rests upon.  Hell, if  
you're going to talk the talk, walk the walk.  Why stop in 1985 when  
you can sprint all the way to 1993?  That's irritating enough to put  
me off even attempting to use it, just on principle.

Have also burned a few days thinking about e.g. node.js and er.js
(incompatible, due to the version the former supports vs. what the
latter requires) and friends, and various Prologs have tempted from
time to time.  (Prolog as a language-implementation-language *may
even beat lisps* in various cases.  If you're curious, ask, I'll
point you to a couple of existence proofs.  An impressively-capable
though toy Algol-like language interpreter can be had in literally a
few 10s to 100s of lines of Prolog;  and some Prolog implementations
offer nice substrate support for various concurrency experiments.
Compare and contrast to similar language implementations in Scheme in
recent editions of EoPL.)
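
For a sense of scale --- in Python rather than Prolog, obviously;
this is just to calibrate what "toy" means, and Prolog's unification
and DCGs make the equivalent terser still --- here's a complete
interpreter for a tiny Algol-ish language, AST as nested tuples:

    # Toy Algol-ish interpreter: AST as nested tuples, env as a dict.
    def ev(e, env):                          # expression evaluator
        if isinstance(e, (int, float)):
            return e                         # literal
        if isinstance(e, str):
            return env[e]                    # variable reference
        op, a, b = e                         # binary operation
        x, y = ev(a, env), ev(b, env)
        return {"+": x + y, "-": x - y, "*": x * y, "<": x < y}[op]

    def ex(s, env):                          # statement executor
        if s[0] == "seq":
            for t in s[1:]:
                ex(t, env)
        elif s[0] == ":=":
            env[s[1]] = ev(s[2], env)
        elif s[0] == "while":
            while ev(s[1], env):
                ex(s[2], env)
        elif s[0] == "print":
            print(ev(s[1], env))

    # sum := 0; i := 0; while i < 5 { sum := sum+i; i := i+1 }; print sum
    ex(("seq", (":=", "sum", 0), (":=", "i", 0),
        ("while", ("<", "i", 5),
         ("seq", (":=", "sum", ("+", "sum", "i")),
                 (":=", "i", ("+", "i", 1)))),
        ("print", "sum")), {})               # prints 10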

To date the most progress, and the testbed for all the ideas, has been  
Python (regular or Stackless, or the Logix language toolkit --- or my own
rather pathetic port of Budd's OO version of Kamin's interpreters;   
Pascal to C++ to Python, ugh.)  But I have no illusions that that's  
anything but a throwaway, a sort of language laboratory "whiteboard"  
substrate.

Erlang or Lua are probably the top substrate contenders if I ever get  
around to doing anything beyond monkeying around with ideas, with  
ground-up roll-your-own in ANSI C being a close third.  In all  
seriousness, though, non-browser js does have its attractions.  (I
about gacked when I saw how many LoC of C++ there were in node.js +
V8 + the additional dependencies, though.  About 240k more than what
I want to deal with.  Not that Erlang's any better...  ;-)


While we're on the topic, here's an interesting new language that  
actually gets a lot of things right on the kitchen-sink "systems  
programming language" front:

   http://fantom.org

Their concept of actors --- async invocations returning promises /
futures --- is seriously flawed.  But the language as a whole and its
libraries, documentation, etc. --- the "whole enchilada" --- have the
"feel" of being well thought-out and well-crafted.


Random thoughts...

jb

* PS - i.e., the goal is to be in the shell business, not the VM  
business.  But if you go the straight interpreter route, then you lose  
some nice abstraction boundary and portability properties...  OTOH,  
this (bare-bones ANSI C) is probably the least-risky choice from an  
adoption perspective.

PPS:  to be clear, there are two general use-cases that I'm interested  
in;  two general templates for my "everyday computing environment(s)"  
that I work within and move freely between daily.  These are:

   (a) typical highly-coupled mostly-homogeneous large-scale cluster  
computing with large data
     - order of 100s to 1000s of CPUs
     - terabytes of chunky, mostly-homogeneous (few kinds of) data
     - gig+ highly reliable redundant networks
     - generally compute- and IO-intensive problems
     - generally embarrassingly-parallel problems

   (b) very loosely-coupled (dispersed), highly heterogeneous,  
personally-focused computing
     - lots of little, some big, no huge data --- but highly  
heterogeneous in model and form
     - dispersed:  high standard deviation of latency, frequent  
disconnection
     - order of dozens of devices, very heterogeneous
       - controller-scale devices
       - plug servers (a la Sheeva / Tonido)
       - special-purpose shared devices (readers, touchpads, etc.)
       - personal end-user IO and compute devices (desktop, laptop,  
etc.)
       - personal end-user O-only and O-mostly devices (think TV...)
       - local and remote, often specialized servers (home media  
server, remote mail server, etc.)
     - lots of different applications and automation scenarios
     - generally coordination is a bigger problem than compute or IO  
intensity

The desire is to be able to treat each of these *sets of devices*  
conceptually as *a single programmable unit* by an individual user  
(e.g., me.)  (We punt the administration boundary problem;  each is
deemed to be contained *within* a single administrative boundary, but
implementations, at least in the latter case, must take care to
enforce that boundary appropriately around the set of computational
resources involved while crossing external boundaries.)

The thesis is that, with proper selection of abstractions, the *same  
set of abstractions* (universally / uniformly applied) works for both  
purposes.  The product of this  would therefore be an implementation  
of a distributed computing environment that spans both and provides a  
high-level, interactive, shell-like experience for *programming* both  
sorts of environments at the same level of expressivity and efficiency  
as e.g. existing shells do on UNIXen.  The proof of the pudding would
then be that I could work exclusively within that meta-environment for  
most purposes rather than having to lash together a bunch of crap  
using dissimilar tools for each of these;  the "right tool for the  
job" proponents could then be clearly shown the Hobson's choice  
inherent in their limited view of the possibilities...

Whether this would have value for anyone else is beyond the scope of  
what I'm interested in doing...

The general shape of the solution is this:

   - start with a rich-data language for IPC (think JSON, OGDL, or  
similar with enriched types and symbols)
     - for integration purposes, mappings to some / all of:
       - HTML (output-mostly)
       - JSON (full round-trip)
       - XML (full round-trip)
       - possibly protobufs, Thrift, etc. (how to make reflection  
feasible?)
       - potentially others (texts -> some wiki markup;  etc.)
     - (some of) these mappings must have transitive closure
       - for some source data language S and target T and transforms
x, x', y, y', etc.:
       - x.S -> T;  x'.T -> S  -->  x.S | x'._ -> S
       - note intermediate forms are desirable:  x.S | y._ | y'._ |
x'._ -> S
     - path of least resistance might be:
       - extend JSON with first-class rich type literals (rough
sketch after this outline), e.g. at least
         - URIs
         - quantities (numerics with dimensions, e.g. "5 joules")
         - symbols per se
         - possibly first-class node labels / references
           - consider Haskell data / labeled record declaration syntax
           - consider Lua table / method sugar:  T { foo = 1 } ...
       - possibly some additional syntactic sugar to make this less
onerous
   - build a shell around that
     - i.e., functions / filters / processes / commands / pipes /  
composition
     - seamless integration of in-node micro-processes and external  
processes
     - structure-preserving pipes and richer pipe wiring (cf. the
stream-combinator sketch below)
     - runtime provides:
       - configurable universal resource namespace
       - local and remote access to resources via generic interfaces
         - native devices and services
         - synthetic devices and services
         - need less statefulness than 9p;  consider Octopus op
         - nb. op <~~> REST (!)
       - a decent distributed data store (implicit versioning,  
replication when appropriate, etc.)
         - cf. Amoeba Bullet filesystem, SOAP (not that SOAP)  
directory services, etc.
         - cf. OceanStore, etc.
         - cf. Lamport's, Cox's etc. research on distributed  
synchronization
         - cf. operational transform (OT) theory and patch theory,  
time / state vectors, etc.
       - mobility solution TBD.  Mobile u-processes or reflect /  
reify / migrate the entire node?
   - use that to build as much of an end-user environment for the use  
cases as possible
     - need task-farming / process mobility etc. in the cluster case
     - need universal eventing, coordination, code swapping, etc. in  
the dispersed case
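
As promised in the outline, a rough sketch of the rich-literal idea:
a minimal Python data model plus a round-trip mapping onto plain
JSON.  The "$type" tagging convention and the class names are
invented for illustration;  a real design would put these literals in
the surface syntax rather than in tagged objects:

    import json
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Quantity:                  # numeric with dimensions: "5 joules"
        mag: float
        unit: str

    @dataclass(frozen=True)
    class Symbol:                    # a symbol per se, not a string
        name: str

    TAG = "$type"                    # invented tagging convention

    def enc(o):                      # x:  S -> T  (rich -> plain JSON)
        if isinstance(o, Quantity):
            return {TAG: "quantity", "mag": o.mag, "unit": o.unit}
        if isinstance(o, Symbol):
            return {TAG: "symbol", "name": o.name}
        raise TypeError(o)

    def dec(d):                      # x': T -> S  (plain JSON -> rich)
        kind = d.get(TAG)
        if kind == "quantity":
            return Quantity(d["mag"], d["unit"])
        if kind == "symbol":
            return Symbol(d["name"])
        return d

    # full round trip:  x.S | x'._ -> S
    s = {"energy": Quantity(5, "joules"), "op": Symbol("integrate")}
    assert json.loads(json.dumps(s, default=enc), object_hook=dec) == s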

Been sort of blocked (pun intended) on the idea of trying to give
the shell a block-reduction semantics with PAR, SEQ, ALT, and
friends from Occam for a while now.  Getting a bit academic;
starts to look like Piccola's form algebra, or in more generalized  
form the ambient calculus.  May simply punt on that;  defining a  
stream algebra and stream combinators is trivial by comparison, even  
though it leaves something to be desired in definitional rigor.
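
To back up the "trivial by comparison" claim --- and to supply the
stream-combinator sketch promised in the outline above --- a handful
of generator combinators in Python gets you structure-preserving
pipes, i.e. records flowing through intact rather than being
flattened to text.  All naming here is mine:

    def pipe(source, *stages):       # thread a stream through transformers
        for stage in stages:
            source = stage(source)
        return source

    def keep(pred):                  # filter combinator
        return lambda s: (x for x in s if pred(x))

    def each(fn):                    # map combinator
        return lambda s: (fn(x) for x in s)

    def take(n):                     # truncation combinator
        def stage(s):
            for i, x in enumerate(s):
                if i >= n:
                    return
                yield x
        return stage

    # a toy "ps"-like source emitting structured records, not text lines
    procs = [{"pid": 1, "cmd": "init", "rss": 4},
             {"pid": 42, "cmd": "lua", "rss": 180},
             {"pid": 99, "cmd": "erl", "rss": 512}]

    hogs = pipe(iter(procs),
                keep(lambda p: p["rss"] > 100),
                each(lambda p: (p["pid"], p["cmd"])),
                take(5))
    print(list(hogs))                # [(42, 'lua'), (99, 'erl')]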

A couple of key requirements:

   - the meta-computer should be trivially extensible with / to new  
hosts and devices
     - "inject" / install / execute a single, small code unit to join  
into cloud
     - bootstrap from there into full environment
     - for tiny nodes (controllers, sensors) --- two options
       - host complete but minimal runtime environment (implies other  
requirements, ~prohibitive)
       - possibly two (or more) distinct roles hosts can play, e.g.
         - "full" node vs.
         - merely sensor / controller "device" nodes
         - potentially others:  storage, compute, console, etc. nodes
       - bigger issue for the "personal cloud" case than the generic  
cluster case
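
A cartoon of the single small join unit mentioned above.  Everything
here --- the hub URL, the /join endpoint, the token scheme --- is
invented purely for illustration;  no claim that this is how it
should actually work:

    import json, platform, runpy, tempfile, urllib.request

    HUB = "https://hub.example.net"          # hypothetical rendezvous point

    def join(token):
        # announce ourselves and ask where the full runtime lives
        me = {"arch": platform.machine(), "os": platform.system(),
              "token": token}
        req = urllib.request.Request(
            HUB + "/join", data=json.dumps(me).encode(),
            headers={"Content-Type": "application/json"})
        runtime_url = json.loads(urllib.request.urlopen(req).read())["runtime"]
        # fetch the full environment and bootstrap into it
        with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
            f.write(urllib.request.urlopen(runtime_url).read())
        runpy.run_path(f.name)

    if __name__ == "__main__":
        join("demo-token")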

   - the meta-computer constitutes a single administrative boundary
from the user's perspective
     - but must smoothly cross certain "external" administrative  
boundaries
     - network barriers (firewalls), etc.
        - when appropriate (not suggesting unauthorized circumvention)
     - user environment / view of the world extends across these  
"translucently"

(One last, last thing to note:  the resemblance of my desiderata to  
what you see with e.g. botnets is not accidental.  A white-hat variant  
of the botnet concept is actually a pretty good model for end-user  
computing in the scenarios I mentioned, and the use cases are very  
similar, particularly (b), with the important exception that the  
motivation is to merely make use of one's *own* disparate resources  
more effectively.  Existing botnet technologies fall short of
providing the kind of seamless global environment desired, though,
and are pretty much a kitchen-sink mishmash of crapware you'd never
want to try to use, with no abstraction away from pure, native, if
remote / distributed, command-and-control.  Wasp and its predecessor
Mosquito being, perhaps, the exception (and not enough of one.)  Admin
tools like puppet and chef are, as previously mentioned, a lot closer
on some levels, not on others.)

(One truly last thing to note:  I was deeply into the mobile agent
thing back when that was très cool, circa the General Magic era.
Gave up on it back then as a solution in search of a problem;  agents
for mail were clearly trumped by my own then-company's
standards-based e-mail
product.  Have identified two problems for which it may be *the*  
solution over the last decade.  First, in the dispersed case, mobile  
computation is generally going to be the lowest-friction solution to  
certain problems (i.e., programming a limited controller or sensor  
edge-node, particularly one with intermittent connectivity).  And in  
the large-cluster / chunky-data case, it's nearly an absolute  
necessity to take the computation to where the data is...  These two  
recognitions have forced me to really move towards considering higher- 
order pi-calculi and friends, and even dabbling with the ambient  
calculus a bit.  Ambient in particular is really interesting, though  
Cardelli can be a bit formal and inscrutable (on this or any other  
topic) and it's a bit early-days in that regard, probably too big a  
mindset shift to make a practical solution out of, really.  FWIW, we  
dabbled with something so close to the ambient calculus that it was  
indistinguishable back at Active* --- w/o the formalism, of course.   
That was mostly Sean M.'s doing, though, props etc.)



jb





