[FoRK] The World's Smallest Programming Language

Kragen Javier Sitaker <kragen at pobox.com>  Tue Sep 12 15:33:37 PDT 2006

Dave Long writes:
> One very practical issue with combinator reduction is that it can be 
> not only exponentially larger, but also slower, than lambda or graph 
> evaluation.

Really?  I was just reading Peyton Jones' 1987 book on implementing
functional languages, and there's a chapter on SK-combinators.  He's
not very enthusiastic about them, but he doesn't say *exponentially*
larger and slower.  The basic rewrite rules for compiling the lambda
calculus into SK-combinators (here \ is lambda):

  T[ \ x . x ] = I = SKK if you want to be really minimalist
  T[ \ x . y ] = K y      (for y a variable other than x)
  T[ \ x . \ a . B ] = T[ \ x . T[ \ a . B ] ]
  T[ \ x . A B ] = S (T[ \ x . A ]) (T[ \ x . B ])
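
For concreteness, here's roughly what that naive translation looks
like in Haskell (a toy sketch of my own, not code from the book; the
Expr type and the names are mine):

  data Expr = Var String | Lam String Expr | App Expr Expr
            | S | K | I
    deriving Show

  -- T[ e ]: eliminate lambdas from the inside out.
  compile :: Expr -> Expr
  compile (Lam x body) = abstract x (compile body)
  compile (App a b)    = App (compile a) (compile b)
  compile e            = e

  -- abstract x e builds a lambda-free term f such that f x = e,
  -- given lambda-free e.
  abstract :: String -> Expr -> Expr
  abstract x (Var y)
    | x == y    = I                     -- T[ \ x . x ] = I
    | otherwise = App K (Var y)         -- T[ \ x . y ] = K y
  abstract x (App a b) =                -- the S rule
    App (App S (abstract x a)) (abstract x b)
  abstract _ e = App K e                -- S, K, and I are constants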

The upshot is that you tack on another layer of S's and K's for each
level of lambda nesting, which costs O(n^2) at compile time in the
nesting depth, and means that every original expression picks up an
O(n) slowdown at run time.  But that's not *exponentially* larger and
slower, is it?  And I have a hunch that you can eliminate that source
of slowdown entirely by lambda-lifting first, compiling to
supercombinators, and then tupling their arguments.
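
To make the growth concrete: with the sketch above, plain \ x . \ y . x
(which is semantically just K) compiles as

  compile (Lam "x" (Lam "y" (Var "x")))
    ==> S (K K) I        -- i.e. App (App S (App K K)) I

four combinators where one would do, and each further abstraction pass
wraps every application and leaf of the previous result in another S
or K, which is where the per-level overhead comes from.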

He explains that you can ameliorate this problem with B and C
combinators, which are kind of like S:

  K = \ x . \ y . x
  I = \ x . x
  S = \ f . \ g . \ x . f x (g x)
  B = \ f . \ g . \ x . f (g x)     (ordinary composition; Haskell's (.))
  C = \ f . \ g . \ x . f x g       (Haskell's flip)

The idea is that you don't pass variables into expressions where they
aren't needed, so you use

  T[ \ x . A B ] = S (T[ \ x . A ]) (T[ \ x . B ])  if x is free in A and B
  T[ \ x . A B ] = B (T[ A ]) (T[ \ x . B ])        if x is free only in B
  T[ \ x . A B ] = C (T[ \ x . A ]) (T[ B ])        if x is free only in A

And when x is free in neither A nor B, the K rule already covers it:
T[ \ x . A B ] = K (T[ A B ]).  The recursion never produces that case
on its own; it only happens if you wrote an unused variable into the
program just for fun in the first place, so it doesn't really matter
what you do there.
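
In code, the whole B/C optimization is just a free-variable test in
the application case.  Again a sketch of mine, with B and C added to
the Expr type from the earlier sketch:

  data Expr = Var String | Lam String Expr | App Expr Expr
            | S | K | I | B | C
    deriving Show

  freeIn :: String -> Expr -> Bool
  freeIn x (Var y)   = x == y
  freeIn x (App a b) = freeIn x a || freeIn x b
  freeIn x (Lam y e) = x /= y && freeIn x e
  freeIn _ _         = False

  abstract :: String -> Expr -> Expr
  abstract x e
    | not (x `freeIn` e) = App K e           -- x unused: just K e
  abstract _ (Var _) = I                     -- must be x itself
  abstract x (App a b)
    | x `freeIn` a && x `freeIn` b =
        App (App S (abstract x a)) (abstract x b)
    | x `freeIn` b =
        App (App B a) (abstract x b)         -- B: x free only in B
    | otherwise =
        App (App C (abstract x a)) b         -- C: x free only in A
  abstract _ _ = error "lambdas are gone before abstraction"

(compile is unchanged.)  Note that by the time abstract runs, a and b
are already lambda-free, so T[ A ] and T[ B ] are just a and b
themselves.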

(I think there's an interesting connection here with strictness
analysis and linearity, if you use the B and C combinators instead of
S whenever you can, but I'm not sure I understand it yet.)

> Maybe I just need to read the speed prior stuff more closely, but it
> seems that if algorithms jump drastically between big-O classes
> depending upon whether they're expressed with naming/sharing or not,
> it would affect which programs were chosen by a speed prior.  Or
> does time expand pretty uniformly across codes?

I'm not sure what you mean, exactly.  Can you give some examples?

> Depending upon how much you care, this might be a question for kc-list.

What's that?  I'm starting to care a lot, because it looks like
SK-combinators are dramatically easier to implement than fully-lazy
lambda-lifting and supercombinators and so forth, although Peyton
Jones doesn't like them because (he says) it's hard to optimize the
results of the SK-transformation much further, since the evaluation
has been broken down into so many tiny steps.
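
To back up the "dramatically easier" claim: given the Expr type above,
a complete normal-order reducer fits in about eight lines (again my
own toy code) by unwinding the application spine and rewriting at the
head:

  -- Reduce to weak head normal form by spine unwinding.
  whnf :: Expr -> Expr
  whnf e = go e []
    where
      go (App f a) args     = go f (a : args)          -- unwind spine
      go I (a : as)         = go a as                  -- I a     = a
      go K (a : _ : as)     = go a as                  -- K a b   = a
      go S (f : g : x : as) = go f (x : App g x : as)  -- S f g x = f x (g x)
      go B (f : g : x : as) = go f (App g x : as)      -- B f g x = f (g x)
      go C (f : g : x : as) = go f (x : g : as)        -- C f g x = f x g
      go h args             = foldl App h args         -- stuck or done

(That's a tree reducer; a real implementation would do graph
reduction, sharing the node for the duplicated x in the S rule and
updating redex roots in place so work isn't repeated.)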

They look like the simplest way to implement a functional language
with reasonable efficiency, although anyone who reads kragen-hacks
knows that I've implemented several functional languages without
reasonable efficiency.
