Composing language descriptions: tree automata and language design?

Dan Connolly (
Fri, 27 Feb 1998 22:31:49 -0600

This is a multi-part message in MIME format.

Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Ever since all this XML namespace stuff started, I have
been saying

Context-Free Grammars are Very Well Studied,
and basing standards like XML on them is a safe bet.
(Similarly for RDF based on AI frame/slot/value)

But you can't just take a CFG from Joe and another
from Bob and slap them together. At least I haven't
seen any well-studied way to do it.

Then Murata Makoto sent a pointer to this little paper:

% Forest-regular Languages and Tree-regular Languages
% MURATA Makoto
% May 26, 1995

He also posted a small example, but since it went to xml-wg,
I won't forward it. A larger example is available in:

I'm starting to get a "where have you been all my life?" feeling.

Am I the only person who's been studying language design for
the last few years who hasn't studied tree autonoma?
I surfed around in comp.lang.* last night just to sanity-check.
Nope. No sign of it.

It looks really cool. It looks like a good model for
adding the equivalent of Modula-3 generics[3] or maybe even ML
modules (no citation--I haven't studied these closely) to SGML.
But I'm not sure. I don't quite have my head wrapped around
it yet. I could use some help.

(but to do it justice, you need to read chapter 8,
"How the Language Got its Spots" from
System Programming with Modula-3 Edited by Greg Nelson Prentice Hall
Series in Innovative Technology ISBN 0-13-590464-1 L.C. QA76.66.S87 1991

I can't find a copy on the web, but it's widely cited; e.g.

The classicist in me had just about given up hope of finding
a good model for composing DTDs before I saw this. I thought
we were going to have to just swim around in loosely defined
XML-based languages (where the instances were well-formed
but couldn't be machine-checked beyond that) for some time.

Rohit... have I piqued your interest? Can you spare
the time to study this forest autonoma stuff to see
if it fits into the ecology of knowledge representation
in the Web?

Anybody else?

Dan Connolly

--------------36F969537A95 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline

Received: from ( []) by (8.8.7/8.7.3) with ESMTP id DAA01947 for <>; Fri, 27 Feb 1998 03:22:14 -0500 (EST) Received: from ( []) by (8.8.5/8.7.3) with ESMTP id DAA07646 for <>; Fri, 27 Feb 1998 03:22:14 -0500 (EST) Received: from shoal by (8.8.8/jump.1.11) id CAA29574; for <> Fri, 27 Feb 1998 02:24:00 -0600 (CST) Message-ID: <> Date: Fri, 27 Feb 1998 02:22:50 -0600 From: Dan Connolly <> Organization: World Wide Web Consortium ( X-Mailer: Mozilla 3.01Gold (WinNT; I) MIME-Version: 1.0 Newsgroups: comp.specification.larch CC: Subject: ZF Replacement schema in larch? Content-Type: multipart/mixed; boundary="------------1DB9750A2299"

This is a multi-part message in MIME format.

--------------1DB9750A2299 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit

I'm having trouble tranlating the idiom { x: P(x) } into larch. Any clues? I checked the larch FAQ with no luck.

The actual example I'm after is more like:

Dom(u) = {1} \U {1.v1.v2...vk : v1.v2...vk \in Dom(v)}

This seems to provide a clue:


the Replacement axiom schema:

If F(x,y) is a formula such that for any x, there is a unique y making F true, and X is a set, then

{F x : x in X}

is a set. In other words, if you do something to each element of a set, the result is a set. ===============

But I can't figure out how to use that clue.

I might be avoiding the use of \E for no good reason. Maybe that solves the problem.

I'm trying to absorb a paper about Forest autonoma by transcribing it into larch. I made it through the basic definitions, but I'm having trouble with some of the more complex ones. I've attached my work so far. Any clues?

Also... is there any hope of running the larch prover on a linux box?

Dan Connolly

--------------1DB9750A2299 Content-Type: text/plain; charset=us-ascii; name="forest.lsl" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="forest.lsl"

forest(E, C): trait % % Transcribed by Dan Connolly <> from: % Forest-regular Languages and Tree-regular Languages % MURATA Makoto % May 26, 1995 % % % $Id$

includes Integer, List(Int, IntList), Set(IntList, IntListSet)


empty: -> C __{__} : E, C -> C __ || __ : C, C -> C

tree: C -> Bool

width : C -> Int

Dom: C -> IntListSet Domaux: C -> IntListSet Domaux2: C, C -> IntListSet

forest: C, IntList -> E

__ / __ : C, IntList -> C

asserts C generated by empty, __{__}, || % 2.1 Forest

forall a: E, u,v,w: C, i, w1, u1: Int, v1k, w2k, u1k, u2k, d: IntList

% 2.2 tree tree(a{u}) == true; % 2.2. Tree tree(empty) == false; tree(u||v) == false;

% 2.3 forest width width(empty) == 0; width(a{u}) == 1; width(u||v) == width(u) + width(v);

% 2.4 forest domain Dom(empty) = {};

Dom(a{v}) == {{1}} \U Domaux(a{v}); (1 \precat v1k) \in Domaux(a{v}) == v1k \in Dom(v); % @@ and nothing else is in Domaux(a{v})

Dom(v || w) == Dom(v) \U Domaux2(v,w); ((w1 + width(v)) \precat w2k) \in Domaux2(v,w) == w2k \in Dom(w); % @@ and nothing else

% 2.6 forest function forest(a{v}, {1}) == a; forest(a{v}, 1 \precat v1k) == forest(v, v1k); u1k \in Dom(v) => forest(v || w, u1k) = forest(v, u1k); (\not ((u1 \precat u2k) \in Dom(v))) => forest(v || w, u1 \precat u2k) = forest(w, (u1 + width(v)) \precat u2k)

% 2.8 subtree % Dom(u/d) == { 1 \precat v1k --------------1DB9750A2299--