[FoRK] [cml/ccml-discuss] Getting involved with CML (fwd from pm286@cam.ac.uk)

Eugen Leitl eugen at leitl.org
Sat Feb 26 08:32:49 PST 2005

----- Forwarded message from Peter Murray-Rust <pm286 at cam.ac.uk> -----

From: Peter Murray-Rust <pm286 at cam.ac.uk>
Date: Sat, 26 Feb 2005 12:28:11 +0000
To: <cml-discuss at lists.sourceforge.net>
Subject: [cml/ccml-discuss] Getting involved with CML
X-Mailer: QUALCOMM Windows Eudora Version 5.1.1

Until recently the public discussion of CML has been fairly limited and 
contributions have come from a core of 5-10 groups or individuals, many of 
whom have met IRL. With the increasing acceptance and deployment of CML, 
the release of JUMBO4.6 (the first version whose architecture I have 
confidence in :-) and a number of projects which have adopted CML, we want 
to widen the discussion and involvement of the community. This mail sets 
out some history, suggests ground rules, thanks current contributors and 
invites and welcomes new ones.

In my own mind, CML started in about 1980 when I was writing software to 
analyze crystal structures in the Cambridge database.  With Sam Motherwell 
and Jim Raftery we created a system (GEOSTAT) which read numbers of 
database entries, carried out 2- and 3-D substructure searches, correlated 
fragments and mapped them, calculated geometrical parameters, carried out 
statistics and rendered the results. There was no formal data structure or 
externalisation and it is clear that these were major impediments. As an 
example the system would only read data in the database format and would 
share it through COMMON blocks. This meant there was no way that it could 
be used to analyse (say) the structure of ligands in PDB. We hope that 
soon, 25 years on, we can represent the same functionality in a CML 
system! We're nearly there...

1980s technology could not support proper software development and we saw 
the explosion of "file formats" which currently bedevil us - confusing 
storage, data structures, formalisation of semantics and generally locking 
chemistry into a visual rather than semantic subject. However other 
disciplines began to adopt newer ways and I have been particularly 
influenced by: libraries (e.g. NAG), packages (e.g. SPSS), workflows (AVS) 
and of course OO-programming.

Henry and I met in the early 1990s - I can't remember exactly how we made 
contact but he visited Glaxo (where I then was) - probably ca. 1992. Since 
then we have evolved a symbiotic relationship and meet frequently 
IRL. I tend to take the initial lead in code development and Henry in areas 
of web deployment but everything ends up as a joint work, especially the 
concept "CML". Most of the JUMBO system is written by me, at least in the 
first instance, but increasing numbers of people (listed in the distrib) 
are starting to contribute.

In our experience most Open projects benefit from a benign dictatorship, 
based largely on Eric Raymond's principles - contribution is fundamental. 
CML is an architecture and architectures are difficult to create. They 
usually require continual refactoring and that is why JUMBO4.6 is called a 
"major release". The architecture also embodies a vision and this has 
changed over the years from a toolkit to visualiser and back to a semantic 
toolkit. Among the forces driving this has been the availability of 
technology. For JUMBO1 I had to create a complete DOM, tree renderer and 
editor, molecular visualiser, SGML Parser, chemical perception engine and 
more, all in AWT 1.02 (i.e. with little library help). With the availability 
of quality tools (both generic and molecular) JUMBO has now converged to a 
toolkit. Because of the contributions from the Open community and the slow 
but increasing availability of CML-aware commercial tools many components 
can be removed from JUMBO. It may sound Zen-like but I feel extremely happy 
when I am able to delete a piece of functionality from JUMBO.

Therefore the core of CML is currently Henry and me - not a committee, not 
a voting system and not a standards body. Maybe at some stage CML will 
become a formal standard somewhere (this has been requested) but it would 
be inappropriate at this stage. Standards exist to enforce conformance or to 
resolve questions in law courts. However we do take conformance very 
seriously and are currently devising a de facto approach towards this. 
While it will not be an ISO/OASIS/IUPAC standard, it will be available to 
support the chemical community. For example we work very closely with IUPAC 
on XML.

CML is also not universal. It is not intended to be the only way of doing 
chemistry in XML. For example the analytical community has developed AniML 
which represents a more formal and complete way of representing data and 
spectra. We have frequent interaction with AniML and design CML so that it 
will provide "hooks" into it where required. The same is true of ThermoML.

CML is mainly driven by existing practice rather than trying to change the 
conceptualisation of chemistry. In a few cases (e.g. CMLReact/CMLSnap) we 
think there is an opportunity for representing chemistry in a slightly 
different way but in general there are few surprises, just formalisation. 
However chemistry has so much implicit semantics that this formalisation is 
very challenging.

In general, therefore, CML is driven by current examples. We have developed 
CMLReact by asking "can we represent these reactions in CML?" We'd be very 
grateful for contributions of examples - these might well stress the 
system. We are doing the same with CMLComp.

To summarise this and other mails we invite collaborations and 
contributions - all are formally credited. Suggested areas are
        * bugfixing
        * examples
        * documentation and tutorials
        * wrappers
        * conversion kits to/from other XML and legacy systems
        * databases
        * interfaces and adapters

and perhaps in conjunction with other projects
        * editing tools
        * renderers

We are happy to work with anyone (Open, non-Open, non-commercial, 
commercial, etc.). The contributions of the Open community are publicly 
visible on many lists. We particularly want to acknowledge the support over 
several years of Dan Zaharevitz at the National Cancer Institute for parts 
of JUMBO. We are pleased to see commercial products which include CML but 
make it clear that we have not been involved in these and offer no 
assertion that they are conformant. (If a product makes inappropriate 
claims or representations we shall contact its developers.) We do not 
normally approach commercial companies about incorporating CML but are 
happy to be approached by them if they want to explore CML.


Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069

cml-discuss mailing list
cml-discuss at lists.sourceforge.net

----- End forwarded message -----
Eugen* Leitl <a href="http://leitl.org">leitl</a>
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net
