[FoRK] binary XML

Eugen Leitl eugen at leitl.org
Wed Jan 19 02:51:59 PST 2005


http://uk.builder.com/architecture/web/0,39026570,39233479,00.htm

How do we make XML faster?
Martin LaMonica, 14 January 2005
CNET News.com

XML's verbosity and lack of inherent compression are causing speed problems
all round for those trying to implement Web services, but there's no
agreement on how to improve things

The technology known as XML has become a nearly universal way to share
information online. But there's a growing recognition that XML's benefits
sometimes come with a price tag: sluggish performance.

That problem is now spawning efforts to speed up XML traffic. Proponents say
a skinnier XML will boost the speed of everything from Internet commerce to
data exchange between mobile phones. But so far, there's no agreement on the
technology to make that happen.

Here's the problem: right now, the XML standard calls for information to be
stored as text. That means that an XML document, such as a purchase order or
a Web page, can be easily viewed by a person or "read" by a machine, either
through widely available text editors or XML parsers.

But performance problems result from XML's tendency to create very large
files. That's in part because XML formatting calls for each element within a
document to be tagged with labels written out as text. What's more, XML-based
protocols, called Web services, also generate a great deal of XML traffic.

"Not only is XML verbose, but it's extremely wasteful in how much space it
needs to use for the amount of true data that it is sending," said Jeff Lamb,
chief technology officer of Leader Technologies, which uses XML extensively
in teleconferencing applications and believes that a change is needed.
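
To make the "true data" complaint concrete, here is a minimal sketch that measures how much of a small document is markup. The purchase-order fragment and the helper names (payload_chars, markup_overhead) are invented for illustration and do not come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order fragment, invented for illustration.
DOC = (
    "<purchaseOrder>"
    "<item><sku>A1</sku><quantity>2</quantity></item>"
    "<item><sku>B7</sku><quantity>10</quantity></item>"
    "</purchaseOrder>"
)


def payload_chars(xml_text):
    """Count characters of text content and attribute values (the data)."""
    root = ET.fromstring(xml_text)
    total = 0
    for elem in root.iter():
        total += len(elem.text or "")
        total += len(elem.tail or "")
        total += sum(len(value) for value in elem.attrib.values())
    return total


def markup_overhead(xml_text):
    """Fraction of the document's characters that are markup, not data."""
    return 1 - payload_chars(xml_text) / len(xml_text)


if __name__ == "__main__":
    print(f"{len(DOC)} chars total, {payload_chars(DOC)} chars of data, "
          f"{markup_overhead(DOC):.0%} markup")
```

Real documents with longer text fields will show a much lower ratio; the point is only that short values wrapped in long element names are dominated by tags.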

The leading candidate to help alleviate XML's performance woes is a
technology called binary XML, which calls for a new format that compresses
XML transmissions.

Sun has started an open source Fast Infoset Project based on binary XML, and
the standards body responsible for XML, the W3C, has formed the Binary
Characterization Working Group to consider putting XML in binary format.

On the face of it, compressing XML documents by using a different file format
may seem like a reasonable way to address sluggish performance. But the very
idea has many people -- including an XML pioneer within Sun -- worried that
incompatible versions of XML will result.

"If I were world dictator, I'd put a kibosh on binary XML, and I'm quite
confident that the people who are pushing for it would find another
solution," said Tim Bray, who's both co-inventor of XML and an executive in
Sun's software group.

"But as it is, these people think they're right and they're not stupid, so
maybe they are right. Thus, let's hope that they play nice with standards
bodies and provide that free open-source software -- all of which the Sun
Fast-Infoset people are doing, to their credit," Bray said.

Putting the squeeze on XML
The Fast Infoset plan, which represents more than a year of work, proposes
that XML documents get shrunk down into a binary format in order to speed up
transmission of files over the Internet. Sun has chosen a compression method
that's already a standard used in the telecommunications industry.
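
The article does not describe the encoding itself, so the following is only a toy sketch of the general idea behind a binary format: element names go into a table once and are referred to by a one-byte index afterwards. It is not Fast Infoset and not the telecommunications standard mentioned above. The event codes, the layout, and the limits (no attributes, no mixed content, fewer than 256 distinct names, text runs under 64 KiB) are all assumptions made for illustration.

```python
import struct
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

START, TEXT, END = 0, 1, 2  # one-byte event codes for this toy format


def encode(xml_text):
    """Encode attribute-free XML as a name table followed by an event stream."""
    root = ET.fromstring(xml_text)
    names = []
    body = bytearray()

    def visit(elem):
        if elem.tag not in names:
            names.append(elem.tag)
        body.extend((START, names.index(elem.tag)))
        if elem.text:
            data = elem.text.encode("utf-8")
            body.append(TEXT)
            body.extend(struct.pack(">H", len(data)))  # 2-byte length prefix
            body.extend(data)
        for child in elem:
            visit(child)
        body.append(END)

    visit(root)
    table = "\n".join(names).encode("utf-8")
    return struct.pack(">H", len(table)) + table + bytes(body)


def decode(blob):
    """Rebuild the XML text from the toy binary form."""
    (table_len,) = struct.unpack_from(">H", blob, 0)
    names = blob[2:2 + table_len].decode("utf-8").split("\n")
    pos = 2 + table_len
    out, stack = [], []
    while pos < len(blob):
        code = blob[pos]
        pos += 1
        if code == START:
            name = names[blob[pos]]
            pos += 1
            stack.append(name)
            out.append(f"<{name}>")
        elif code == TEXT:
            (length,) = struct.unpack_from(">H", blob, pos)
            pos += 2
            out.append(escape(blob[pos:pos + length].decode("utf-8")))
            pos += length
        else:  # END
            out.append(f"</{stack.pop()}>")
    return "".join(out)


if __name__ == "__main__":
    doc = "<order><item><sku>A1</sku><qty>2</qty></item></order>"
    packed = encode(doc)
    print(len(doc), "chars as text,", len(packed), "bytes packed")
    print(decode(packed) == doc)
```

The sketch also shows where the interoperability worry comes from: the packed bytes mean nothing to a text editor or to a stock XML parser, so every tool along the path has to know this particular layout.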

The Sun engineers behind Fast Infoset argue that binary encoding is necessary
because it can greatly improve performance, which is necessary in certain
situations.


In initial tests, they found that applications perform two or three times
faster when using the software. The goal of the Fast Infoset project is to
generate interest among developers and eventually create a standardized
binary format.

Manufacturers of consumer devices such as Canon, as well as mobile-phone
companies such as Nokia, have argued for a binary XML format. Without it,
large files such as images will take too long to download to devices such as
mobile phones, they argue.

The primary concern is interoperability. Potentially, several different
binary formats for specific purposes could emerge, which are not universally
understood. For example, there may be a method for encoding images sent to
consumer electronics, which may differ substantially from others.

Bray is sceptical of the entire notion of converting XML to any format other
than text.

"The fact that XML is ordinary plain text that you can pull into Notepad...
has turned out to be a boon, in practice," he said. "Any time you depart from
that straight-and-narrow path, you risk loss of interoperability. Experience
with interoperability via XML as it is, has been excellent. Why take
chances?"

Bray noted that there are methods for speeding up XML traffic other than
creating a binary format. Advances in networking and processing power go a
long way in addressing performance concerns, though perhaps not on
battery-constrained mobile phones, he said.

Janet Perna, the general manager of IBM's information management group, said
one alternative to binary XML is to handle the mushrooming in XML traffic
with faster networking. Five or six years ago, people thought that the
Internet would be too slow for doing online commerce, but the industry
eventually overcame those barriers, she said.

"I don't see [growing XML traffic] as a limitation here. I think we'll keep
up with it," she said.

ZapThink, a research firm specializing in XML and Web services, echoed
concerns over binary XML, notably the possibility of proprietary
implementations. ZapThink analysts also noted that an XML message can touch
several different pieces of software and hardware, such as security systems,
all of which would have to support any binary XML standard.

ZapThink's Ron Schmelzer said binary XML may be limited to niche uses such as
high-volume applications, which demand the best performance.

Leader Technologies' Lamb supports the idea of binary XML but with one
important caveat -- that it be standardised.

"The amount of transactions that contain XML continues to exponentially
expand, so we don't want to get caught behind the problem," he said. "But if
we can't achieve a standard [with binary XML], then my support would go way
down."

See also http://www.oreillynet.com/pub/wlg/6021

LISP is better than XML, but worse is better
Rick Jelliffe
	

Dec. 05, 2004 06:28 PM
	
At the dawn of XML, some LISP fans would say that XML was just a crappy LISP.
(The clueier LISP fans would use "s-expr" or SEXPR or S-expressions, as the
Lots of Irritating Silly Parentheses syntax is known.) But Java plus XML plus
the Beanshell interpreter is a pretty nice crappy LISP!

The markup-language-as-bad-sexpr notion predates XML by almost a decade with
SGML: indeed, with SGML the comparison is fairer, because SGML does include
features for setting delimiters and constructing little languages. SGML's
SHORTREF and ENTITY mechanism can be compared to macros in LISP, for example.

One reason XML was designed with the principle "Terseness is of minimal
importance" was to cut SHORTREFs out. (SGML is still in use by people who
need SHORTREFs. But vendors who cannot make a buck out of SGML won't tell you
that :-)

Syntax aside, LISPers point out that the XML infoset (i.e., the general data
structure that applications may see when the text is parsed) is an attribute
value tree (AVT), just like modern LISP lists. (AVTs are very convenient to
have available. Certainly one of the reasons for XML's success is that it has
allowed vendors to add fairly similar AVT APIs to their libraries.) However,
LISP has syntactic features to allow the recognition of numbers and symbols
in data: XML just has strings. (Both can represent links between nodes, so
really the data structure is an AVT with cross links, like a directed, rooted
graph.)
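
As a rough illustration of the correspondence, the sketch below renders a parsed XML element as an S-expression. The (@ ...) form for attributes loosely follows the SXML convention; the function name to_sexpr and the sample document are assumptions, and quote characters inside values are not escaped.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Render an element as (name (@ (attr "value") ...) children...)."""
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({key} "{value}")' for key, value in elem.attrib.items())
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(f'"{elem.text.strip()}"')
    for child in elem:
        parts.append(to_sexpr(child))
        # Text that follows a child element belongs to the parent.
        if child.tail and child.tail.strip():
            parts.append(f'"{child.tail.strip()}"')
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    point = ET.fromstring('<point unit="cm"><x>3</x><y>4.5</y></point>')
    print(to_sexpr(point))
```

Every value comes out as a quoted string because the parser hands back nothing else. A LISP reader given (x 3) would produce a number; with XML that decision is left to the application or to a schema layer.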

Paul Prescod has a nice page, "XML is not S-Expressions", on the topic. I would
also add that XML, with its encoding declaration, is the only text format that
provides a workable (though, of course, fallible) approach to the problem of
world-wide variations in text encoding: LISP and probably every other
programming language do not even get to first base.
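
A small sketch of what the encoding declaration buys: the same document stored in two different byte encodings parses to the same characters, because each byte stream says how it should be read. The sample name and the helper parse_name are made up for illustration.

```python
import xml.etree.ElementTree as ET

NAME = "Zo\u00eb"  # contains one non-ASCII character (e with diaeresis)

# The same logical document, serialized in two different byte encodings.
LATIN1_DOC = (
    "<?xml version='1.0' encoding='ISO-8859-1'?><name>%s</name>" % NAME
).encode("iso-8859-1")
UTF8_DOC = (
    "<?xml version='1.0' encoding='UTF-8'?><name>%s</name>" % NAME
).encode("utf-8")


def parse_name(raw_bytes):
    """Parse raw bytes; the parser picks the decoder from the declaration."""
    return ET.fromstring(raw_bytes).text


if __name__ == "__main__":
    print(LATIN1_DOC != UTF8_DOC)                          # bytes differ
    print(parse_name(LATIN1_DOC) == parse_name(UTF8_DOC))  # text agrees
    try:
        LATIN1_DOC.decode("utf-8")  # guessing the wrong encoding fails
    except UnicodeDecodeError as error:
        print("wrong guess:", error.reason)
```

The approach is fallible in exactly the way the parenthetical admits: a document whose declaration does not match its actual bytes will still be misread.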

S-Expressions have no standard equivalent of DTDs, for validation. XML DTDs
provide a basic unit test for documents, which promotes quality testing,
clearer interface definition, a separation of concerns between information
providers and information recipients, and a view of the WWW as a data flow model.

So XML's encoding basis is superior to LISP. Its flexibility for creating
little languages is less than LISP. Their data structures are pretty much the
same. LISP has marginally richer datatypes. Each has different software
engineering qualities. Parenthesis syntax is familiar to programmers; on the
other hand, angle-bracket syntax is familiar to web coders.

So XML versus S-Expr is a draw, to me. When character set encoding and markup
are important, XML wins. When terseness or recognizing numbers are important,
S-Expressions win.

What about XML+Java versus LISP? That is a bit fairer.

I am very affectionate towards LISP. In the early 90s, I briefly worked for
Texas Instruments supporting their Explorer LISP systems: wonderful things.
TI were closing the Explorer project down at that stage: the belief was that
LISP (the language) would not be needed because LISP (the bundle of features)
would win. The TI boffins said that in the future (i.e. now) when you opened
up a language platform, you would see standard list/AVT structures, garbage
collection, object oriented-ness, message passing, dynamic linking,
expression parsing, and a whole slew of other features LISPers loved and
which were not available in, say, the C APIs. They were right.

But the most characteristic thing of LISP is the eval function. Can I have
that in Java+XML? I have been using the BeanShell interpreter for this, to
provide interpreted scripts in my company's products. With Beanshell "Users
may now freely mix loose, unstructured BeanShell scripts, method closures,
and full scripted classes." (At Topologi, we debug using Eclipse and compiled
versions, then strip out some header info to generate the scripts when
deploying.) I certainly don't want to claim that XML+Java+Beanshell is as
beautiful as LISP, but they go a long way towards having the equivalent power
of LISP, indeed of other interpreted languages.

LISP had another strong influence on XML, because of Richard Gabriel's paper,
usually called Worse is Better, which should be required reading for anyone
who is a big fan evangelizing any language, be it Python, C#, XQuery or ASP.
(For more, including "Better is Worse" see Gabriel's site. Sun's Jim Waldo
has a recent response Worse is still worse which I think misses Gabriel's
fundamental point: Waldo paraphrases Gabriel as "Better depends on your
quality metric", while I believe Gabriel's paper is the much more challenging
"our quality metric can be wrong".)

Rick Jelliffe is CTO of Topologi, and a standards activist with ISO and W3C
involved in XML, WWW internationalization, and schema languages.

    * surprised
      2004-12-09 21:18:50  akhu

      I am surprised that you have not considered XML and XSLT. I feel that
this is a lot like LISP, better in some ways (e.g. standards, web), at least
it is functional programming working on graphs. Did I miss something?


      Cheers,
      ac
          o surprised
            2004-12-09 22:02:20  rjelliffe

            Indeed, the LISP link is direct: XSL is a reworking of ISO DSSSL
(Document Style Semantics and Specification Language, pronounced like
'thistle') which used a subset of Scheme, a functional LISP language. (The
editor of ISO DSSSL, W3C XPath and W3C XSLT was James Clark, technical lead
for the XML Working Group and later editor of the ISO RELAX NG schema
language. He also wrote the well-known open source programs groff, sgmls, xp,
xt, and jing.)


            Apparently DSSSL was originally going to have a custom syntax,
but they were having trouble making a nice one. (This was more than a decade
ago.) Interleaf were using LISP with success; I wrote to comp.text.sgml to
report that in Japan I had co-written RISP, a LISP subset for processing an
SGML subset, and that it worked well; plus there were other LISP advocates
around. James Clark had previously implemented the troff typesetting language,
which is as far from elegant as possible, so I think he was keen to adopt a
functional/declarative approach. I wasn't involved in ISO at that stage, so I
don't know the exact details.


            Some purists (and I agree with them) say that the essence of LISP
is not functional programming, nor the list structure, nor garbage
collection, and so on, but the eval function. XSLT does not have the ability
to generate a script then run it (if it did, Schematron could be compiled in
one pass, for example!).


            But you make a good point: with XSLT, you do have your program
code available as a list/tree that can be accessed/created by other
processes, just like in LISP. But, unfortunately, not in the same executing
program. Java with the Beanshell does let you generate text and execute it as
code, but it does not give a neat tree for manipulating the code (e.g. for
symbolic computation.) So the closest thing for Java is to store the parse
tree in an XML tree that can be manipulated, then generate Beanshell scripts.
(The alternative to BeanShell, using reflection, is just too horrible to
contemplate for maintenance reasons at least.)
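
The approach described above (keep the code as an XML tree, manipulate the tree, then generate script text and run it) can be sketched in a few lines. Python's built-in eval stands in for BeanShell here, and the tiny <add>/<mul>/<num> vocabulary is invented for illustration; none of this is the Topologi code.

```python
import xml.etree.ElementTree as ET

OPERATORS = {"add": "+", "mul": "*"}  # hypothetical element vocabulary


def to_source(node):
    """Generate expression text from an XML tree of <add>, <mul>, <num>."""
    if node.tag == "num":
        return str(int(node.text))  # int() rejects anything but a number
    if node.tag in OPERATORS and len(node) > 0:
        joiner = f" {OPERATORS[node.tag]} "
        return "(" + joiner.join(to_source(child) for child in node) + ")"
    raise ValueError(f"unsupported element: {node.tag}")


def run(node):
    """Generate the script text, then evaluate it with no builtins exposed."""
    return eval(to_source(node), {"__builtins__": {}})


if __name__ == "__main__":
    tree = ET.fromstring(
        "<add><num>1</num><mul><num>2</num><num>3</num></mul></add>"
    )
    print(to_source(tree), "=", run(tree))
    tree.tag = "mul"  # symbolic manipulation: rewrite the code as data
    print(to_source(tree), "=", run(tree))
```

The tree gives the manipulable structure and eval gives the execution step, but they remain two separate representations joined by text generation, which is the gap with LISP noted above.
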
                + surprised
                  2004-12-16 18:12:49  akhu

                  You are right, eval (or evaluate, sometimes) is not part of
the standard and this is quite unfortunate.


                  XSLT processors like Saxon do have an eval (and evaluate)
extension function to do the job.


                  It may not be a part of the standard, but still it is
available as part of the language, at least with the better processors.


                  Still, I agree with you, it should be part of the standard
language and many interesting applications would not be feasible without it.
We do use it extensively.


                  Thank you.
                  ac. 


-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a>
______________________________________________________________
ICBM: 48.07078, 11.61144            http://www.leitl.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
http://moleculardevices.org         http://nanomachines.net

