[FoRK] binary XML

Stephen D. Williams sdw at lig.net
Wed Jan 19 06:57:19 PST 2005


Binary XML is needed by a very wide variety of people.  This includes 
those who need more speed, better space and processing efficiency, and 
extra semantics.  It especially includes whole industries and 
applications that wouldn't think of using XML 1.x today.  That potential 
market dwarfs the current market for XML 1.x.

It IS possible to have a binary format that is fast, standard, smaller 
(but not smallest), and has new useful semantics.  My favorite semantics 
are "sticky virtual pointers", delta capability, random access, direct 
modification, and self-contained subtrees.

I've been working on this, off and on, for a while.  See: http://esxml.org

I'm also the "invited expert" on the W3C XML Binary Characterization 
Working Group.  We're working on requirements that consist of: use 
cases, properties derived from use cases, measurement methodology (I'm 
the editor), and a characterization document where we make 
recommendations and determine what we think is in/out and feasible.  I 
wrote most of the "Business and Knowledge Processing" use case, about 
1/3 of the properties, and I'm working on a "Grid and Supercomputing" 
use case.

People already have several technologies that can be called "binary 
XML".  We're trying to define what a unified approach would need to 
cover and determine whether it might be possible to get agreement.  A 
later working group may work on actually selecting something, possibly 
after a period of experimentation.

I have the most radical vision of the group and I've introduced a number 
of ideas that, seemingly, many had not thought about.  While I have my 
own solution, I'm one of the more independent members, since my project 
is open source and not an existing commercial product.

My vision is an extension of ideas like zero-copy, collection 
intermediation, XML semantics, and versioning.  I need to be able to 
create an object that is a delta of another large object, traverse and 
read/write that object, maintain any number of internal 'pointers', and 
serialize/deserialize with no work.  In other words, the wire format and 
the memory format must be able to be identical, so that message traffic 
consists of reading blocks of memory, accessing and modifying them 
directly, and writing blocks of memory back out.  While an application is 
free to use more traditional processing, a binary XML format should 
support this mode for transactional distributed processing.
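
To make the "wire format equals memory format" idea concrete, here is a 
minimal sketch in Python.  It is not the esXML format or anything the 
working group has specified; the layout (a "FLAT" magic, an offset/length 
table, then payload bytes) and the names build and field are hypothetical, 
chosen only for illustration.  Fields are located through offsets relative 
to the start of the buffer, so the same bytes can be read, modified in 
place, and written to the network without a separate serialization step.

```python
import struct

HEADER = struct.Struct("<4sI")  # magic, number of entries
ENTRY = struct.Struct("<II")    # offset and length, relative to buffer start


def build(values):
    """Pack byte strings into one flat buffer with an offset table."""
    table_end = HEADER.size + ENTRY.size * len(values)
    buf = bytearray(table_end + sum(len(v) for v in values))
    HEADER.pack_into(buf, 0, b"FLAT", len(values))
    pos = table_end
    for i, v in enumerate(values):
        ENTRY.pack_into(buf, HEADER.size + i * ENTRY.size, pos, len(v))
        buf[pos:pos + len(v)] = v
        pos += len(v)
    return buf


def field(buf, i):
    """Return a zero-copy view of field i inside the buffer."""
    magic, count = HEADER.unpack_from(buf, 0)
    if magic != b"FLAT" or not 0 <= i < count:
        raise IndexError(i)
    off, length = ENTRY.unpack_from(buf, HEADER.size + i * ENTRY.size)
    return memoryview(buf)[off:off + length]


msg = build([b"alice", b"0042"])
field(msg, 1)[:] = b"0043"            # direct modification, same length
received = bytearray(bytes(msg))      # stands in for a network round trip
print(bytes(field(received, 1)))      # readable at once, no parse step
```

The sketch only handles same-length updates; growing a field, deltas 
against another object, and anything XML-specific are left out.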

Using my methods, in most cases hardware acceleration isn't needed, 
except at the boundaries with legacy systems.

sdw

Eugen Leitl wrote:

>On Wed, Jan 19, 2005 at 08:54:48AM -0500, Mark Day wrote:
>  
>
>>I'm not sure I understand who is really supposed to benefit from a binary
>>XML standard.  The discussion appears to be taking place in a zone where
>>people are just looking at XML and not the larger context of how it's
>>used/moved.
>>    
>>
>
>I could imagine that people who balk at the overhead of copying a buffer
>from an incoming packet would like a binary format that doesn't require 
>added latency due to compression (if you're worried about microseconds, 
>mentioning gzip appears ludicrous), and saves memory bandwidth by being a 
>tight representation.
> 
>  
>
>>If the goal is to reduce bits used in storing XML at a particular device,
>>that can be accomplished by either having the device do the compression
>>itself or by using a distilling proxy (Fox & Brewer,
>>    
>>
>
>Compression hardware is not free, both in terms of gates and latency added.
>I certainly can't add it to some existing, heterogeneous hardware, especially
>if I'm used to free software coming via network.
>
>  
>
>>http://citeseer.ist.psu.edu/fox96reducing.html).  No need for anyone else to
>>know/care about the compressed representation.  Since it's unlikely that
>>    
>>
>
>The CPU cares. 
>
>  
>
>>people can design a single format that maximizes both space-efficiency and
>>time-efficiency (of parsing & processing), whatever standard was decided, some
>>people might still need to do their own local representation. 
>>    
>>
>
>Now this is the interesting part. Can you make it binary, standard, open and
>portable?
> 
>  
>
>>If the goal is to reduce bits on the network, paired
>>application-accelerating appliances like a Riverbed Steelhead setup
>>(www.riverbed.com) are going to do a better job than a special file format,
>>especially because they'll be able to do the optimization across multiple
>>messages and more traffic types than just XML. [Full disclosure: I work at
>>Riverbed.]
>>    
>>
>
>Nice, but I can't ftp hardware (even if it was free hardware).
> 
>  
>
>>If the goal is to reduce bits on the network while using current web
>>browsers (i.e. a "single-ended" solution), application/gzip is already
>>    
>>
>
>If I'm throwing XML over LAN, I can readily see the advantage of using a
>binary format on slow (100 MBit Ethernet, 400 MHz SPARC) machines with tight
>(less than a GByte), expensive ($$$) memory. Using gzip will actually
>exacerbate the problem in this case.
>
>  
>
>>there.  I'm not sure whether the wrapping of a text/xml in there works out,
>>but getting that right still doesn't require a "binary XML" as much as it
>>requires tweaking some MIME-related specs.
>>    
>>
>


