[FoRK] binary XML

Reza B'Far (Voice Genesis) reza at voicegenesis.com
Fri Jan 21 23:41:02 PST 2005


Cool!  Any pointers to papers on "edge transformation"?  Would like to learn
more about it.

We started with a dynamic dictionary... (as you know, Huffman is a pretty
easy algorithm and there isn't much code to write to build a dynamic
dictionary based on it...)... that version still exists.
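
For illustration only, here is a minimal Python sketch of building a Huffman
code table from observed element-name frequencies. It is a generic binary
Huffman construction, not the code referred to above; the function name
build_huffman_codes and the sample element names are assumptions made up for
the example.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(symbols):
    """Return {symbol: bitstring} built from the symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    # The counter breaks ties so the dicts are never compared by heapq.
    tiebreak = count()
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))

    return heap[0][2]


# Hypothetical stream of element names seen in traffic.
names = ["command"] * 8 + ["retry"] * 4 + ["count"] * 2 + ["timestamp"]
codes = build_huffman_codes(names)
```

Rebuilding the table as new names show up is what makes the dictionary
"dynamic"; the most frequent name always ends up with the shortest code.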

The funny thing is that, in 99% of practical applications, the domain model
doesn't change all that much (IMHO)... most of the time, it grows... there
are occasional changes, but not much... so, what we did a year later was to
go to a completely static encoding scheme... (of course this means you can
now have a static XML schema even for your Huffman encoded XML)... This of
course allows for lots more optimization. Though you could argue about why
to use XML at all if there are not many changes... which ends up in a somewhat
religious discussion...
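
As a rough sketch of what a static text-to-text encoding could look like,
the Python below swaps long element and attribute names for fixed short ones
and back again. The table contents (command/retry/count mapped to f/r/n) and
the helper name rename are assumptions for illustration; the post does not
give the actual schema or mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical static dictionary: long names -> short names.
SHORT_NAMES = {"command": "f", "retry": "r", "count": "n"}
# Inverse table for turning compact XML back into the verbose form.
LONG_NAMES = {short: long for long, short in SHORT_NAMES.items()}


def rename(xml_text, table):
    """Rewrite element and attribute names using a fixed lookup table."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Names missing from the table are passed through unchanged.
        elem.tag = table.get(elem.tag, elem.tag)
        attrs = {table.get(k, k): v for k, v in elem.attrib.items()}
        elem.attrib.clear()
        elem.attrib.update(attrs)
    return ET.tostring(root, encoding="unicode")


verbose = '<command retry="true" count="5" />'
compact = rename(verbose, SHORT_NAMES)   # still XML, just shorter names
restored = rename(compact, LONG_NAMES)   # round-trips to the verbose names
```

Because the table is fixed, the compact form can have its own XML schema,
and both sides only need to agree on the table ahead of time.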

This was a practical approach to start with a very dynamic schema... and
fine-tune organically as it settles down... still ending up with something
that's human readable... funny thing is, for whatever reason, people
remember letters better than 0's and 1's (probably because there are
more letters than binary or hex digits)... so, developers eventually start
understanding the encoded XML schema and don't even look at the transformed
version!  Developers walking around saying "send the f command to the server
with r set to true and n set to 5", etc.

This hack makes me think that binary XML is probably not the answer...
because it's an obvious answer... and Occam's Razor, IMHO, does not apply to
problems involving XML ;-)

R


-----Original Message-----
From: fork-bounces at xent.com [mailto:fork-bounces at xent.com]On Behalf Of
Gavin Thomas Nicol
Sent: Friday, January 21, 2005 9:55 PM
To: 'forkit!'
Subject: Re: [FoRK] binary XML



On Jan 22, 2005, at 12:12 AM, Reza B'Far (Voice Genesis) wrote:
> This all assumes you have pretty intelligent design engineers (not just
> people who know how to count bytes, but have the ability to understand
> statistics and things like Huffman encoding).  Obviously, there are more
> advanced domain-based compression techniques that could take in text and
> produce text.  And, if they are simple enough, you can use an XSL to view
> them during testing, development, etc.
>
> So, this is hacky, but it is a way to buy a little performance.

FWIW. This is similar to the approach that I call "edge transformation",
where you emphasise the ability for the consumer to interpret the data
stream, rather than standardising the data stream. It gives you more
flexibility, and as in your case, allows true application-based
optimisation.

I assume you are preloading the Huffman dictionaries with the most common
symbols?




_______________________________________________
FoRK mailing list
http://xent.com/mailman/listinfo/fork