[FoRK] Now with magic pixie dust!
Stephen D. Williams
sdw at lig.net
Sun May 23 12:23:27 PDT 2004
Infosets vs. SAX + custom data structures vs. DOM:
Every application works with a data structure that is the embodiment of
some abstract infoset.
Generally speaking, a SAX application uses events to construct a
proprietary, custom data structure. You can't compare SAX by itself to
anything that builds or maintains a data structure. You have to compare
alternatives with SAX coupled either with a general purpose data
structure or with a custom data structure.
A DOM application uses a data structure that is general purpose for
infosets compatible with XML expression, representing those same events.
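To make the comparison concrete, here is a minimal sketch in Python using
the standard library's xml.sax and xml.dom.minidom. The sample document,
the ItemHandler class, and the two function names are made up for
illustration. The SAX version has to bring its own data structure (a
dict here); the DOM version gets a general purpose tree and queries it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
XML = b"<order><item sku='a1'>2</item><item sku='b2'>5</item></order>"


class ItemHandler(xml.sax.ContentHandler):
    """Turns SAX events into a custom structure: a dict of sku -> quantity."""

    def __init__(self):
        super().__init__()
        self.items = {}
        self._sku = None   # sku of the <item> currently open, if any
        self._text = []    # character chunks seen inside that <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._sku = attrs["sku"]
            self._text = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        if self._sku is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.items[self._sku] = int("".join(self._text))
            self._sku = None


def items_via_sax(data):
    """SAX plus a custom data structure."""
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.items


def items_via_dom(data):
    """DOM: the parser builds the general purpose tree, we just query it."""
    doc = minidom.parseString(data)
    return {
        el.getAttribute("sku"): int(el.firstChild.data)
        for el in doc.getElementsByTagName("item")
    }
```

Both functions end up holding the same information; the difference is
who owns the data structure and how general it is.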
Collection classes, data structures, tree+graph data structure completeness:
With the collection classes of C++ (STL), Java, perl, etc., no one
should be building raw data structures very often.
Most abstract infosets can be expressed with a small number of
constructs: trees (which include arrays), graphs, and queues. With
small additional semantics, a tree/array system can also represent
graphs. Graphs and arrays can represent queues, linked lists, etc.
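As a rough sketch of what "small additional semantics" could mean, the
Python below stores a graph in a plain tree of dicts and lists. The one
added convention, which is my assumption for this example and not part
of any standard, is that a {"ref": id} leaf points at the node with that
id. The names index_by_id, neighbors, and reachable are hypothetical.

```python
from collections import deque

# A plain tree (nested dicts and lists). The only extra semantic is that
# {"ref": "<id>"} is read as a pointer to the node carrying that id.
tree = {
    "nodes": [
        {"id": "a", "edges": [{"ref": "b"}, {"ref": "c"}]},
        {"id": "b", "edges": [{"ref": "c"}]},
        {"id": "c", "edges": [{"ref": "a"}]},  # cycle back to "a"
    ]
}


def index_by_id(tree):
    """Hash index over the node array: id -> node."""
    return {node["id"]: node for node in tree["nodes"]}


def neighbors(tree, node_id):
    """Resolve the ref leaves of one node into neighbor ids."""
    return [edge["ref"] for edge in index_by_id(tree)[node_id]["edges"]]


def reachable(tree, start):
    """Breadth-first walk of the graph, in visit order, using a queue."""
    index = index_by_id(tree)
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for edge in index[current]["edges"]:
            target = edge["ref"]
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen
```

The traversal itself needs a queue, which here is just another
collection class layered over the same kind of storage.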
A collection class that supports all of these with indexing/hashing and
support of any type of value/payload, including binary blocks, would
suffice for many programming tasks, including nearly all business
Minimum overhead of "line/file format" vs. "conversion + operational
memory data structure + modification overhead":
Applications that read mostly XML (or similar general format) data and
write XML data in an SOA, n-tier environment often have a minimum
overhead computational load that dwarfs the actual processing they are
accomplishing. Certainly with many distributed computing methods, such
as CORBA, DCOM, and systems based on ASN.1, where you are working with IDL or
IDL-like systems, the maintenance overhead and tightness of binding lead
to serious long-term issues.
Theory of minimum distributed-application processing and data overhead:
Ideally, the overhead of getting data into and out of an application and
into and out of data structures that functional code can actually do
something with should be minimized to something approaching theoretical
minimum. My theory is that you can have both the general, standardized
data expression of XML and avoid nearly all overhead except raw I/O of
blocks and a slight overhead of traversal/access/modification. This
overhead should be linear in the number of operations performed, not in
the number or type of elements in a block of data.
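A toy illustration of that cost model, in Python: the buffer below is
used as read or written, with no parse step, and each access costs a
few fixed-size reads no matter how many records the block holds. The
layout (count, offset table, length-prefixed payloads) and the function
names are invented for this sketch. It is not esXML, only a stand-in to
show the idea.

```python
import struct


def pack_records(values):
    """Serialize byte strings as: count, offset table, length-prefixed payloads.

    Done once by the writer. All integers are little-endian uint32.
    """
    count = len(values)
    header_size = 4 + 4 * count
    offsets = []
    body = bytearray()
    for value in values:
        offsets.append(header_size + len(body))
        body += struct.pack("<I", len(value)) + value
    return bytearray(struct.pack(f"<I{count}I", count, *offsets)) + body


def _locate(buf, i):
    """Return (payload_start, payload_length) for record i via the offset table."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    (offset,) = struct.unpack_from("<I", buf, 4 + 4 * i)
    (length,) = struct.unpack_from("<I", buf, offset)
    return offset + 4, length


def get_record(buf, i):
    """Read record i straight out of the serialized block."""
    start, length = _locate(buf, i)
    return bytes(buf[start:start + length])


def set_record_same_size(buf, i, value):
    """Overwrite record i in place. This sketch only handles equal-length values."""
    start, length = _locate(buf, i)
    if len(value) != length:
        raise ValueError("sketch only supports same-size replacement")
    buf[start:start + length] = value
```

Growing or shrinking a record would need extra bookkeeping (slack space,
relocation, or similar), which this sketch deliberately leaves out.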
Serialized, wire/file-formats of data have been optimized mostly
independently of memory data structures.
Memory-based data structures have been optimized mostly independently of
serialized data formats. Mostly they have been the concern of language
and library designers with respect to in-memory processing. Pascal, the
original Wirth Pascal, didn't even have input/output operators: all
input/output in actual Pascal languages was non-standard.
My observation is that as applications and application components become
more and more distributed, componentized, and bound in ways that force
frequent transitions between serialized form and operational memory
form, the overhead of existing methods will continue to increase sharply
and become less tolerable. After working on it for a while, I am
convinced that it is possible to solve this problem in a way that will
create a new paradigm at the 3GL and below levels of the stack while
supporting a variety of existing and alternate models above. In
particular, data and data structures that are read in, operated on, and
written out should not be expressed in 3GL method variables but in a format
like esXML and accessed via a collections-style interface like esDOM.
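For a feel of what a collections-style interface might look like, here
is a small Python sketch. To be clear, this is not the esDOM API: the
PathTree class, its method names, and the tiny path syntax (name or
name[n], 1-based as in XPath, separated by "/") are all assumptions made
up for this example, and it is backed by ordinary dicts and lists
instead of a serialized block. It covers get, set (create/replace),
append, insert, and counting.

```python
import re

_SEGMENT = re.compile(r"^([^\[\]/]+)(?:\[(\d+)\])?$")


def _parse(path):
    """Split 'a/b[2]/c' into [(name, 1-based index)]; a missing index means 1."""
    steps = []
    for part in path.strip("/").split("/"):
        match = _SEGMENT.match(part)
        if not match:
            raise ValueError(f"bad path segment: {part!r}")
        index = int(match.group(2)) if match.group(2) else 1
        if index < 1:
            raise ValueError("indexes are 1-based")
        steps.append((match.group(1), index))
    return steps


class PathTree:
    """Hypothetical path-addressed tree: each name maps to a list of children."""

    def __init__(self):
        self.root = {}

    def _parent(self, steps, create):
        """Walk all but the last step; return the dict holding the last name."""
        node = self.root
        for name, index in steps[:-1]:
            children = node.setdefault(name, []) if create else node[name]
            if create and index - 1 == len(children):
                children.append({})
            node = children[index - 1]
        return node

    def get(self, path):
        steps = _parse(path)
        name, index = steps[-1]
        return self._parent(steps, False)[name][index - 1]

    def set(self, path, value):
        """Create the value if it is the next slot, otherwise replace it."""
        steps = _parse(path)
        name, index = steps[-1]
        children = self._parent(steps, True).setdefault(name, [])
        if index - 1 == len(children):
            children.append(value)
        else:
            children[index - 1] = value

    def append(self, path, value):
        steps = _parse(path)
        self._parent(steps, True).setdefault(steps[-1][0], []).append(value)

    def insert(self, path, value):
        """Insert before the position named by the path's final index."""
        steps = _parse(path)
        name, index = steps[-1]
        self._parent(steps, True).setdefault(name, []).insert(index - 1, value)

    def count(self, path):
        steps = _parse(path)
        return len(self._parent(steps, False).get(steps[-1][0], []))
```

Usage would go something like: t = PathTree(); t.set("order/customer",
"acme"); t.append("order/item", "bolt"); t.count("order/item"). The
functional code never touches the underlying representation, which is
the part that would let the backing store be a serialized block.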
Back in 1998, which is when I started thinking about this problem quite
a bit, FoRK had a discussion about YML which explored substantially the
Gavin Thomas Nicol wrote:
>On Friday 21 May 2004 10:35 pm, Stephen D. Williams wrote:
>>I think that an XPath based API is pretty general, with certain
>>semantics. You need to be able to get, set (create/replace), append,
>>insert. You need array indexing, array counting, iteration/enumeration,
>>subtree operations (get, set, append, insert subtrees).
>These are not necessary for a significant number of applications... for
>example, rendering a page of data in a read-only scenario, or sucking in a
>SOAP message doesn't really need much more than a stack and some SAX events
>(bit of an oversimplification, but...). XPath as such is likewise overkill
>(and overhead!) for many applications.
>In many cases, these are also not only not necessary, but irrelevant. Go one
>or two levels higher in the application, and the XML can't (or shouldn't be)
swilliams at hpti.com http://www.hpti.com Per: sdw at lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw