[FoRK] Now with magic pixie dust!
Stephen D. Williams
sdw at lig.net
Fri May 21 20:18:13 PDT 2004
The argument that you must _always_ validate a particular data structure
completely in _all_ circumstances is obviously false.
You may mean that in the general purpose external/internal boundary of a
service application where clients can't be trusted and data corruption
may not have been detected by other means, validation is a necessity.
This is obviously true.
In the general case, I will point to instances where it is obviously
unacceptable to force all programs to fully validate data before use
merely because it is based on a standard data format:
PDF documents: must you really fully validate a 100MB document before
displaying the first page?
Databases: must Oracle fully validate its structure before allowing the
first query?
Filesystems: must the on-disk format be fully validated before any
mount completes?
Data structures passed from one method to another, one process to
another, etc.? Maybe data structures need to be validated between every
line of code? (Good for debugging sometimes, but obviously absurd as a
general requirement.)
All of these situations can arise in the use of a flexible, scalable,
efficient, multiply-optimized structure that avoids both parsing and
serialization while still allowing efficient modification.
I favor just-in-time validation. When accessing a structure, you
validate that everything makes sense as you touch parts of the data
structure, and you raise errors appropriately rather than failing
wholesale.
The only real negative is that you may complete some work before you
realize that you are screwed.
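A minimal sketch of just-in-time validation, in Python. The record
layout (magic tag, length prefix, payload) and all names here are
hypothetical, invented only to illustrate the idea: loading does no
work at all, and each access validates only the bytes it touches.

```python
import struct

class LazyRecord:
    """Validate fields of a raw byte buffer only when they are touched.

    Hypothetical fixed layout: 4-byte magic, 4-byte big-endian length,
    then a UTF-8 payload of that length.
    """

    MAGIC = b"ESX1"

    def __init__(self, buf: bytes):
        # No parsing or validation here -- just keep the raw bytes.
        self._buf = buf

    def payload(self) -> str:
        # Just-in-time validation: check only what this access needs.
        if self._buf[:4] != self.MAGIC:
            raise ValueError("bad magic")
        (length,) = struct.unpack_from(">I", self._buf, 4)
        end = 8 + length
        if end > len(self._buf):
            raise ValueError("declared length exceeds buffer")
        return self._buf[8:end].decode("utf-8")  # decode may also raise

buf = b"ESX1" + struct.pack(">I", 5) + b"hello"
rec = LazyRecord(buf)   # "loading" is free: no parse, no allocation tree
print(rec.payload())    # validation happens here -> prints "hello"
```

The stated negative shows up naturally: a corrupt length or bad UTF-8
is only discovered at the access that touches it, possibly after other
work has already completed.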
This enables one group of desired outcomes that I am after: you should
be able to load such a data structure with simple raw block loads,
access/modify as many or as few fields as required with linear cost
starting at no cost, and be able to write out the data structure with
raw writes and optional condensation/optimization.
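The load/touch/write cycle above can be sketched as follows. The
fixed-width slot layout is an assumption for illustration, not the
esXML encoding: the point is that reading or modifying one field out
of a thousand costs one field's worth of work, and writing the
structure back out is a raw byte copy.

```python
import struct

# Hypothetical record: 1000 fixed-width 8-byte signed-integer slots,
# loaded as one raw block (e.g. straight from disk or the network).
raw = bytearray(struct.pack("<1000q", *range(1000)))

def get_field(buf: bytearray, i: int) -> int:
    # Linear cost in fields touched: only slot i is decoded.
    (v,) = struct.unpack_from("<q", buf, i * 8)
    return v

def set_field(buf: bytearray, i: int, v: int) -> None:
    # In-place modification: no full parse, no full reserialization.
    struct.pack_into("<q", buf, i * 8, v)

set_field(raw, 42, -7)
assert get_field(raw, 42) == -7
out = bytes(raw)  # "raw write": every untouched byte passes through as-is
```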
XML 1.0 with parsing/validation requires all of the parsing, validation,
memory allocation, data structure construction, memory cleanup, and
serialization work even if only 1 field out of 3000 is accessed or
modified. The best case for XML 1.0 with current best practices is close
to the esXML worst case. How close, we shall see soon, but the concept
is sound.
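To make the XML 1.0 cost concrete, here is a small Python illustration
(the 3000-field document is synthetic): a conventional parser tokenizes
and materializes every one of the 3000 fields before the single field
of interest can be read.

```python
import xml.etree.ElementTree as ET

# Synthetic document with 3000 fields.
doc = "<r>" + "".join(f"<f{i}>{i}</f{i}>" for i in range(3000)) + "</r>"

# Full parse: all 3000 elements are tokenized, allocated, and built
# into a tree, even though only one field is ever read.
tree = ET.fromstring(doc)
print(tree.find("f7").text)  # prints "7"
```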
Meltsner, Kenneth wrote:
> Well, Elliotte Harold makes a good point -- that you can't always
> depend on valid data, since other programs may go bad, there may be an
> attack, etc. Validation is a part of the price we pay for reliability.
> The XPath handled by Tarari is a bit limited -- the document sort of
> mentions that it may look for XPath true/false values if you want the
> best performance. Nodes and strings may be slower.
> It would be most interesting to compare Tarari against one of the
> high-performance XPath filters, like YFilter from Berkeley, XMLTK
> formerly from U Washington, or elsewhere. Comparing it against Xalan
> is sort of a strawman unless it's a drop-in replacement for a JAXP
> parser; if there's special coding required, you may as well compare
> with one of the other special libraries out there.
swilliams at hpti.com http://www.hpti.com Per: sdw at lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw