[FoRK] Now with magic pixie dust!

Stephen D. Williams sdw at lig.net
Fri May 21 20:18:13 PDT 2004


The argument that you must _always_ validate a particular data structure 
completely in _all_ circumstances is obviously false.
You may mean that at the external/internal boundary of a general-purpose 
service application, where clients can't be trusted and data corruption 
may not have been detected by other means, validation is a necessity.  
That is obviously true.

In the general case, I will point to instances where it is obviously 
unacceptable to require that all programs fully validate before use just 
because they happen to be based on a standard data format:

PDF documents: must you really fully validate a 100MB document before 
displaying the first page?

Databases: must Oracle fully validate its structure before allowing the 
first transaction?

Filesystems: must the on-disk format be fully validated before any mount 
completes?

Data structures passed from one method to another, or from one process to 
another: must those be validated at every hand-off?  Maybe data 
structures need to be revalidated between every line of code?  (Good for 
debugging sometimes, but obviously absurd as a blanket requirement.)

Etc.

All of these situations can arise in the use of a flexible, scalable, 
efficient, multi-optimized structure that avoids both parsing and 
serialization while still allowing efficient modification.

I favor just-in-time validation.  When accessing a structure, you 
validate that everything makes sense as you touch each part of the data 
structure, and you report errors appropriately rather than failing 
outright in every case.  The only real negative is that you may complete 
some work before you realize that you are screwed.
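
Roughly, that looks like the following sketch (Java, not esXML itself; 
LazyRecord, the field table, and the validation check are invented for 
illustration):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // Hypothetical lazily-validated record: the bytes are loaded as-is and
    // each field is checked only when it is actually touched.
    final class LazyRecord {
        private final ByteBuffer raw;        // raw bytes, never pre-parsed
        private final int[] offset, length;  // per-field locations in raw
        private final boolean[] checked;     // which fields have passed

        LazyRecord(ByteBuffer raw, int[] offset, int[] length) {
            this.raw = raw;
            this.offset = offset;
            this.length = length;
            this.checked = new boolean[offset.length];
        }

        // Just-in-time validation: the check happens on first access, and
        // an error is reported for that field, not for the whole structure.
        String getString(int field) {
            if (!checked[field]) {
                if (offset[field] < 0 || offset[field] + length[field] > raw.limit())
                    throw new IllegalStateException("field " + field + " is corrupt");
                checked[field] = true;
            }
            byte[] bytes = new byte[length[field]];
            raw.position(offset[field]);
            raw.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

Untouched fields cost nothing; a corrupt field only hurts the code path 
that actually needs it.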

This enables one group of desired outcomes that I am after: you should 
be able to load such a data structure with simple raw block loads, 
access/modify as many or as few fields as required at a cost linear in 
the number of fields touched (starting at no cost), and be able to write 
out the data structure with raw writes and optional 
condensation/optimization.
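
A memory-mapped file gives exactly that shape of cost.  A sketch in 
Java (the file name and the field offset are made up):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class RawLoad {
        public static void main(String[] args) throws IOException {
            // Map the file directly: no parse, no per-field allocation up front.
            try (FileChannel ch = FileChannel.open(Path.of("record.bin"),
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());

                // Touch exactly one field at a known offset; the rest stays raw.
                int count = buf.getInt(128);   // 128 is an invented field offset
                buf.putInt(128, count + 1);    // in-place modification

                buf.force();                   // flush the raw pages; no serialization pass
            }
        }
    }

Pages that are never touched are never read, and the write-back is the 
raw image rather than a serialization step.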

XML 1.0 with parsing/validation requires all of the parsing, validation, 
memory allocation, data structure construction, memory cleanup, and 
serialization work even if only 1 field out of 3000 is accessed or 
modified.  The best case for XML 1.0 with current best practices is close 
to the esXML worst case.  How close, we shall see soon, but the concept 
is sound.
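
For contrast, the conventional XML 1.0 path to read that one field 
looks roughly like this (plain JAXP/DOM; the file and element names are 
made up):

    import java.io.File;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;

    public class DomCost {
        public static void main(String[] args) throws Exception {
            // A standard DOM parse: the whole document is scanned and every
            // element and text node is allocated, whether we use it or not.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new File("order.xml"));

            // ...and then we read exactly one value out of thousands.
            String total = doc.getElementsByTagName("total")
                    .item(0).getTextContent();
            System.out.println(total);
        }
    }

All of the other fields were parsed, allocated, and will be garbage 
collected just to answer a question about one of them.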

sdw

Meltsner, Kenneth wrote:

> Well, Elliotte Harold makes a good point -- that you can't always 
> depend on valid data, since other programs may go bad, there may be an 
> attack, etc.  Validation is a part of the price we pay for reliability.
>
> The XPath handled by Tarari is a bit limited -- the document sort of 
> mentions that it may look for XPath true/false values if you want the 
> best performance.  Nodes and strings may be slower.
>
> It would be most interesting to compare Tarari against one of the 
> high-performance XPath filters, like YFilter from Berkeley, XMLTK 
> formerly from U Washington, or elsewhere.  Comparing it against Xalan 
> is sort of a strawman unless it's a drop-in replacement for a JAXP 
> parser; if there's special coding required, you may as well compare 
> with one of the other special libraries out there.
>
> Ken
>


-- 
swilliams at hpti.com http://www.hpti.com Per: sdw at lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw


