[FoRK] Now with magic pixie dust!

Meltsner, Kenneth Kenneth.Meltsner at ca.com
Sun May 23 15:34:24 PDT 2004


PDFs, when properly linearized, can be byteserved over the Web; this is sometimes called "web optimization."  Without linearization, the whole PDF must be downloaded before anything can be displayed.
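
A quick way to see byteserving from the client side is an HTTP Range request (a rough sketch; the URL is hypothetical, and it assumes the server honors Range headers):

    import urllib.request

    # Hypothetical URL; any server that honors Range requests will do.
    url = "http://example.com/manual.pdf"

    # Ask for just the first 8 KB -- enough for the header and, in a
    # linearized PDF, the first-page hint and cross-reference data.
    req = urllib.request.Request(url, headers={"Range": "bytes=0-8191"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server served just the slice;
        # 200 means it ignored Range and sent the whole file.
        print(resp.status, len(resp.read()), "bytes")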

I'm not sure whether Adobe reads the whole file for validation when the PDF is local and linearized -- I may try some experiments.
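
One cheap experiment along those lines (a sketch; the linearization parameter dictionary, with its /Linearized key, is required to sit near the start of the file, so probing the first kilobyte is enough):

    # Sketch: look for the /Linearized parameter dictionary up front.
    def is_linearized(path, probe=1024):
        with open(path, "rb") as f:
            return b"/Linearized" in f.read(probe)

    print(is_linearized("manual.pdf"))  # hypothetical local file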

Ken



-----Original Message-----
From:	Stephen D. Williams [mailto:sdw at lig.net]
Sent:	Sun 5/23/2004 2:26 PM
To:	Meltsner, Kenneth
Cc:	fork at xent.com
Subject:	Re: [FoRK] Now with magic pixie dust!
You are wrong about PDFs.  Install a modern Adobe Acrobat reader and 
browse the proceedings from the RSA conference for an example of their 
extreme download-on-demand mode.  (Actually, it is so aggressive about 
this that it becomes highly frustrating.)

It's not just about how critical a system is; it is also about how 
practical you need to be.  If a system needs to perform, or you need to 
perform, at a certain level, there is only so much useless effort you 
can expend.

sdw

Meltsner, Kenneth wrote:

>
> Some people program in a cautious/paranoid fashion -- some people run 
> fsck even after a clean shutdown.  I don't know when it stops being 
> caution and starts to be paranoia; a lot depends on how critical the 
> system is, I suppose.
>
> And I think that PDFs are validated in their entirety before being 
> displayed; ghostscript/gsview, at least, are happy to give their best 
> effort.
>
> Ken
>
>
> -----Original Message-----
> From:   Stephen D. Williams [mailto:sdw at lig.net]
> Sent:   Fri 5/21/2004 10:18 PM
> To:     Meltsner, Kenneth
> Cc:     fork at xent.com
> Subject:        Re: [FoRK] Now with magic pixie dust!
> The argument that you must _always_ validate a particular data structure
> completely in _all_ circumstances is obviously false.
> You may mean that at the external/internal boundary of a general-purpose
> service application, where clients can't be trusted and data corruption
> may not have been detected by other means, validation is a necessity. 
> This is obviously true.
>
> In the general case, I will point to instances where it is obviously
> unacceptable to force all programs to fully validate data before use
> just because it happens to be in a standard format:
>
> PDF documents: must you really fully validate a 100 MB document before
> displaying the first page?
>
> Databases: must Oracle fully validate its structure before allowing the
> first transaction?
>
> Filesystems: must the format be fully validated before any mount completes?
>
> Data structures passed from one method to another, one process to
> another, etc.?  Maybe data structures need to be validated between every
> line of code?  (Sometimes good for debugging, but obviously absurd as a
> blanket requirement.)
>
> Etc.
>
> All of these situations could arise in the use of a flexible, scalable,
> efficient, multiply-optimized structure that avoids both parsing and
> serialization while still allowing efficient modification.
>
> I favor just-in-time validation: as you access a structure, you
> validate that everything makes sense for the parts you actually touch,
> and you report errors appropriately rather than failing up front in all
> circumstances.  The only real negative is that you may complete some
> work before you realize that you are screwed.
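>
> A minimal sketch of that idea (the record layout and names here are
> invented for illustration): each field is checked only when touched,
> and a bad field raises a descriptive error instead of the whole buffer
> being rejected up front.
>
>     import struct
>
>     class LazyRecord:
>         """Validates each field just-in-time, on first access."""
>         def __init__(self, buf):
>             self.buf = buf  # raw bytes, loaded with no up-front parsing
>
>         def field_u32(self, offset):
>             # Validate only the part we are about to touch.
>             if offset + 4 > len(self.buf):
>                 raise ValueError("field at %d runs past the buffer" % offset)
>             return struct.unpack_from("<I", self.buf, offset)[0]
>
>     rec = LazyRecord(open("data.bin", "rb").read())  # hypothetical file
>     print(rec.field_u32(0))  # only this field is ever validated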
>
> This enables one group of outcomes that I am after: you should be able
> to load such a data structure with simple raw block loads, access or
> modify as many or as few fields as required at a cost linear in the
> fields touched (starting at no cost), and write the data structure back
> out with raw writes and optional condensation/optimization.
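>
> A sketch of that load/touch/write cycle, assuming a flat fixed-offset
> layout (the file name and offsets are invented; the file is assumed to
> be at least 20 bytes long):
>
>     import mmap
>
>     # Raw block load: the OS maps pages in; nothing is parsed or copied.
>     with open("records.bin", "r+b") as f:
>         buf = mmap.mmap(f.fileno(), 0)
>         # Touch/modify exactly one 4-byte field at a known offset;
>         # the cost tracks the fields touched, not the file size.
>         buf[16:20] = (42).to_bytes(4, "little")
>         buf.flush()   # raw write-back, no serialization pass
>         buf.close()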
>
> XML 1.0 with parsing/validation requires all of the parsing, validation,
> memory allocation, data structure construction, memory cleanup, and
> serialization work even if only 1 field out of 3000 is accessed or
> modified.  The best case for XML 1.0 under current best practices is
> close to the esXML worst case.  How close, we shall see soon, but the
> concept is sound.
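>
> The contrast in miniature (file and element names invented; even the
> streaming parser, the cheaper of the two, still has to lex its way
> through the document up to the field it wants):
>
>     import xml.etree.ElementTree as ET
>
>     # Full parse: all 3000 fields are parsed, checked for
>     # well-formedness, and built into an in-memory tree.
>     tree = ET.parse("big.xml")
>     value = tree.getroot().find("field1").text
>
>     # Streaming with early exit: still parses everything up to the
>     # target, but skips most tree-building and stops early.
>     for event, elem in ET.iterparse("big.xml"):
>         if elem.tag == "field1":
>             value = elem.text
>             break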
>
> sdw
>
> Meltsner, Kenneth wrote:
>
> > Well, Elliotte Harold makes a good point -- that you can't always
> > depend on valid data, since other programs may go bad, there may be an
> > attack, etc.  Validation is a part of the price we pay for reliability.
> >
> > The XPath handled by Tarari is a bit limited -- the documentation sort
> > of mentions that it may look for XPath true/false values if you want
> > the best performance.  Returning nodes and strings may be slower.
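> >
> > In XPath terms, that is the difference between a query wrapped in
> > boolean() and one that returns a node-set.  A small illustration
> > using the third-party lxml library (the document and query here are
> > invented):
> >
> >     from lxml import etree
> >
> >     doc = etree.fromstring("<orders><order id='42'/></orders>")
> >
> >     # Boolean result: the engine only needs a yes/no answer.
> >     print(doc.xpath("boolean(//order[@id='42'])"))  # True
> >
> >     # Node-set result: all matching nodes are materialized.
> >     print(doc.xpath("//order[@id='42']"))           # [<Element order ...>]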
> >
> > It would be most interesting to compare Tarari against one of the
> > high-performance XPath filters, like YFilter from Berkeley or XMLTK,
> > formerly from U Washington.  Comparing it against Xalan is sort of a
> > strawman unless it's a drop-in replacement for a JAXP parser; if
> > special coding is required, you may as well compare it with one of the
> > other specialized libraries out there.
> >
> > Ken
> >
>
>
> --
> swilliams at hpti.com http://www.hpti.com Per: sdw at lig.net http://sdw.st
> Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw
>
>
>
>


-- 
swilliams at hpti.com http://www.hpti.com Per: sdw at lig.net http://sdw.st
Stephen D. Williams 703-724-0118W 703-995-0407Fax 20147-4622 AIM: sdw




