Comment 5 for bug 1024435

Tobias Hoffmann (smilingthax) wrote:

I'm the one currently working on the new libqpdf-based pdftopdf implementation.

The old implementation of pdftopdf does not duplicate much of poppler, but it has a class structure that /parallels/ poppler's document representation: poppler was never built to /output/ PDFs, but pdftopdf has to! That is the job of all the P2P* classes in pdftopdf: they recreate the PDF objects that poppler parsed.
So pdftopdf is quite tightly coupled to poppler's internals, yet far fewer eyes look at pdftopdf than at the poppler code.
This makes maintaining pdftopdf tedious whenever poppler changes between versions.
IMO the current state does not benefit security at all.

qpdf is extensively tested and built for the job: it provides an object-oriented C++ interface to the PDF objects, but does not interpret their contents beyond the very basic structure (root catalog, pages tree). Neither does the new pdftopdf, because it does not have to. Basically all the transformations can be done without looking at the exact values in the input PDF; each object is either passed through or thrown away. That means the code surface actually "exposed" to potentially dangerous inputs is quite small.
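To illustrate the "pass it through or throw it away" idea, here is a minimal, self-contained C++ sketch. It does not use qpdf at all: `Page`, `JobOptions` and `transform` are hypothetical stand-ins (in qpdf the pages would be object handles such as `QPDFObjectHandle`), and even/odd selection plus reversal are just assumed example transformations. The point is that the code only keeps, drops or reorders page entries; it never parses what is inside a page.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a page object. The content is kept as opaque
// bytes and is never parsed by the transformation code below.
struct Page {
  std::string opaque_content;  // e.g. the raw bytes of a content stream
};

// Hypothetical job options (illustrative only, not pdftopdf's real options).
struct JobOptions {
  bool even_only = false;  // keep only even-numbered pages (1-based)
  bool odd_only = false;   // keep only odd-numbered pages (1-based)
  bool reverse = false;    // emit the kept pages in reverse order
};

// Builds the output page list by passing pages through or dropping them.
// Only the position of each page in the list is inspected; the page
// contents are copied verbatim and never interpreted.
std::vector<Page> transform(const std::vector<Page>& in, const JobOptions& opt) {
  std::vector<Page> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const std::size_t number = i + 1;  // 1-based page number
    const bool is_even = (number % 2 == 0);
    if (opt.even_only && !is_even) continue;  // throw it away
    if (opt.odd_only && is_even) continue;    // throw it away
    out.push_back(in[i]);                     // pass it through unchanged
  }
  if (opt.reverse) {
    std::reverse(out.begin(), out.end());  // reorder, still without parsing
  }
  return out;
}
```

Because the content bytes are only copied, a malformed content stream in the input never reaches any parsing code in this sketch; only the list structure is touched.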