Ubuntu
qpdf package

[MIR] qpdf

Bug #1024435 reported by Till Kamppeter on 2012-07-13

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	qpdf (Ubuntu)	Fix Released	Undecided	Unassigned

Bug Description

The current cups-filters package uses a pdftopdf filter (filter to do page management operations like 4 pages per sheet, seleted pages, reverse order, ...) which currently is based on Poppler and due to lack of API functionality in Poppler it copies parts of Poppler's source code.

To avoid this we have decided on changing the architecture of the pdftopdf filter as a Google Summer of Code project. The filter will be based on the more lightweight PDF manipulation library QPDF instead of on the Poppler PDF interpreter. As cups-filters and with it pdftopdf are in Main, QPDF needs to get moved to Main.

QPDF is currently in Universe and gets built for all supported architectures.

There are no known security issues (CVEs), no SUID, sbin, no daemons, no use of privileged ports

Package is well-maintained upstream and at Debian, Ubuntu is syncing it. Upstream is accepting the extensions for use with pdftopdf. No open bugs. debian/watch file.

No configuration needed, no debconf, documentation by man pages, CLI, no GUI.

All dependencies are already in Main.

Test suit (with more than 1000 tests) included in the package and it passes when running "make check" after package build. Can be easily added to debian/rules.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-07-13:

For what it's worth, I'll mention also that "upstream" is also a debian developer since 2005 and is the debian maintainer of the package. qpdf is known to be used in at least three commercial products and was used by a former employer (who enthusiastically supported its open source development) to process millions of PDF documents for a wide range of customers.

Revision history for this message

Michael Terry (mterry) wrote on 2012-07-16:

I'm wary of adding another pdf parser to main. How much code is being duplicated that it's worth switching? When I looked at cups-filter, I can only see some argument parsing code and the P2PCharCodeToUnicode class mentioning poppler heritage.

I'll assign to jdstrand, for a security team perspective on how painful another pdf parser will be.

Changed in qpdf (Ubuntu):
assignee:	nobody → Jamie Strandboge (jdstrand)

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2012-07-16:

Note that this is not a PDF interpreter, like Ghostscript is a PostScript interpreter. It is more like ps-utils but for PDF.

Revision history for this message

Michael Terry (mterry) wrote on 2012-07-16:

But it does seem to parse them, and thus has a security surface. Can you comment on the extent of the code duplication in pdftopdf? I'm trying to get a sense of the cost/benefit here for Jamie's benefit.

Revision history for this message

Tobias Hoffmann (smilingthax) wrote on 2012-07-17:

I'm the one currently working on the new libqpdf-based pdf-implementation.

The old implementation of pdftopdf does not duplicate that much of poppler, but it has a class structure that /parallels/ the poppler document representation, because poppler was never built to /output/ pdfs, but pdftopdf has to! This is done by all the P2P*-classes in pdftopdf: They can actually recreate the pdf-objects that poppler parsed.
So there is quite tight coupling of pdftopdf to the poppler internals, but there are much less eyes looking at pdftopdf than are looking at the poppler-code.
This make maintainance of pdftopdf tedious across changes between poppler versions.
IMO this current state does not benefit security at all.

qpdf is extensively tested, and built for the job: It provides an c++-object-oriented interface to the pdf-objects, but does not interpret their contents, except for the very basic structure (root catalog, pages tree), and neither does the new pdftodf -- because it does not have to. Basically all the transformations can be done without actually looking at the exact values from the input pdf; it's either pass-it-through or throw-it-away. That means: The actual code surface actually "exposed" to potentially dangerous inputs is quite small.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-07-17:

Download full text (3.9 KiB)

I'm the author of qpdf. You are certainly correct to be concerned about the security exposure of qpdf, and I'd be the last one to say that it is 100% of any potential for security problems. However I believe that it is in relatively good shape from a security perspective, and I'll briefly outline why I think this is the case.

QPDF is very careful with memory management. Almost all of the memory used by QPDF is managed using reference-counted smart pointers. QPDF makes heavy use of a bound-checked buffer class. QPDF rarely reads large amounts of data into memory and, in particular, does not read arbitrary streams from user files into memory at once. The large file tests of QPDF run with a memory footprint much smaller than the size of the largest stream in the file.

QPDF has a very thorough test suite with nearly complete code coverage. QPDF actually makes use of an explicit coverage technique I presented a paper on in 2004. Specifically it marks certain parts of the code with a call that essentially checks to make sure a certain condition occurred during the run of test suite. The test suite contains many examples of invalid files, though I admit that most of these are files I specifically created to exercise certain error conditions. Before every release, I run the entire test suite through valgrind and never release if there are any errors.

At a previous job, I had access to the Klocwork static analyzer. When I ran qpdf through it, never having previously exposed qpdf to a static analyzer, it only found one real issue (two instances of not exiting after reporting an error that a file couldn't be opened), and one case where it was hard to tell that memory was being properly checked. In the second case, I just clarified the code, but there wasn't actually an error condition. You can see the commit on github: https://github.com/qpdf/qpdf/commit/0ded90eff979c0a329736861995b2516139de114.

I have tried to code QPDF in a manner that is careful from a security perspective. I would refer you to https://github.com/qpdf/qpdf/blob/master/libqpdf/QPDF.cc and the method read_xrefTable. This method reads lines from the input file but adds constraints to make sure it never reads more than 50 bytes per line, thus preventing a file whose xref offset points to a place with extremely long lines from making qpdf allocate a huge amount of memory. When it reads xref entries, it validates each entry against a regular expression and throws an exception if it doesn't match. Only then does it actually trust the offsets.

As I read the code carefully, I see that there may be opportunities to force qpdf to try to read outside the bounds of a file by constructing a PDF file with object streams that have invalid offsets. I will audit this part of the code and strengthen it before the next release. In spite of this, qpdf never writes to memory associated with the input file, so causing it to write past a buffer boundary would be very difficult, and I can't think of any way of causing it to take action such as creating a file in the file system, exposing credentials, or anything of that sort.

I have a pretty strong background in dealing with security...

I'm the author of qpdf.  You are certainly correct to be concerned about the security exposure of qpdf, and I'd be the last one to say that it is 100% of any potential for security problems.  However I believe that it is in relatively good shape from a security perspective, and I'll briefly outline why I think this is the case.

QPDF is very careful with memory management.  Almost all of the memory used by QPDF is managed using reference-counted smart pointers.  QPDF makes heavy use of a bound-checked buffer class.  QPDF rarely reads large amounts of data into memory and, in particular, does not read arbitrary streams from user files into memory at once.  The large file tests of QPDF run with a memory footprint much smaller than the size of the largest stream in the file.

QPDF has a very thorough test suite with nearly complete code coverage.  QPDF actually makes use of an explicit coverage technique I presented a paper on in 2004.  Specifically it marks certain parts of the code with a call that essentially checks to make sure a certain condition occurred during the run of test suite.  The test suite contains many examples of invalid files, though I admit that most of these are files I specifically created to exercise certain error conditions.  Before every release, I run the entire test suite through valgrind and never release if there are any errors.

At a previous job, I had access to the Klocwork static analyzer.  When I ran qpdf through it, never having previously exposed qpdf to a static analyzer, it only found one real issue (two instances of not exiting after reporting an error that a file couldn't be opened), and one case where it was hard to tell that memory was being properly checked.  In the second case, I just clarified the code, but there wasn't actually an error condition.  You can see the commit on github: https://github.com/qpdf/qpdf/commit/0ded90eff979c0a329736861995b2516139de114.

I have tried to code QPDF in a manner that is careful from a security perspective.  I would refer you to https://github.com/qpdf/qpdf/blob/master/libqpdf/QPDF.cc and the method read_xrefTable.  This method reads lines from the input file but adds constraints to make sure it never reads more than 50 bytes per line, thus preventing a file whose xref offset points to a place with extremely long lines from making qpdf allocate a huge amount of memory.  When it reads xref entries, it validates each entry against a regular expression and throws an exception if it doesn't match.  Only then does it actually trust the offsets.

As I read the code carefully, I see that there may be opportunities to force qpdf to try to read outside the bounds of a file by constructing a PDF file with object streams that have invalid offsets.  I will audit this part of the code and strengthen it before the next release.  In spite of this, qpdf never writes to memory associated with the input file, so causing it to write past a buffer boundary would be very difficult, and I can't think of any way of causing it to take action such as creating a file in the file system, exposing credentials, or anything of that sort.

I have a pretty strong background in dealing with security issues in code.  I've been programming in C or C++ since 1985, spent a little time working on the Kerberos design and implementation teams between 1988 and 1991, and I am  the debian maintainer for tiff and ICU, both of which have relatively regular security issues.  Understanding and, in some cases, backporting those security fixes has also improved my understanding of the kinds of pitfalls programmers often fall into.

Finally, if there are any security issues found, I will be very responsive in fixing them.

So, while adding qpdf to main is not risk-free from a security angle, I think the risks are relatively low, and I believe qpdf is probably in better shape from a security angle than many of the packages already in main.  I would be happy to address any specific concerns you might have in this area.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-07-31:

For anyone who's watching, qpdf-3.0~rc1-1 is now in debian experimental. Running the test suite is now enabled in the build. Based on the exp buildd logs, it build and passed its test suite on all debian platforms. My plan is to allow a couple of weeks for feedback on the release candidate from a small handful of people who have indicated a desire to test. I will release qpdf-3.0.0 probably in mid-august. Any debian or ubuntu packages that need to the new APIs to support the print filtering work can declare a build dependency on qpdf >> 3.0~. There will be no non-compatible changes between the release candidate the full release.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-07-31:

(For the build dependency, of course, I mean libqpdf-dev >> 3.0~)

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2012-07-31:

Jay, please make sure to give us enough time for updating to qpdf 3.0 final and promoting it to main before Ubuntu's Feature Freeze on August 23.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-07-31:

#10

Okay, I'll upload 3.0 final sooner. Is experimental okay, or do I need to upload to unstable in debian? Debian is in freeze right now. I'll aim for August 2 or 3 to upload 3.0 final.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2012-07-31:

#11

Great, experimental is OK.

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-08-02:

#12

qpdf 3.0.0-1 is in debian experimental now.

Revision history for this message

Till Kamppeter (till-kamppeter) wrote on 2012-08-02:

#13

Jay, thanks for the quick release. It is synced to Ubuntu now.

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2012-08-08:

#14

MIR review:
* Builds fine with just main
* extensive testsuite enabled in the build
* Ubuntu does not carry a delta
* Does not ship a symbols file or use an empty argument to dh_makeshlibs -V. This should be considered, but is not a blocker.
* Does not have a bug subscriber
* Does have a watch file
* Releases are regular
* Lintian clean
* debian/rules is reasonable
* build log is free of compiler warnings/errors
* There are no open bugs in Debian or Ubuntu (other than this MIR :)

Security review:
No CVE history. I agree with Michael that it is undesirable to have multiple PDF libraries in main, but I am encouraged by Tobias and Jay's comments, in particular the bounds-checking and previous static analysis that Jay mentioned and what Tobias said about transformations not actually needing to examine/parse most of the PDF. The fact that Jay is the Debian maintainer and upstream and the software is mature and has been in Debian for several years and there are no upstream red flags.

Furthermore, compiler hardening is enabled, including PIE and BIND_NOW on the zlib-flate and qpdf binaries (yay!). There are no initscripts/upstart jobs, dbus services, setuid, fcaps usage, sudo/su/pkexec usage or cron jobs.

Doing a shallow code audit:
* there are uses of sprintf, but they are properly bounds checked with the destination being on the stack, so they are protected via stack-protector. The QUtil::copy_string() is well implemented.
* There were a few strcpy's as well, but they were also safely used and protected by stack-protector.
* memory allocations are via C++'s 'new' and not C-style *malloc
* I also verified Jay's statements regarding libqpdf/QPDF.cc

On the whole, from everything I looked at, qpdf is defensively programmed, well maintained and a good candidate for main even though it has 'pdf' in its name. Thanks Michael, Jay and Tobias for the discussion.

ACK

Changed in qpdf (Ubuntu):
assignee:	Jamie Strandboge (jdstrand) → nobody
status:	New → Fix Committed

Revision history for this message

Jay Berkenbilt (ejb) wrote on 2012-08-10: Re: [Bug 1024435] Re: [MIR] qpdf

#15

On 08/08/2012 07:29 PM, Jamie Strandboge wrote:

> MIR review:
> * Does not ship a symbols file or use an empty argument to dh_makeshlibs -V. This should be considered, but is not a blocker.

Consider it considered. I'm not aware of shipping a symbols file, so
it's something for me to learn about and see whether I can apply it to
my other packages. That said, I have a release checklist that includes
bumping the version in the shlibs file if any new APIs have been added,
so my shlibs files are likely to be as tight as needed and not tighter.
An automated method for keeping this right is better than a manual one
though, so I'll look into it. I'm not sure whether this being a C++
library complicates it as it complicates enumerating symbols in the
ldscript (if I understand correctly). Any pointers (especially stack
protected ones, oh wait, wrong kind of pointer) would be welcome.

Thanks for the detailed review.

--Jay

Revision history for this message

Martin Pitt (pitti) wrote on 2012-08-20:

#16

I just uploaded Till's cups-filters which needs this, promoted.

Changed in qpdf (Ubuntu):
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntuqpdf package

[MIR] qpdf

Bug Description

Other bug subscribers

Remote bug watches

Ubuntu
qpdf package