Comment 0 for bug 2039804

Jay Berkenbilt (ejb) wrote : qpdf: data loss bug affecting versions 11.0.0 through 11.6.2

Notes:

* I am the upstream author and debian maintainer for qpdf.
* This bug has been fixed in debian unstable and testing with version 11.6.3, but because 24.04 is not yet open, it has not synced. This should not block fixing 23.04 and 22.04. I have uploaded 11.6.3 to my ppa: https://launchpad.net/~qpdf/+archive/ubuntu/qpdf
* I am attaching debdiffs for lunar and mantic

Upstream bug https://github.com/qpdf/qpdf/issues/1050 revealed a bug in qpdf's lexical layer that causes qpdf to discard the character in a binary string that follows an octal quoted character with 1 or 2 digits. The PDF spec allows an octal quoted character to be written as \d, \dd, or \ddd, and allows the first two forms only if the next character is not an octal digit. Most PDF writers never use the \d or \dd forms, but some do. With default options, qpdf does not parse or alter strings inside content streams, so this bug is not likely to affect page content. However, binary strings of this sort are common in the document /ID and may also appear in metadata for encrypted files. In some cases, such as the file in #1050, this bug causes an error, in that case because the discarded character was the string end delimiter. In most cases, this bug results in silent data loss. The fix is very small and locally contained. The upstream fix includes several new test cases, but the patch I will include to fix the issue only includes the relevant code change.
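To make the spec rule concrete, here is a minimal Python sketch of how \d, \dd, and \ddd are meant to decode. It is an illustration only, not qpdf's code: the function name decode_octal_escapes is hypothetical, and it assumes the input contains no escapes other than octal ones.

```python
OCTAL_DIGITS = b"01234567"


def decode_octal_escapes(body: bytes) -> bytes:
    """Decode \\d, \\dd and \\ddd escapes in the body of a PDF literal string.

    Hypothetical helper for illustration; other escapes (\\n, \\(, \\\\ ...)
    are deliberately not handled.
    """
    out = bytearray()
    i = 0
    n = len(body)
    while i < n:
        # 0x5C is the backslash that introduces an escape.
        if body[i] == 0x5C and i + 1 < n and body[i + 1] in OCTAL_DIGITS:
            j = i + 1
            value = 0
            # Consume at most three octal digits.
            while j < n and j < i + 4 and body[j] in OCTAL_DIGITS:
                value = value * 8 + (body[j] - 0x30)
                j += 1
            out.append(value & 0xFF)  # keep only the low-order byte
            i = j  # the byte that ended the escape is NOT consumed
        else:
            out.append(body[i])
            i += 1
    return bytes(out)


print(decode_octal_escapes(b"\\53X"))   # b'+X'
print(decode_octal_escapes(b"\\053X"))  # b'+X'
print(decode_octal_escapes(b"\\0053"))  # b'\x053'
```

In each case the byte that terminates a short escape (the X, or the trailing 3 after \005) survives in the output. That surviving byte is the one affected versions discard.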

I also reported this as a debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054158

It was approved as a stable update by debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054119

[ Impact ]

The bug could result in silent corruption of binary strings in PDF metadata. It could also result in failure of qpdf to process a valid file. Data loss justifies a stable update.

[ Test Plan ]

The test file in https://github.com/qpdf/qpdf/issues/1050 can be used to prove that the bug exists in versions >= 11.0.0 and <= 11.6.2 and that the bug is fixed in 11.6.3.

The upstream fix includes several additional automated test cases. These are not included in the patch, but they are included in the upstream commit that fixes the bug: https://github.com/qpdf/qpdf/commit/1ecc6bb29e24a4f89470ff91b2682b46e0576ad4

[ Where problems could occur ]

This fix has a very low risk of causing a regression. The fix is very localized to qpdf's lexical layer and is in a code path that only occurs when a 1-digit or 2-digit octal quoted character is terminated by a character other than an octal digit. This is the first bug in qpdf's lexical layer in many years. It was introduced by a pull request from a reliable and consistent contributor who has made many improvements to qpdf's performance. The fix follows the established pattern of how to handle instances in which a character triggers a state change and has to be reprocessed in the new state.
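The "reprocess in the new state" pattern can be sketched with a toy state machine. This is a simplified Python model under my own assumptions, not the qpdf tokenizer: the function lex_string_body, its state names, and the reprocess flag are all hypothetical, and reprocess=False only simulates the reported symptom.

```python
OCTAL = b"01234567"


def lex_string_body(data: bytes, reprocess: bool = True) -> bytes:
    """Toy lexer for a string body; only octal escapes are interpreted."""
    out = bytearray()
    state = "text"
    value = 0
    digits = 0
    i = 0
    while i < len(data):
        ch = data[i]
        if state == "text":
            if ch == 0x5C:  # backslash starts an escape
                state = "escape"
            else:
                out.append(ch)
            i += 1
        elif state == "escape":
            if ch in OCTAL:
                value, digits = ch - 0x30, 1
                state = "octal"
            else:
                out.append(ch)  # simplification: keep other escapes literally
                state = "text"
            i += 1
        else:  # state == "octal", entered with 1 or 2 digits seen so far
            if ch in OCTAL:
                value = value * 8 + (ch - 0x30)
                digits += 1
                i += 1
                if digits == 3:  # \ddd is complete, nothing to reprocess
                    out.append(value & 0xFF)
                    state = "text"
            else:
                # ch ends a 1- or 2-digit escape and triggers a state change.
                out.append(value & 0xFF)
                state = "text"
                if not reprocess:
                    i += 1  # simulated defect: ch is silently dropped
                # Otherwise i is left unchanged, so ch is handled again
                # on the next iteration, this time in the "text" state.
    if state == "octal":  # input ended in the middle of a short escape
        out.append(value & 0xFF)
    return bytes(out)


print(lex_string_body(b"\\53X"))                   # b'+X'
print(lex_string_body(b"\\53X", reprocess=False))  # b'+'
print(lex_string_body(b"\\053X", reprocess=False)) # b'+X'
```

Only the branch that ends a short escape differs between the two modes, which matches the narrow code path described above: three-digit escapes never reach it.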

qpdf has a rigorous test suite and an extremely good quality record. Many commercial entities use it to process millions of documents daily. My current employer runs millions of pages a day through qpdf.

[ Other Info ]

See also

Upstream bug report: https://github.com/qpdf/qpdf/issues/1050
Corresponding debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054158
Debian stable release approval: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054119