Data loss: qpdf discards the character in a binary string following an octal quoted character with 1 or 2 digits

Bug #2039804 reported by Jay Berkenbilt
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Qpdf
Fix Released
Unknown
qpdf (Debian)
Fix Released
Unknown
qpdf (Ubuntu)
Fix Released
High
Unassigned
Lunar
Fix Released
High
Unassigned
Mantic
Fix Released
High
Unassigned

Bug Description

Notes:

* I am the upstream author and debian maintainer for qpdf.
* This bug has been fixed in debian unstable and testing with version 11.6.3, but because 24.04 is not yet open, it has not synced. This should not block fixing 23.04 and 22.04. I have uploaded 11.6.3 to my ppa: https://launchpad.net/~qpdf/+archive/ubuntu/qpdf
* I am attaching debdiffs for lunar and mantic

Upstream bug https://github.com/qpdf/qpdf/issues/1050 revealed a bug in qpdf's lexical layer that would cause qpdf to discard the character in a binary string following an octal quoted character with 1 or 2 digits. The PDF spec allows octal digits to be \d, \dd, or \ddd, and allows the first two forms if the next character is other than an octal digit. Most PDF writers never use the \d or \dd forms, but some do. With default options, qpdf does not parse or alter strings inside content streams, so this bug is not likely to affect page content. However, binary strings of this sort are common in the document /ID and may also appear in metadata for encrypted files. In some cases, such as the file in #1050, this bug can cause error, in this case, because the discarded character was the string end delimiter. In most case, this bug results in silent data loss. The fix is very small and locally contained. The upstream fix includes several new test cases, but the patch I will include to fix the issue only includes the relevant code change.

I also reported this as a debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054158

It was approved as a stable update by debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054119

[ Impact ]

The bug could result in silent corruption of binary strings in PDF metadata. It could also result in failure of qpdf to process a valid file. Data loss justifies a stable update.

[ Test Plan ]

The test file in https://github.com/qpdf/qpdf/issues/1050 can be used to prove that the bug exists in versions >= 11.0.0 and <= 11.6.2 and that the bug is fixed in 11.6.3.

The upstream fix includes several additional automated test cases. These are not included in the patch, but they are included in the upstream commit that fixes the bug: https://github.com/qpdf/qpdf/commit/1ecc6bb29e24a4f89470ff91b2682b46e0576ad4

How to test the SRU package on Ubuntu manually (copied from Jay's comment #6 below):

Running `qpdf --check 018.pdf` where `018.pdf` is the file attached to the upstream bug will reproduce the issue. With the current version in 22.04 and 23.04, you will see something like this:

```
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110340): EOF while reading token
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): unexpected EOF
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): parse error while reading object
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): expected endobj
WARNING: /tmp/z/018.pdf: file is damaged
WARNING: /tmp/z/018.pdf (offset 110267): xref not found
WARNING: /tmp/z/018.pdf: Attempting to reconstruct cross-reference table
qpdf: /tmp/z/018.pdf: unable to find trailer dictionary while recovering damaged file
```

After the fix, you will see
```
checking /home/ejb/Downloads/018.pdf
PDF Version: 1.7
File is not encrypted
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect
```
(obviously with the full paths based on whatever you call the file).

[ Where problems could occur ]

This fix has a very low risk of causing a regression. The fix is very localized to qpdf's lexical layer and is in a code path that only occurs when a 1-digit or 2-digit octal quoted character is terminated by other than an octal digit. This is the first bug in qpdf's lexical layer in many years. It was introduced by a pull request from a reliable and consistent contributor who has made may improvements to qpdf's performance. The fix follows the established pattern of how to handle instances in which a character triggers a state change and has to be reprocessed in the new state.

qpdf has a rigorous test suite and an extremely good quality record. It processes millions of documents daily by many commercial entities. My current employer runs millions of pages a day through qpdf.

[ Other Info ]

See also

Upstream bug report: https://github.com/qpdf/qpdf/issues/1050
Corresponding debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054158
Debian stable release approval: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054119

Revision history for this message
Jay Berkenbilt (ejb) wrote :
Revision history for this message
Jay Berkenbilt (ejb) wrote :
Revision history for this message
Jay Berkenbilt (ejb) wrote :

I've gone as far as I think I can, but I can do additional steps if needed. I have not tagged this or uploaded anything anywhere. Please let me know if I can/should take any additional steps to get this approved and processed as an SRU.

summary: - qpdf: data loss bug affecting versions 11.0.0 through 11.6.2
+ SRU request: qpdf: data loss bug affecting versions 11.0.0 through
+ 11.6.2
Changed in qpdf (Ubuntu Lunar):
importance: Undecided → High
Changed in qpdf (Ubuntu Mantic):
importance: Undecided → High
Revision history for this message
Till Kamppeter (till-kamppeter) wrote : Re: SRU request: qpdf: data loss bug affecting versions 11.0.0 through 11.6.2

Thank you very much, Jay, for writing up this SRU bug report and providing the debdiffs with the fix. This is the usual way how you contribute if you do not have upload rights to the Ubuntu archives. You post debdiffs in a bug report and a person with appropriate rights (the so-called "sponsor") applies the debdiff and uploads the resulting package.

I have uploaded the appropriate packages right now and subscribed the SRU team (ubuntu-sru group) to this report. Now the SRU Team will check the uploaded packages and put them into the -proposed repositories of the distro versions affected. Then they call for testing by a comment here. Once somebody tests the packages and finds that the bug is actually fixed, and nobody reports a regression, the fixed packages get released into the affected distros.

no longer affects: qpdf
Changed in qpdf:
status: Unknown → Fix Released
Revision history for this message
Robie Basak (racb) wrote :

Thank you for contributing this fix!

Could you please expand on the Test Plan?

For manual testing, what are the exact steps that a user would perform, what results would indicate that the bug is present, and what results would indicate that the bug is fixed? For example, I don't see a command to run in https://github.com/qpdf/qpdf/issues/1050, or the expected results if the bug is fixed.

For automated testing, you said the tests aren't included in the patch. Why not? Could we include them instead, and therefore increase confidence in the correctness of the fix?

> qpdf has a rigorous test suite

This is great, but is this test suite run as part of the package build and publication process in Ubuntu? If not, how could we arrange to run the test suite against the Ubuntu build of the package that includes this patch to ensure that we aren't regressing unaffected users in providing this update?

Revision history for this message
Jay Berkenbilt (ejb) wrote :

> This is great, but is this test suite run as part of the package build and publication process in Ubuntu? If not, how could we arrange to run the test suite against the Ubuntu build of the package that includes this patch to ensure that we aren't regressing unaffected users in providing this update?

The test suite is run as part of the Ubuntu build. The patch doesn't include the new test cases, but it does include all the old ones, which should demonstrate that there is no regression. When 24.04 opens for business and the packages sync, the new tests will run as they are in 11.6.3. You can also see it in my PPA: https://launchpad.net/~qpdf/+archive/ubuntu/qpdf. In any case, I believe the fact that test suite is run during the existing builds should be sufficient to ensure that there is no regression. qpdf also has meaningful autopkgtests. You can also check the debian build output. Debian stable proposed updates has 11.3.0 with the same patch.

For manual testing:

Running `qpdf --check 018.pdf` where `018.pdf` is the file attached to the upstream bug will reproduce the issue. With the current version in 22.04 and 23.04, you will see something like this:

```
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110340): EOF while reading token
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): unexpected EOF
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): parse error while reading object
WARNING: /tmp/z/018.pdf (xref stream: object 17 1, offset 110830): expected endobj
WARNING: /tmp/z/018.pdf: file is damaged
WARNING: /tmp/z/018.pdf (offset 110267): xref not found
WARNING: /tmp/z/018.pdf: Attempting to reconstruct cross-reference table
qpdf: /tmp/z/018.pdf: unable to find trailer dictionary while recovering damaged file
```

After the fix, you will see
```
checking /home/ejb/Downloads/018.pdf
PDF Version: 1.7
File is not encrypted
File is not linearized
No syntax or stream encoding errors found; the file may still contain
errors that qpdf cannot detect
```
(obviously with the full paths based on whatever you call the file).

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Thanks Jay, for elaborating on the test procedures and also for providing instructions for manual testing so that users can easily verify the fix. I have copied your instructions to the "[ Test Plan ]" section of the initial description of the bug so that they get easily found.

description: updated
Revision history for this message
Robie Basak (racb) wrote :

> The patch doesn't include the new test cases, but it does include all the old ones, which should demonstrate that there is no regression.

OK, but doesn't that just leave a problem behind for the next person who tries to do an SRU to this package? Can't we just include the tests?

Revision history for this message
Jay Berkenbilt (ejb) wrote :

The tests include binary files. Unless something has changed, I can't create a quilt patch that modifies a binary file. The qpdf test suite is full of PDF files, and PDF files are binary files, even when they are hand-created, as many of mine are. This makes including tests in patches challenging. There are a few unit tests that we could include, but they will likely not backport cleanly, which adds additional risk, and I don't think including them in the patch will add any value since the tests are specifically crafted to prove that this bug is fixed, and we have already done that through current releases and the manual tests. The bug affects a very specific pattern in the data. This part of the code that was affected is not actively modified, and the bug involved changing an infrequently-traversed section of code. The tokenizer is very low level and stable, and I did an audit of the code again after this bug was found.

qpdf has been an open source project since 2008 and has been in debian since the beginning. I am the author and debian maintainer. It is excessively unlikely that there will a "next" SRU that will come from other than the version control system, and unless I am hit by the proverbial bus, I will probably be the originator of it. While qpdf has a wide following, if I were not there to fix problems like this and push them through, it is not likely that problems like this would even be entered into an SRU process. It's been quite a long time since there's been any divergence between the debian and Ubuntu versions of the package.

While I agree that more testing is better, I wonder if the benefit we will get is actually worth the effort. The new tests exercise specific case related to this fix. If there is still a bug of this type, adding the new tests will not help us find them. I don't think it would be worth the hassle to tweak the test system or jump through some hoops to allow the binary files involved in the tests to get into the patch -- that is more likely to cause a problem than to prevent one. The same is true of trying to backport the unit tests. qpdf processes millions of pages a day across a wide range of free and commercial systems. My employer processes over a million pages a day all by itself.

In qpdf's 21 year history, 15 of which have been open source, this is only the second data-loss bug, and it has the potential to cause silent corruption of data in rare corner cases. I appreciate the diligence here, but I really think the level of testing that has already gone into this, the inclusion of new patches in the latest release, is adequate, and now that the bug is known, our priority should be getting a well-reviewed, well-tested patch in front of users to minimize the risk of data loss.

Please let me know what else I need to do to get this through. I will cooperate, of course. My intention here is not to argue, but rather to try to assure you that the testing is already quite thorough, most likely exceeding the standard that would be applied to most packages by their authors. We are passing a point of diminishing return. Thanks for considering my argument.

Revision history for this message
Paride Legovini (paride) wrote :

Thanks for the additional information and context Jay.

I agree with Robie in that it would in principle be nice to have a test for this SRU in the test suite that gets executed at package build time, however I find the [ Test Plan ] to be thoughtful enough. This bug and the fix fully meet the SRU criteria IMHO.

Given that this is a data loss bug, I agree with the Importance: High setting.

I'm sponsoring this as proposed, only with the addition of update-maintainer(1) changes.

Revision history for this message
Paride Legovini (paride) wrote :

Well, I'm also adding an LP: reference to this bug in d/changelog to trigger the Launchpad automation.

Revision history for this message
Paride Legovini (paride) wrote :

Uploading qpdf_11.3.0-1ubuntu1.dsc
Uploading qpdf_11.3.0-1ubuntu1.debian.tar.xz
Uploading qpdf_11.3.0-1ubuntu1_source.buildinfo
Uploading qpdf_11.3.0-1ubuntu1_source.changes

Uploading qpdf_11.5.0-1ubuntu1.dsc
Uploading qpdf_11.5.0-1ubuntu1.debian.tar.xz
Uploading qpdf_11.5.0-1ubuntu1_source.buildinfo
Uploading qpdf_11.5.0-1ubuntu1_source.changes

Changed in qpdf (Ubuntu):
status: New → Confirmed
Changed in qpdf (Debian):
status: Unknown → Confirmed
Robie Basak (racb)
summary: - SRU request: qpdf: data loss bug affecting versions 11.0.0 through
- 11.6.2
+ Data loss: qpdf discards the character in a binary string following an
+ octal quoted character with 1 or 2 digits
Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Jay, or anyone else affected,

Accepted qpdf into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qpdf/11.5.0-1ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-mantic to verification-done-mantic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-mantic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in qpdf (Ubuntu Mantic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-mantic
Changed in qpdf (Ubuntu Lunar):
status: New → Fix Committed
tags: added: verification-needed-lunar
Revision history for this message
Robie Basak (racb) wrote :

Hello Jay, or anyone else affected,

Accepted qpdf into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qpdf/11.3.0-1ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Robie Basak (racb) wrote :

Thank you for the discussion. On balance I think what you've said makes sense and it's not worth going further, but I think it's important to have the trade-offs and choices documented here and that's done now.

Revision history for this message
Robie Basak (racb) wrote :

The development release isn't open yet. We will need to sync across from Debian when it opens. I am subscribed to this bug. Feel free to ping me here if it's not been done in a few weeks.

Changed in qpdf (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Robie Basak (racb) wrote :

I understand that qpdf in Noble will auto-sync when Noble opens.

Changed in qpdf (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Jay Berkenbilt (ejb) wrote :

All, thanks for pushing this through. To be clear, I agree with all the points that it would have been better if we could have included the extra tests, but I'm glad you agree with my analysis about the cost vs. benefit. I will find some time to test in the next few days. I can do the manual test from the test plan and also do a local build with the additional test cases by just cherry-picking the commit into a temporary branch.

Revision history for this message
Jay Berkenbilt (ejb) wrote :

I have verified by running manual tests on a docker image of 23.04 and 23.10 after manually installing the packages as downloaded from https://bugs.launchpad.net/ubuntu/+source/qpdf by verifying that I could reproduce the bug, then install the updated libqpdf package, then verifying that the bug was fixed.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Thanks, Jay, marking as verified ...

tags: added: verification-done verification-done-lunar verification-done-mantic
removed: verification-needed verification-needed-lunar verification-needed-mantic
Changed in qpdf (Debian):
status: Confirmed → Fix Released
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Marking Noble as Fix Released (11.6.3-1).

$ git log --oneline -1 pkg/ubuntu/noble | cat
03517080d29d 11.6.3-1 (patches unapplied)

$ git show pkg/ubuntu/noble -- libqpdf/QPDFTokenizer.cc
<OK>

Changed in qpdf (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

Happy to (in addition to comment #19) confirm that qpdf + libqpdf29 from noble-release and mantic/lunar-proposed fix the issue.

noble:
---

 $ lsb_release -cs
 No LSB modules are available.
 noble

 $ wget -q https://github.com/qpdf/qpdf/files/12885974/018.pdf

 $ sudo apt update && sudo apt install -y qpdf

 $ dpkg -s qpdf | grep Version:
 Version: 11.6.3-1

 $ qpdf --check 018.pdf
 checking 018.pdf
 PDF Version: 1.7
 File is not encrypted
 File is not linearized
 No syntax or stream encoding errors found; the file may still contain
 errors that qpdf cannot detect

mantic:
---

 $ lsb_release -cs
 No LSB modules are available.
 mantic

 $ wget -q https://github.com/qpdf/qpdf/files/12885974/018.pdf

 $ sudo apt update && sudo apt install -y qpdf

 $ dpkg -s qpdf libqpdf29 | grep Version:
 Version: 11.5.0-1
 Version: 11.5.0-1

 $ qpdf --check 018.pdf
 WARNING: 018.pdf (xref stream: object 17 1, offset 110340): EOF while reading token
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): unexpected EOF
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): parse error while reading object
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): expected endobj
 WARNING: 018.pdf: file is damaged
 WARNING: 018.pdf (offset 110267): xref not found
 WARNING: 018.pdf: Attempting to reconstruct cross-reference table
 qpdf: 018.pdf: unable to find trailer dictionary while recovering damaged file

 $ sudo add-apt-repository -yp proposed
 $ sudo apt install -y -t mantic-proposed qpdf

 $ dpkg -s qpdf libqpdf29 | grep Version:
 Version: 11.5.0-1ubuntu1
 Version: 11.5.0-1ubuntu1

 $ qpdf --check 018.pdf
 checking 018.pdf
 PDF Version: 1.7
 File is not encrypted
 File is not linearized
 No syntax or stream encoding errors found; the file may still contain
 errors that qpdf cannot detect

lunar:
---

 $ lsb_release -cs
 lunar

 $ wget -q https://github.com/qpdf/qpdf/files/12885974/018.pdf

 $ sudo apt update && sudo apt install -y qpdf

 $ dpkg -s qpdf libqpdf29 | grep Version:
 Version: 11.3.0-1
 Version: 11.3.0-1

 $ qpdf --check 018.pdf
 WARNING: 018.pdf (xref stream: object 17 1, offset 110340): EOF while reading token
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): unexpected EOF
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): parse error while reading object
 WARNING: 018.pdf (xref stream: object 17 1, offset 110830): expected endobj
 WARNING: 018.pdf: file is damaged
 WARNING: 018.pdf (offset 110267): xref not found
 WARNING: 018.pdf: Attempting to reconstruct cross-reference table
 qpdf: 018.pdf: unable to find trailer dictionary while recovering damaged file

 $ sudo add-apt-repository -yp proposed
 $ sudo apt install -y -t lunar-proposed qpdf

 $ dpkg -s qpdf libqpdf29 | grep Version:
 Version: 11.3.0-1ubuntu1
 Version: 11.3.0-1ubuntu1

 $ qpdf --check 018.pdf
 checking 018.pdf
 PDF Version: 1.7
 File is not encrypted
 File is not linearized
 No syntax or stream encoding errors found; the file may still contain
 errors that qpdf cannot detect

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qpdf - 11.5.0-1ubuntu1

---------------
qpdf (11.5.0-1ubuntu1) mantic; urgency=medium

  * Fix data loss bug introduced in 11.0.0 and fixed in 11.6.3. The bug
    causes the qpdf tokenizer to discard the character after a one-digit
    or two-digit quoted octal string. Most writers don't create these, and
    they are rare outside of content streams. By default, qpdf doesn't
    parse content streams. The most common place for this to occur would
    be in a document's /ID string, but in the worst case, this bug could
    cause silent damage to some strings in a PDF file's metadata, such as
    bookmark names or form field values. (LP: #2039804)

 -- Jay Berkenbilt <email address hidden> Thu, 19 Oct 2023 07:20:25 -0400

Changed in qpdf (Ubuntu Mantic):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for qpdf has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qpdf - 11.3.0-1ubuntu1

---------------
qpdf (11.3.0-1ubuntu1) lunar; urgency=medium

  * Fix data loss bug introduced in 11.0.0 and fixed in 11.6.3. The bug
    causes the qpdf tokenizer to discard the character after a one-digit
    or two-digit quoted octal string. Most writers don't create these, and
    they are rare outside of content streams. By default, qpdf doesn't
    parse content streams. The most common place for this to occur would
    be in a document's /ID string, but in the worst case, this bug could
    cause silent damage to some strings in a PDF file's metadata, such as
    bookmark names or form field values. (LP: #2039804)

 -- Jay Berkenbilt <email address hidden> Thu, 19 Oct 2023 07:09:54 -0400

Changed in qpdf (Ubuntu Lunar):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.