Comment 7 for bug 656382

Andrew Lundin (galundin) wrote :

pdfshuffler uses pyPdf, which has not been maintained since 2010. This bug is fixed in PyPDF2, but I don't know what changes would be necessary to upgrade the dependency. So here's an informal "back-port" that got it working for me.

Here's the problem:
A well-formed PDF should have an end-of-file marker ("%%EOF") on the last line. But sometimes there are extra lines after this EOF marker. (I don't know why.) pyPdf raises an error if the last non-empty line does not start with the EOF marker. The solution in PyPDF2 is to ignore all lines after the EOF marker, and only raise an error if the EOF marker isn't found within the last kilobyte.
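
To see which of your files are affected, the two behaviours can be approximated in a few lines of standalone Python. This is only a rough sketch, not the code from pyPdf or PyPDF2: the function names strict_eof_check and tolerant_eof_check are made up for illustration, and it works on the raw bytes of the file rather than reading the stream backwards the way the libraries do.

```python
EOF_MARKER = b"%%EOF"


def strict_eof_check(data):
    """Roughly the pyPdf behaviour: the last non-empty line must start with %%EOF."""
    lines = [ln for ln in data.splitlines() if ln]
    return bool(lines) and lines[-1][:5] == EOF_MARKER


def tolerant_eof_check(data, window=1024):
    """Roughly the PyPDF2 behaviour: some line in the last `window` bytes starts with %%EOF."""
    tail = data[-window:]
    return any(ln[:5] == EOF_MARKER for ln in tail.splitlines())
```

A file that passes tolerant_eof_check but fails strict_eof_check has extra lines after the EOF marker, which is the case described here. For example, with a hypothetical file name:

    data = open("broken.pdf", "rb").read()
    print(strict_eof_check(data), tolerant_eof_check(data))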

The error message refers to this file:
/usr/lib/pymodules/python2.7/pyPdf/pdf.py

Which is a symbolic link to this target file:
/usr/share/pyshared/pyPdf/pdf.py

Open that target file in an editor with super-user privileges. Find the read() method of the PdfFileReader class. It should be around line number 703. Near the beginning of that read() method, look for the following code:

        line = ''
        while not line:
            line = self.readNextEndLine(stream)
        if line[:5] != "%%EOF":
            raise utils.PdfReadError, "EOF marker not found"

Remove or comment out all of those lines and replace them with:

        last1K = stream.tell() - 1024
        line = ''
        while line[:5] != "%%EOF":
            if stream.tell() < last1K:
                raise utils.PdfReadError("EOF marker not found")
            line = self.readNextEndLine(stream)

For reference, here is the same code in PyPDF2:
https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pdf.py#L1248

Unfortunately, this bug will never be fixed in pyPdf because it is no longer maintained. pdfshuffler should upgrade to PyPDF2.