"EOF marker not found" prevents saving document

Bug #656382 reported by exactt
This bug affects 17 people
Affects                 Status        Importance   Assigned to   Milestone
PDF Mod                 Confirmed     Undecided    Unassigned
PDF-Shuffler            Unknown       Unknown
pdfshuffler (Ubuntu)    Fix Released  Undecided    Unassigned

Bug Description

Binary package hint: pdfshuffler

I have a pretty large (>200 pages) PDF file here. I can import it but can't save it. Hitting the save button does nothing.

I am sorry I can't provide the file itself.

However, here is what appears on the command line:

:~$ pdfshuffler
/usr/lib/pymodules/python2.6/pyPdf/pdf.py:52: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet
/usr/lib/pymodules/python2.6/pyPdf/generic.py:406: DeprecationWarning: object.__init__() takes no parameters
  str.__init__(self, data)
/usr/lib/pymodules/python2.6/pyPdf/generic.py:216: DeprecationWarning: object.__init__() takes no parameters
  int.__init__(self, value)
Traceback (most recent call last):
  File "/usr/bin/pdfshuffler", line 411, in choose_export_pdf_name
    self.export_to_file(file_out)
  File "/usr/bin/pdfshuffler", line 432, in export_to_file
    pdfdoc_inp = PdfFileReader(file(pdfdoc.copyname, 'rb'))
  File "/usr/lib/pymodules/python2.6/pyPdf/pdf.py", line 277, in __init__
    self.read(stream)
  File "/usr/lib/pymodules/python2.6/pyPdf/pdf.py", line 607, in read
    raise utils.PdfReadError, "EOF marker not found"
pyPdf.utils.PdfReadError: EOF marker not found

exactt (giesbert) wrote :

BTW: this is PDF Shuffler 0.5 on latest Maverick AMD64

exactt (giesbert) wrote :

Just had the problem on a third computer. Very annoying: I built a large document and now two hours of work are wasted!

Changed in pdfshuffler (Ubuntu):
status: New → Confirmed
exactt (giesbert)
Changed in python-pypdf (Ubuntu):
status: New → Confirmed
Fritz Heinrichmeyer (fritz-heinrichmeyer) wrote :

Maybe a backport of pypdf-0.13 to Natty would help?

James Doyle (jujudu3) wrote :

Still a problem in 0.6. It seems to happen consistently for me when I combine or edit multiple PDFs.

ih (ih-ad) wrote :

I have the same issue with certain documents. In my case it was my electric bill downloaded from Duke.
One workaround was to open that document in Evince and print it to PDF. I could then open the printed PDF in pdfshuffler, add more documents, and save without errors.

Benjamín Burgos V. (bburgosv) wrote :

It happens to me too, whether I edit a single page or multiple PDFs.
PdfShuffler 0.6.0 on Linux Mint 16 Cinnamon x64

Andrew Lundin (galundin) wrote :

pdfshuffler uses pyPdf, which has not been maintained since 2010. This bug is fixed in PyPDF2, but I don't know what changes would be necessary to upgrade the dependency, so here's an informal "back-port" that got it working for me.

Here's the problem:
A well-formed PDF should have an end-of-file marker ("%%EOF") on its last line. Sometimes, though, there are extra lines after this EOF marker (I don't know why). pyPdf raises an error if it doesn't find the EOF marker on the last non-empty line. The solution in PyPDF2 is to ignore all lines after the EOF marker and to raise an error only if the marker isn't found within the last kilobyte of the file.
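To make the difference between the two checks concrete, here is a minimal Python 3 sketch that works on the raw bytes of a file instead of pyPdf's backward line reader. The function names are hypothetical, and both functions only approximate the library behaviour described above (the old strict check and PyPDF2's 1 KiB window); they are not code from either library.

```python
def strict_eof_check(data: bytes) -> bool:
    """Approximates old pyPdf: the last non-empty line must start with %%EOF."""
    lines = [line for line in data.splitlines() if line]
    return bool(lines) and lines[-1].startswith(b"%%EOF")


def find_eof_marker(data: bytes, window: int = 1024) -> int:
    """Approximates PyPDF2: accept a %%EOF that starts within the last `window` bytes.

    Returns the offset of the last such marker; anything after it is ignored.
    """
    start = max(0, len(data) - window)
    pos = data.rfind(b"%%EOF", start)
    if pos == -1:
        raise ValueError("EOF marker not found")
    return pos


# A tiny stand-in for a PDF tail (not a valid document).
sample = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n%%EOF\n"
with_junk = sample + b"trailing junk\n"

print(strict_eof_check(sample))     # True
print(strict_eof_check(with_junk))  # False: the old strict check fails here
print(find_eof_marker(with_junk) == sample.index(b"%%EOF"))  # True
```

The tolerant version still rejects a file whose marker lies further back than the window, which is why the window size matters later in this thread.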

The error message refers to this file:
/usr/lib/pymodules/python2.7/pyPdf/pdf.py

Which is a symbolic link to this target file:
/usr/share/pyshared/pyPdf/pdf.py

Open that target file in an editor with super-user privileges. Find the read() method of the PdfFileReader class. It should be around line number 703. Near the beginning of that read() method, look for the following code:

        line = ''
        while not line:
            line = self.readNextEndLine(stream)
        if line[:5] != "%%EOF":
            raise utils.PdfReadError, "EOF marker not found"

Remove or comment out all of those lines and replace them with:

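        # Scan backwards from the end of the file, skipping any trailing
        # lines, until a line starting with "%%EOF" is found.
        last1K = stream.tell() - 1024  # give up once we are more than 1 KiB from the end
        line = ''
        while line[:5] != "%%EOF":
            if stream.tell() < last1K:
                raise utils.PdfReadError("EOF marker not found")
            line = self.readNextEndLine(stream)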

For reference, here is the same code in PyPDF2:
https://github.com/mstamy2/PyPDF2/blob/master/PyPDF2/pdf.py#L1248

Unfortunately, this bug will never be fixed in pyPdf because it is deprecated and no longer maintained. pdfshuffler should upgrade to PyPDF2.

touil (touil-medfa)
affects: python-pypdf (Ubuntu) → pdf
Thomas Baeckeroot (thomas.baeckeroot) wrote :

Congratulations to Andrew Lundin (galundin).

1. I am confirming the bug.
2. I am confirming the solution posted by Andrew. It resolved the problem for me.

Maybe this confirmed bug is a good argument for upgrading to PyPDF2...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pdfshuffler - 0.6.0-8

---------------
pdfshuffler (0.6.0-8) unstable; urgency=medium

  * Add build-dep on dh-python.
  * Default to pypdf2 instead of deprecated pypdf. (Closes: #763973, #763976,
    LP: #656382)
  * Fix missing UI translations and add Spanish and Catalan translations.
    (Closes: #756281, LP: #1328862)
  * Install hi-res scalable svg icon. (Closes: #768306, LP: #1081190)
  * Add missing MimeType to desktop menu file. (LP: #1334124, LP: #1349140)
    - Drop duplicate debian/pdfshuffler.desktop.
  * Update to Standards version 3.9.8.
    - Drop obsolete debian/menu file.

 -- Vincent Cheng <email address hidden> Wed, 25 May 2016 18:26:19 -0700
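The changelog above says the package now defaults to PyPDF2 instead of the deprecated pyPdf, but it doesn't show how. As a hedged illustration only (this is not pdfshuffler's actual code, and the Debian dependency change is a separate packaging matter), a common application-level pattern is a preference-ordered import; the names `PREFERRED_BACKENDS` and `pick_pdf_backend` are hypothetical.

```python
import importlib

# Hypothetical preference order: the maintained PyPDF2 first,
# then the deprecated pyPdf as a fallback.
PREFERRED_BACKENDS = ("PyPDF2", "pyPdf")


def pick_pdf_backend(candidates=PREFERRED_BACKENDS):
    """Return the first module in `candidates` that imports, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


backend = pick_pdf_backend()
print(backend.__name__ if backend else "no PDF library installed")
```

With this ordering, simply installing PyPDF2 alongside pyPdf is enough for the tolerant EOF check to be used.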

Changed in pdfshuffler (Ubuntu):
status: Confirmed → Fix Released
Mélodie (meets) wrote :

Hi,

this is what is installed here:
$ apt-cache policy python-pypdf
python-pypdf:
  Installed: 1.13-1
  Candidate: 1.13-1

$ apt-cache policy pdfshuffler
pdfshuffler:
  Installed: 0.6.0-8

I'm running Ubuntu 18.04, updated daily.

I have a bunch of PDF files I need to print; combining them into a single PDF and sending it to the printer in one go would be nice. Could you fix it again, if possible? Thanks.

Best regards,
Mélodie

Mélodie (meets) wrote :

Hi,

I solved it by installing python-pypdf2. I suggest changing the package's dependency from python-pypdf to python-pypdf2 to solve it for all users.

Thanks for your work!

Best regards,
Mélodie

venoel (denis-olenev) wrote :

Hi. I have the same "EOF marker not found" problem (the first time in several years of using PDF Shuffler).

The code from Andrew Lundin's (galundin) patch is already present in PyPDF2; in my case it is in
/usr/lib/python2.7/dist-packages/PyPDF2/pdf.py

at line 1692:

        last1K = stream.tell() - 1024 + 1 # offset of last 1024 bytes of stream
        line = b_('')
        while line[:5] != b_("%%EOF"):
            if stream.tell() < last1K:
                raise utils.PdfReadError("EOF marker not found")
            line = self.readNextEndLine(stream)
            if debug: print(" line:",line)

But the error still appears, apparently because the EOF marker is not within the last 1024 bytes.

So I increased it to 8024 bytes and now everything works.

        last1K = stream.tell() - 8024 + 1 # offset of last 8024 bytes of stream
        line = b_('')
        while line[:5] != b_("%%EOF"):
            if stream.tell() < last1K:
                raise utils.PdfReadError("EOF marker not found")
            line = self.readNextEndLine(stream)
            if debug: print(" line:",line)
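Before picking a larger window, it may help to measure how far the last %%EOF actually sits from the end of the affected file. The following Python 3 helper is a hypothetical diagnostic, not part of PyPDF2; it reads the whole file into memory, so it is meant for one-off checks rather than for use inside the reader.

```python
import sys


def eof_distance_bytes(data: bytes) -> int:
    """Bytes from the start of the last b"%%EOF" to the end of `data` (-1 if absent)."""
    pos = data.rfind(b"%%EOF")
    return -1 if pos == -1 else len(data) - pos


def eof_distance(path: str) -> int:
    """Same measurement for a file on disk (reads the whole file into memory)."""
    with open(path, "rb") as f:
        return eof_distance_bytes(f.read())


if __name__ == "__main__":
    # Usage: python3 eof_distance.py file1.pdf [file2.pdf ...]
    for path in sys.argv[1:]:
        print(path, eof_distance(path))
```

The script name is arbitrary. A printed value well above 1024 is consistent with the stock PyPDF2 check failing, and a window somewhat larger than that value should be enough; -1 means the file has no %%EOF at all, which no window size will fix.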
