Multiple DocumentInfo keys

Bug #242755 reported by Mathieu Fenniak
This bug affects 10 people
Affects        Status      Importance   Assigned to   Milestone
PDF-Shuffler   New         Undecided    Unassigned
pyPdf          Confirmed   Undecided    Unassigned

Bug Description

Originally reported via e-mail from Robert Boulanger:

I'm using your PyPDF library, especially for cataloging PDF files. When reading the DocumentInfo (metadata), it often happens that PDFs have the same keys multiple times.
In this case your lib raises an error, since multiple keys are not permitted.
Wouldn't it be better to just skip the additional key/value pairs instead of raising an error?

I changed generic.py in line 492 as follows:

          if data.has_key(key):
              # multiple definitions of key not permitted
              pass
              #raise utils.PdfReadError, "multiple definitions in dictionary"
          else:
              data[key] = value
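
For reference, the same skip-the-duplicates behaviour can be sketched outside pyPdf in current Python, where `dict.has_key` no longer exists and `key in data` takes its place. This is only an illustration under stated assumptions: the function name `read_pairs_first_wins` and the sample keys are hypothetical and are not part of pyPdf.

```python
def read_pairs_first_wins(pairs):
    """Build a dict from (key, value) pairs, keeping the first value seen per key."""
    data = {}
    for key, value in pairs:
        if key in data:
            # Duplicate key: skip it instead of raising an error.
            continue
        data[key] = value
    return data


# Hypothetical DocumentInfo-style pairs with "/Title" defined twice.
info = read_pairs_first_wins(
    [("/Title", "Report"), ("/Author", "A"), ("/Title", "Draft")]
)
print(info)
```

The first definition of a repeated key is kept and later ones are silently dropped, which matches the patched code above.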

==============================
Just adding that I too had to make the same change. I create pdf files with pdflatex using hyperref. Some of my pdfs could not be opened by pyPdf, although some could be opened with no problem (all generated using the same workflow). In any case, this modification made it possible to open all of my pdfs.

I think it should be considered for adding to the real distribution.
thanks,
--Tim Arnold

Tags: xenial
Tim Arnold (a-jtim)
description: updated
benjamin (jesuisbenjamin) wrote :

Thanks! This is great. I had trouble opening some PDFs, but this fixed the issue.

Tim Arnold (a-jtim) wrote :

I had to do a new install today and used a different workaround. I think it's a slight improvement.
Around line 532:

if not data.get(key):
    data[key] = value

was:
if data.has_key(key):
    raise utils.PdfReadError, "multiple definitions in dictionary"
data[key] = value

just fyi,
--Tim Arnold
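
One difference between the two workarounds is worth noting. `data.get(key)` gives a falsy result both for a missing key and for a key whose stored value is itself falsy (0, an empty string, `None`), so the `not data.get(key)` form lets a later duplicate overwrite such a value, while a membership test always keeps the first definition. A small sketch in current Python, using plain Python values rather than pyPdf objects; the helper names and sample data are made up for illustration:

```python
def keep_first_by_membership(pairs):
    """Keep the first value for each key, whatever that value is."""
    data = {}
    for key, value in pairs:
        if key not in data:
            data[key] = value
    return data


def keep_first_by_get(pairs):
    """Same idea using dict.get, as in the workaround above."""
    data = {}
    for key, value in pairs:
        if not data.get(key):
            # Also true when the key exists but holds a falsy value.
            data[key] = value
    return data


pairs = [("/Rotate", 0), ("/Rotate", 90)]
by_membership = keep_first_by_membership(pairs)
by_get = keep_first_by_get(pairs)
print(by_membership)  # {'/Rotate': 0}
print(by_get)         # {'/Rotate': 90}
```

When the first value is truthy, both forms behave the same and keep it.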

Changed in pypdf:
status: New → Confirmed
Michael Helsvig (micski) wrote :

PDF Shuffler for Ubuntu 13.04 presented the same error message. The solution posted by Tim Arnold worked perfectly. Thanks.

if data.has_key(key):
    raise utils.PdfReadError, "multiple definitions in dictionary"
data[key] = value

was changed to

if not data.get(key):
    data[key] = value

Fabio M. Panico (fbugnon) wrote :

PDFShuffler 0.6.0 for Ubuntu 15.10 presented the same error message.
The solution posted by Tim Arnold in comment #2 worked perfectly. Thanks.

Just for information, the code to be changed currently begins on line 523 of the file:
 /usr/lib/python2.7/dist-packages/pyPdf/generic.py
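
Since the reported line number differs between installs (492, around 532, and 523 in the comments above), searching for the error string is a more reliable way to find the spot. A minimal sketch, assuming the path from this comment; the function name `find_lines` is hypothetical, and the script only reads the file as text:

```python
import os


def find_lines(path, needle="multiple definitions in dictionary"):
    """Return the 1-based numbers of the lines in `path` that contain `needle`."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append(number)
    return hits


if __name__ == "__main__":
    # Path taken from the comment above; adjust it for your installation.
    path = "/usr/lib/python2.7/dist-packages/pyPdf/generic.py"
    if os.path.exists(path):
        print(find_lines(path))
```

The reported line is the `raise` statement itself; the `if data.has_key(key):` check sits on the line just before it.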

MarcH (marc-h38) wrote :
Lonnie Lee Best (launchpad-startport) wrote :

I'm having this same issue in Ubuntu 16.04. I'm trying to merge three one-page PDF files, and PDF-Shuffler shows this dialog upon saving:

"multiple definitions in dictionary"

After this, the output PDF file shows a blank page.

tags: added: xenial
Lonnie Lee Best (launchpad-startport) wrote :

My workaround was to print the PDF files to PDF again using Ubuntu 16.04's default PDF viewer; after that I was able to merge the three PDFs.

The PDFs were originally Gmail messages that I printed using Chromium's print-to-PDF feature. Printing them to PDF again with a different tool seems to clean out whatever was breaking the first attempt to merge them.
