Multiple DocumentInfo keys

Bug #242755 reported by Mathieu Fenniak on 2008-06-24
This bug affects 8 people
Affects: pyPdf
Importance: Undecided
Assigned to: Unassigned

Bug Description

Originally reported via e-mail from Robert Boulanger:

I'm using your pyPdf library, especially for cataloging PDF files. When reading the DocumentInfo (metadata), it often happens that PDFs contain the same key multiple times.
In this case your library raises an error, since multiple keys are not permitted.
Wouldn't it be better to just skip the additional key/value pairs instead of raising an error?

I changed generic.py at line 492 as follows:

          if data.has_key(key):
              # multiple definitions of key not permitted
              pass
              #raise utils.PdfReadError, "multiple definitions in dictionary"
          else:
              data[key] = value
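
For readers who want to see the "keep the first value, skip later duplicates" behaviour in isolation, here is a minimal standalone sketch. It is not pyPdf code: the function name `build_dict_first_wins`, the sample `/Title` and `/Author` entries, and the choice to emit a warning for each skipped duplicate are all assumptions made for illustration.

```python
import warnings


def build_dict_first_wins(pairs):
    """Build a dict from (key, value) pairs, keeping the first value per key.

    Hypothetical helper for illustration; not part of pyPdf.
    """
    data = {}
    for key, value in pairs:
        if key in data:
            # Duplicate key: warn and skip it instead of raising an error.
            warnings.warn("duplicate key %r ignored" % (key,))
            continue
        data[key] = value
    return data


# Made-up DocumentInfo-style entries with a repeated /Title key.
pairs = [("/Title", "First"), ("/Author", "A. Writer"), ("/Title", "Second")]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    info = build_dict_first_wins(pairs)

print(info)
print(len(caught), "duplicate(s) skipped")
```

Emitting a warning rather than passing silently is a design choice in this sketch; it keeps malformed files readable while still leaving a trace that a key was dropped.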

==============================
Just adding that I too had to make the same change. I create pdf files with pdflatex using hyperref. Some of my pdfs could not be opened by pyPdf, although some could be opened with no problem (all generated using the same workflow). In any case, this modification made it possible to open all of my pdfs.

I think it should be considered for adding to the real distribution.
thanks,
--Tim Arnold

Tim Arnold (a-jtim) on 2008-10-27
description: updated
benjamin (jesuisbenjamin) wrote :

Thanks! This is great. I had trouble opening some PDFs, but this fixed the issue.

Tim Arnold (a-jtim) wrote :

I had to do a new install today and used a different workaround, which I think is a slight improvement.
Around line 532:

if not data.get(key):
    data[key] = value

was:
if data.has_key(key):
    raise utils.PdfReadError, "multiple definitions in dictionary"
data[key] = value

just fyi,
--Tim Arnold
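
One caveat about the `data.get(key)` form, sketched below in plain Python rather than pyPdf itself: `not data.get(key)` is also true when the key is already present with a falsy value (an empty string, for example), so a later duplicate would overwrite it. A membership test keeps the first value regardless. The function names and the `/Subject` sample are hypothetical, and whether pyPdf ever stores a falsy value at this point is an assumption, not something this thread confirms.

```python
def merge_with_get(pairs):
    """Mirror the `if not data.get(key)` workaround."""
    data = {}
    for key, value in pairs:
        if not data.get(key):
            data[key] = value
    return data


def merge_with_membership(pairs):
    """Keep the first value for each key, even if that value is falsy."""
    data = {}
    for key, value in pairs:
        if key not in data:
            data[key] = value
    return data


# Made-up example: the first /Subject entry is an empty string.
pairs = [("/Subject", ""), ("/Subject", "Later value")]

get_result = merge_with_get(pairs)
membership_result = merge_with_membership(pairs)

print(get_result)
print(membership_result)
```

When every first value is truthy the two forms give the same result, which may explain why the workaround behaved well in practice.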

Changed in pypdf:
status: New → Confirmed
Michael Helsvig (micski) wrote :

PDF Shuffler for Ubuntu 13.04 presented the same error message. The solution posted by Tim Arnold worked perfectly. Thanks.

if data.has_key(key):
    raise utils.PdfReadError, "multiple definitions in dictionary"
data[key] = value

was changed to

if not data.get(key):
    data[key] = value

Fabio M. Panico (fbugnon) wrote :

PDFShuffler 0.6.0 for Ubuntu 15.10 presented the same error message.
The solution posted by Tim Arnold (#2) worked perfectly. Thanks.

Just for information, the code to be changed currently begins on line 523 of the file:
 /usr/lib/python2.7/dist-packages/pyPdf/generic.py
