infinite loop in PdfFileWriter._sweepIndirectReferences due to cyclic IndirectObjects

Bug #583024 reported by Bernie Bernstein
This bug affects 2 people
Affects                  Status   Importance   Assigned to   Milestone
pyPdf                    New      Undecided    Unassigned
python-pypdf (Debian)    New      Undecided    Unassigned

Bug Description

When attempting to write the pages from an existing PDF file into a new PDF file, the method PdfFileWriter._sweepIndirectReferences ends up in an infinite loop as it keeps sweeping indirect objects it has already encountered.

Although the current version of pyPdf keeps a simple stack to check whether it has already encountered an object, the check fails in this case because the object is encountered again after it has been popped off the stack. pyPdf then sweeps the object again, and in doing so re-encounters all the other objects it had already swept, resulting in the infinite loop.

I was able to solve the problem by turning the stack into a complete list of the IndirectObjects already encountered. That is, I removed the pop statement, so the list of every IndirectObject encountered is maintained for the whole sweep.
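To illustrate the difference between the two checks, here is a minimal sketch on a toy object graph. It is not the pyPdf code and not the attached patch: the graph layout and the names build_graph, sweep_with_stack, sweep_with_seen and count_sweeps are all hypothetical, and a set stands in for the list that is never popped. In this toy graph the stack-based version still terminates, but the number of repeated sweeps roughly doubles with every extra layer of shared objects.

```python
def build_graph(layers):
    """Toy graph (hypothetical): object id -> ids it references.

    Each layer has two objects that share the same two children, and every
    object also refers back to the root, which creates cycles.
    """
    graph = {0: [1, 2]}
    for layer in range(layers):
        a, b = 2 * layer + 1, 2 * layer + 2
        children = [] if layer == layers - 1 else [a + 2, b + 2]
        graph[a] = children + [0]
        graph[b] = children + [0]
    return graph


def sweep_with_stack(graph, obj_id, stack, counter):
    """Only objects on the current path are skipped (ids are popped on return)."""
    if obj_id in stack:
        return
    counter[0] += 1
    stack.append(obj_id)
    for ref in graph[obj_id]:
        sweep_with_stack(graph, ref, stack, counter)
    stack.pop()  # after this, the same object can be swept again


def sweep_with_seen(graph, obj_id, seen, counter):
    """Every object ever encountered is skipped (ids are never removed)."""
    if obj_id in seen:
        return
    counter[0] += 1
    seen.add(obj_id)
    for ref in graph[obj_id]:
        sweep_with_seen(graph, ref, seen, counter)


def count_sweeps(layers):
    """Return (sweeps with stack check, sweeps with seen-set check)."""
    graph = build_graph(layers)
    with_stack, with_seen = [0], [0]
    sweep_with_stack(graph, 0, [], with_stack)
    sweep_with_seen(graph, 0, set(), with_seen)
    return with_stack[0], with_seen[0]


print(count_sweeps(10))
```

With 10 layers the toy graph has 21 objects. The seen-set version sweeps each of them once, while the stack version performs 2047 sweeps because shared objects are swept again each time they are reached by a different path.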

Simple test for this issue:

from pyPdf.pdf import PdfFileReader, PdfFileWriter

# PDF is a binary format, so open both files in binary mode.
pdf = open('FullUserGuide.pdf', 'rb')  # Sample pdf attached to issue
pdfReader = PdfFileReader(pdf)
pdfWriter = PdfFileWriter()
for p in range(pdfReader.getNumPages()):
    pdfWriter.addPage(pdfReader.getPage(p))
newFile = open('CopyOfFullUserGuide.pdf', 'wb')
pdfWriter.write(newFile)  # the reported loop occurs during this call
newFile.close()
pdf.close()

I will attach a PDF, FullUserGuide.pdf, which causes this issue. It was originally created from a Confluence wiki site using iText.

Bernie Bernstein (bernie9998) wrote :

Here's a patch which solves the issue for me.

Bernie Bernstein (bernie9998) wrote :

Oops, posted the wrong file. Please ignore the previous patch and use this one instead.
