infinite loop in PdfFileWriter._sweepIndirectReferences due to cyclic IndirectObjects
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pyPdf | New | Undecided | Unassigned |
python-pypdf (Debian) | New | Undecided | Unassigned |
Bug Description
When attempting to write the pages from an existing PDF file into a new PDF file, the method PdfFileWriter._sweepIndirectReferences enters an infinite loop if the file contains cyclic IndirectObjects.
Although the current version of pyPdf keeps a simple stack to check whether it has already encountered an object, the check fails in this case because the object is encountered again after it has been popped off the stack. pyPdf then sweeps the object again, re-encountering all the other objects it had already swept, which results in the infinite loop.
I was able to solve the problem by turning the stack into a complete list of IndirectObjects already encountered. That is, I removed the pop statement so that the list of all IndirectObjects encountered was maintained.
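To illustrate the difference, here is a toy sketch in plain Python. It is not pyPdf code: the graph, the function names sweep_with_stack and sweep_with_seen, and the integer object ids are all made up for illustration. It only models the idea that a check against the current stack forgets an object once it is popped, while a record that is never popped does not.

```python
def sweep_with_stack(graph, start):
    """Skip an object only while it is still on the current stack."""
    stack = []
    visits = []  # every object that actually gets swept, in order

    def visit(node):
        if node in stack:
            return
        stack.append(node)
        visits.append(node)
        for child in graph[node]:
            visit(child)
        stack.pop()  # the object is forgotten here and can be swept again

    visit(start)
    return visits


def sweep_with_seen(graph, start):
    """Skip any object that has been encountered before (nothing is popped)."""
    seen = set()  # a set is used here for fast lookups; a list works too
    visits = []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        visits.append(node)
        for child in graph[node]:
            visit(child)

    visit(start)
    return visits


# Hypothetical object graph: 1 refers to 2 and 3, both refer to 4,
# and 4 refers back to 1, which closes the cycle.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [1]}

stack_visits = sweep_with_stack(graph, 1)
seen_visits = sweep_with_seen(graph, 1)
print(stack_visits)
print(seen_visits)
```

In this small graph the stack version sweeps object 4 twice, once through 2 and once through 3, while the version that never pops sweeps each object exactly once. With many cross-references the repeated sweeps multiply, which matches the behaviour I saw.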
Simple test for this issue:
from pyPdf.pdf import PdfFileReader, PdfFileWriter
pdf = file('FullUserG
pdfReader = PdfFileReader(pdf)
pdfWriter = PdfFileWriter()
for p in xrange(
pdfWriter.
newFile = file('CopyOfFul
pdfWriter.
I will attach a PDF, FullUserGuide.pdf, which causes this issue. It was originally created from a Confluence wiki site using iText.
Here's a patch which solves the issue for me.