pdf2txt outputs uncaught error

Bug #1529473 reported by Pettis on 2015-12-27
This bug affects 1 person
Affects: pdfminer (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu Release
=============
Ubuntu 14.04.3

Package Version
============
python-pdfminer:
  Installed: 20110515+dfsg-1
  Candidate: 20110515+dfsg-1
  Version table:
 *** 20110515+dfsg-1 0
        500 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status

Expectation
=========
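# Get the problem PDF (URL quoted so the shell does not treat '&' as a background operator)
wget -O CornwallPlanningPlanning12319497.pdf "http://docs.planning.cornwall.gov.uk/rpp/showimage.asp?j=PA14/04815&index=12319497&DB=8&DT=4"
# Try to extract the text
pdf2txt CornwallPlanningPlanning12319497.pdf
# The PDF's text should be printed to the console.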

What happened instead
==================
Python raises ValueError:

Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 101, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/bin/pdf2txt", line 95, in main
    caching=caching, check_extractable=True)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_pdf
    interpreter.process_page(page)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 757, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 768, in render_contents
    self.init_resources(resources)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 339, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 193, in get_font
    font = self.get_font(None, subspec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 184, in get_font
    font = PDFCIDFont(self, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 637, in __init__
    CMapParser(self.unicode_map, StringIO(strm.get_data())).run()
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 292, in run
    self.nextobject()
  File "/usr/lib/python2.7/dist-packages/pdfminer/psparser.py", line 584, in nextobject
    self.do_keyword(pos, token)
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 311, in do_keyword
    ((_,k),(_,v)) = self.pop(2)
ValueError: need more than 0 values to unpack
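
The last frame of the traceback can be reproduced in isolation. The sketch below is not pdfminer code: TinyStack is a hypothetical stand-in for the parser's operand stack, and it assumes that pop(n) returns however many of the last n entries exist rather than raising when the stack is short. Under that assumption, a 'def' keyword arriving with an empty stack makes the two-way unpack fail with ValueError (the exact message text differs between Python versions).

```python
class TinyStack(object):
    """Hypothetical stand-in for the parser's operand stack (not pdfminer code)."""

    def __init__(self, items):
        self.items = list(items)

    def pop(self, n):
        # Assumed behaviour: return up to the last n entries, fewer if the
        # stack is short, and remove them from the stack.
        objs = self.items[-n:]
        self.items[-n:] = []
        return objs


def unpack_def(stack):
    # Same unpacking pattern as the failing line: two (position, value) pairs.
    ((_, k), (_, v)) = stack.pop(2)
    return k, v


def describe(stack):
    # Report either the unpacked pair or the kind of failure.
    try:
        return unpack_def(stack)
    except ValueError:
        return "ValueError"


print(describe(TinyStack([(0, 'CMapName'), (9, 'Adobe-Identity-UCS')])))
print(describe(TinyStack([])))  # empty stack: the unpack has nothing to bind
```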

Potential patch [not checked if semantically correct]
==========================================
In cmapdb.py, in do_keyword (around line 308):

        if name == 'def':
            try:
                ((_,k),(_,v)) = self.pop(2)
                self.cmap.set_attr(literal_name(k), v)
            except PSSyntaxError:
                pass
            return

Could become:

        if name == 'def':
            try:
                ((_,k),(_,v)) = self.pop(2)
                self.cmap.set_attr(literal_name(k), v)
            except ValueError:
                pass
            except PSSyntaxError:
                pass
            return
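
To illustrate what the proposed change would do, here is a minimal standalone sketch. It is an assumption-laden model, not the pdfminer implementation: PSSyntaxError is redefined locally, pop() and handle_def() are hypothetical helpers, attributes are stored in a plain dict instead of going through set_attr and literal_name, and the two except clauses are merged into one tuple, which should behave the same way. Whether silently skipping a malformed 'def' is semantically right for a CMap remains unchecked, as noted above.

```python
class PSSyntaxError(Exception):
    """Local stand-in for pdfminer's exception, so the sketch runs on its own."""


def pop(stack, n):
    # Assumed stack behaviour: hand back at most the last n entries.
    objs = stack[-n:]
    stack[-n:] = []
    return objs


def handle_def(stack, attrs):
    """Hypothetical model of the 'def' branch with the extra except clause."""
    try:
        ((_, k), (_, v)) = pop(stack, 2)
        attrs[k] = v
    except (ValueError, PSSyntaxError):
        # Missing or malformed operands: skip this 'def' instead of crashing.
        pass


attrs = {}
handle_def([(0, 'CMapName'), (9, 'Adobe-Identity-UCS')], attrs)  # well-formed
handle_def([], attrs)                                            # empty stack
print(attrs)  # {'CMapName': 'Adobe-Identity-UCS'}
```

With this shape, a 'def' that has too few operands is ignored and parsing of the rest of the stream can continue.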
