pdf2txt outputs uncaught error

Bug #1529473 reported by Pettis
This bug affects 1 person
Affects: pdfminer (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Ubuntu Release
=============
Ubuntu 14.04.3

Package Version
============
python-pdfminer:
  Installed: 20110515+dfsg-1
  Candidate: 20110515+dfsg-1
  Version table:
 *** 20110515+dfsg-1 0
        500 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status

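Steps to reproduce and expectation
==================================
# Get the problem PDF (URL quoted so the shell does not treat '&' as a control operator)
wget -O CornwallPlanningPlanning12319497.pdf 'http://docs.planning.cornwall.gov.uk/rpp/showimage.asp?j=PA14/04815&index=12319497&DB=8&DT=4'
# Try to extract the text
pdf2txt CornwallPlanningPlanning12319497.pdf
# The PDF's text should be printed to the console.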

What happened instead
==================
Python raises ValueError:

Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 101, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/bin/pdf2txt", line 95, in main
    caching=caching, check_extractable=True)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 832, in process_pdf
    interpreter.process_page(page)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 757, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 768, in render_contents
    self.init_resources(resources)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 339, in init_resources
    self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 193, in get_font
    font = self.get_font(None, subspec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdfinterp.py", line 184, in get_font
    font = PDFCIDFont(self, spec)
  File "/usr/lib/python2.7/dist-packages/pdfminer/pdffont.py", line 637, in __init__
    CMapParser(self.unicode_map, StringIO(strm.get_data())).run()
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 292, in run
    self.nextobject()
  File "/usr/lib/python2.7/dist-packages/pdfminer/psparser.py", line 584, in nextobject
    self.do_keyword(pos, token)
  File "/usr/lib/python2.7/dist-packages/pdfminer/cmapdb.py", line 311, in do_keyword
    ((_,k),(_,v)) = self.pop(2)
ValueError: need more than 0 values to unpack
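
For illustration, the failing statement can be reproduced without pdfminer or the problem PDF. This is only a minimal sketch: the `pop` helper below is a hypothetical stand-in, assumed to return however many items are left when the stack holds fewer than requested. It is not pdfminer's own code.

```python
def pop(stack, n):
    """Hypothetical stand-in: remove and return the last n items (fewer if the stack is short)."""
    items = stack[-n:]
    del stack[-n:]
    return items


def try_unpack(stack):
    """Apply the same nested unpacking as the failing line and report what happened."""
    try:
        ((_, k), (_, v)) = pop(stack, 2)
    except ValueError as exc:
        return "ValueError: %s" % exc
    return (k, v)


# Two (position, object) pairs on the stack: unpacking succeeds.
ok_result = try_unpack([(0, "CMapName"), (10, "Example")])

# Empty stack: pop() returns [], and unpacking it into two targets raises ValueError.
bad_result = try_unpack([])

print(ok_result)
print(bad_result)
```

On Python 2.7 the message is the one in the traceback above ("need more than 0 values to unpack"). Python 3 words it differently but raises the same exception type.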

Potential patch [not checked if semantically correct]
==========================================
In cmapdb.py, in do_keyword:
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except PSSyntaxError:
            pass
        return

Could become:
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.set_attr(literal_name(k), v)
        except ValueError:
            # e.g. pop(2) returned no operands, as in the traceback above
            pass
        except PSSyntaxError:
            pass
        return
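
The behaviour the patch aims for can be sketched in isolation. Everything here is a hypothetical stand-in (`SketchCMap`, `handle_def`, and the list used as an operand stack). It only shows that a `def` keyword with too few operands is skipped instead of aborting the run. The existing `PSSyntaxError` handler is left out because nothing in this stand-in raises it.

```python
class SketchCMap(object):
    """Hypothetical stand-in for the CMap object; it only records attributes."""

    def __init__(self):
        self.attrs = {}

    def set_attr(self, k, v):
        self.attrs[k] = v


def handle_def(stack, cmap):
    """Handle a 'def' keyword: take two operands and store them as key/value.

    Returns True if an attribute was stored, False if the operands were missing.
    """
    operands = stack[-2:]
    del stack[-2:]
    try:
        ((_, k), (_, v)) = operands
    except ValueError:
        # Fewer than two operands were available: ignore this 'def'.
        return False
    cmap.set_attr(k, v)
    return True


cmap = SketchCMap()
stored_empty = handle_def([], cmap)  # malformed input: nothing on the stack
stored_valid = handle_def([(0, "CMapName"), (10, "Example")], cmap)
print(stored_empty, stored_valid, cmap.attrs)
```

Whether silently skipping the entry is the right behaviour for pdfminer is still the open question noted in the heading above. The extracted text could come out incomplete if the skipped entry mattered.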
