pdf2txt raises an uncaught ValueError
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pdfminer (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Ubuntu Release
=============
Ubuntu 14.04.3
Package Version
============
python-pdfminer:
  Installed: 20110515+dfsg-1
  Candidate: 20110515+dfsg-1
  Version table:
 *** 20110515+dfsg-1 0
        500 http://
        100 /var/lib/
Expectation
=========
# Get the problem PDF
wget http://
# Try to extract its text
pdf2txt CornwallPlannin
# The PDF's text should be printed to the console.
What happened instead
==================
Python raises ValueError:
Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 101, in <module>
    if __name__ == '__main__': sys.exit(
  File "/usr/bin/pdf2txt", line 95, in main
    caching=
  File "/usr/lib/
    interpreter
  File "/usr/lib/
    self.
  File "/usr/lib/
    self.
  File "/usr/lib/
    self.
  File "/usr/lib/
    font = self.get_font(None, subspec)
  File "/usr/lib/
    font = PDFCIDFont(self, spec)
  File "/usr/lib/
    CMapParser(
  File "/usr/lib/
    self.
  File "/usr/lib/
    self.
  File "/usr/lib/
    ((_,k),(_,v)) = self.pop(2)
ValueError: need more than 0 values to unpack
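The message suggests that the unpacking on the last frame received an empty list. A minimal sketch of that failure, assuming `self.pop(2)` returns whatever is left on the operand stack (possibly nothing); `pop_two` and `unpack_def` are hypothetical stand-ins, not pdfminer functions:

```python
def pop_two(stack):
    """Hypothetical stand-in for self.pop(2): remove and return up to two entries."""
    popped = stack[-2:]
    del stack[-2:]
    return popped


def unpack_def(stack):
    """Mimic the failing line, which expects two (position, value) pairs."""
    ((_, k), (_, v)) = pop_two(stack)
    return k, v


# Two entries on the stack: the unpacking succeeds.
ok = unpack_def([(0, 'CMapName'), (9, 'Identity')])

# Empty stack: the unpacking raises ValueError, as in the traceback.
try:
    unpack_def([])
    error = None
except ValueError as exc:
    error = str(exc)
```

Python 2 words the message as "need more than 0 values to unpack"; Python 3 reports "not enough values to unpack (expected 2, got 0)". The exception type is the same in both.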
Potential patch [not checked if semantically correct]
=======
In cmapdb.py (around line 308):
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.
        except PSSyntaxError:
            pass
        return
Could become:
    if name == 'def':
        try:
            ((_,k),(_,v)) = self.pop(2)
            self.cmap.
        except ValueError:
            # The stack did not hold two (position, value) pairs; skip this 'def'.
            pass
        except PSSyntaxError:
            pass
        return
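To see the effect of the extra `except` clause in isolation, here is a minimal sketch using stand-in classes. `FakeCMapParser`, its `attrs` dict, and the locally defined `PSSyntaxError` are all hypothetical; the dict only stands in for whatever the truncated `self.cmap.` call stores, and none of this has been checked against the real pdfminer code:

```python
class PSSyntaxError(Exception):
    """Local stand-in for pdfminer's PSSyntaxError (assumption)."""


class FakeCMapParser(object):
    """Hypothetical miniature of the parser; not pdfminer's real class."""

    def __init__(self, stack):
        self.stack = list(stack)
        self.attrs = {}  # stands in for the truncated self.cmap. call

    def pop(self, n):
        # Remove and return up to n entries from the end of the stack.
        popped = self.stack[-n:]
        del self.stack[-n:]
        return popped

    def do_keyword(self, name):
        if name == 'def':
            try:
                ((_, k), (_, v)) = self.pop(2)
                self.attrs[k] = v
            except ValueError:
                # Fewer than two operands were available; ignore this 'def'.
                pass
            except PSSyntaxError:
                pass
            return


good = FakeCMapParser([(0, 'CMapName'), (9, 'Identity')])
good.do_keyword('def')

bad = FakeCMapParser([])
bad.do_keyword('def')  # no exception with the extra except clause
```

The two handlers could also be merged into a single `except (ValueError, PSSyntaxError):` clause. Either way the malformed `def` is silently skipped, so whether the resulting font mapping is still correct remains unverified.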