Comment 5 for bug 519139

Alvin Penner (apenner) wrote :

In the file ConferenceProgram_2008, some text is missing on page 16. The missing text begins with "Rabobank is among the most sustainable" and ends with 'www.rabobank.com'.
    This text is rendered correctly in GIMP and also in Evince, but it is not interpreted correctly by pdf2txt.py, the command-line tool that is part of the pdfminer parser: http://pypi.python.org/pypi/pdfminer/
    For example, the text 'www.rabobank.com' is interpreted by pdfminer as:
(cid:88)(cid:88)(cid:88)(cid:15)(cid:83)(cid:66)(cid:67)(cid:80)(cid:67)(cid:66)(cid:79)(cid:76)(cid:15)(cid:68)(cid:80)(cid:78)
where the letter 'a' is represented by (cid:66). Something appears to be unique about this line, since pdfminer interpreted the preceding text correctly.
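
As a rough check of the mapping, here is a small Python sketch that decodes the output quoted above. It assumes the CID values differ from the ASCII codes by a constant offset, inferred only from 'a' (ASCII 97) showing up as 66. That offset is a guess based on this one string, not something reported by pdfminer, and the names CID_OUTPUT, OFFSET and decode_cids are made up for this illustration.

```python
import re

# The pdfminer output quoted above for 'www.rabobank.com'.
CID_OUTPUT = (
    "(cid:88)(cid:88)(cid:88)(cid:15)(cid:83)(cid:66)(cid:67)(cid:80)"
    "(cid:67)(cid:66)(cid:79)(cid:76)(cid:15)(cid:68)(cid:80)(cid:78)"
)

# Assumption: a constant shift between CID and ASCII code,
# inferred from the single observation 'a' (97) -> 66.
OFFSET = ord("a") - 66


def decode_cids(text, offset=OFFSET):
    """Replace each '(cid:N)' token with the character chr(N + offset)."""
    return re.sub(
        r"\(cid:(\d+)\)",
        lambda match: chr(int(match.group(1)) + offset),
        text,
    )


print(decode_cids(CID_OUTPUT))
```

With an offset of 31 this prints www.rabobank.com, so the glyph codes in this line look like ASCII shifted by a fixed amount. That would be consistent with a font whose character codes are not mapped back to Unicode, although the sketch alone does not prove that is the cause.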