custom text encodings

Bug #232212 reported by Mathieu Fenniak
Affects   Status   Importance   Assigned to   Milestone
pyPdf     New      Undecided    Unassigned

Bug Description

In the attached hello_world.pdf file, the text is being drawn by the following TJ command (operand, operator):

    [['\x01', -13.8818, '\x02', 17.0689, '\x03', -5.16561, '\x03', -5.16561, '\x04', 2.15722, '\x05', 3.67949, '\x06', 2.5903, '\x07', -0.30418, '\x04', 2.15654, '\x08', 2.23293, '\x03', -5.16561, '\t', 11.3788, '\n', 6.60875, '\x06']] 'TJ'
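In a TJ array the strings carry the character codes to show, and the numbers are positioning adjustments between them. As a rough sketch (the helper name `tj_char_codes` is made up for illustration and is not part of pyPdf), the codes in the operand above can be pulled out like this:

```python
# Operand of the TJ command from the report (the inner list only).
tj_operand = ['\x01', -13.8818, '\x02', 17.0689, '\x03', -5.16561,
              '\x03', -5.16561, '\x04', 2.15722, '\x05', 3.67949,
              '\x06', 2.5903, '\x07', -0.30418, '\x04', 2.15654,
              '\x08', 2.23293, '\x03', -5.16561, '\t', 11.3788,
              '\n', 6.60875, '\x06']


def tj_char_codes(operand):
    """Return the character codes in a TJ array, skipping the numeric adjustments.

    Hypothetical helper; assumes a simple (single-byte) font.
    """
    codes = []
    for item in operand:
        if isinstance(item, str):
            codes.extend(ord(ch) for ch in item)
    return codes


codes = tj_char_codes(tj_operand)
print(codes)  # [1, 2, 3, 3, 4, 5, 6, 7, 4, 8, 3, 9, 10, 6]
```

Every code falls in the range 1 to 10, which matches the /FirstChar and /LastChar entries of the font shown further down.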

The bytes are drawn using the font /R7, selected by ['/R7', 10.7452] 'Tf'.

The font /R7:

7 0 obj
<</BaseFont/RDZRPI+Calibri/FontDescriptor 8 0 R/ToUnicode 11 0 R/Type/Font
/FirstChar 1/LastChar 10/Widths[ 623 498 229 527 250 226 715 349 525 252]
/Encoding 12 0 R/Subtype/TrueType>>
endobj

Encoding:

12 0 obj
<</Type/Encoding/BaseEncoding/WinAnsiEncoding/Differences[
1/g44/g286/g367/g381/g853/g3/g449/g396/g282/g856]>>
endobj
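A /Differences array is read as a starting code followed by glyph names, each name taking the next consecutive code. A minimal sketch of that expansion, assuming the array has already been parsed into Python ints and name strings (the function `parse_differences` is hypothetical, not an existing pyPdf API):

```python
def parse_differences(differences):
    """Expand a /Differences array into a {character code: glyph name} dict.

    Hypothetical helper: an integer sets the current code, and each name
    that follows is assigned to the current code, which then advances by one.
    """
    mapping = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name appears before any starting code")
            mapping[code] = item
            code += 1
    return mapping


# The /Differences array of object 12 from the report.
differences = [1, '/g44', '/g286', '/g367', '/g381', '/g853',
               '/g3', '/g449', '/g396', '/g282', '/g856']

mapping = parse_differences(differences)
print(mapping[1], mapping[3], mapping[10])  # /g44 /g367 /g856
```

For this file the result covers codes 1 through 10, the same range as the font's /Widths array.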

pyPdf does not support reading a custom encoding from the document when it processes text-drawing operators, so the extractText method returns no text for this file.
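One possible shape for the missing lookup, as a sketch only and not pyPdf's actual design: map each character code to a glyph name through the font's encoding, then map the glyph name to a Unicode string. All names below (`decode_codes`, `code_to_glyph`, `glyph_to_unicode`) and the sample data are hypothetical. Note that names such as /g44 do not look like standard glyph names, so for this file the second table would probably have to come from another source, such as the /ToUnicode stream (11 0 R), whose contents are not shown in this report.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, used when a code cannot be resolved


def decode_codes(codes, code_to_glyph, glyph_to_unicode):
    """Translate character codes to text via glyph names.

    Hypothetical helper: codes with no glyph name, or glyph names with no
    known Unicode value, become U+FFFD instead of being dropped silently.
    """
    out = []
    for code in codes:
        glyph = code_to_glyph.get(code)
        out.append(glyph_to_unicode.get(glyph, REPLACEMENT))
    return "".join(out)


# Made-up illustration data, NOT taken from hello_world.pdf.
code_to_glyph = {1: "/A", 2: "/B"}
glyph_to_unicode = {"/A": "A", "/B": "B"}

decoded = decode_codes([1, 2, 2, 3], code_to_glyph, glyph_to_unicode)
print(repr(decoded))  # 'ABB\ufffd'
```

Emitting a replacement character rather than nothing makes an unresolved encoding visible in the extracted text.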
