Comment 28 for bug 808894

In , Adrian Johnson (ajohnson-redneon) wrote :

Created attachment 58643
fix regressions

This patch fixes the regressions in "Enable displayed chars to map to any number of text chars".

The problem is that the changes now allow glyphs that map to zero-length Unicode strings to be added to TextWords. These glyphs often have overlapping bounding boxes or do not sit on the same baseline as their neighbours, which confuses TextOutputDev when it tries to determine the layout of the text.

This patch does two things:
- it avoids breaking a word when one of these glyphs with an empty mapping is encountered
- it increases the tolerance for overlapping bounding boxes
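
To illustrate the two changes, here is a minimal sketch of a word-break decision. It is not the code from the attached patch or from TextOutputDev: the Glyph and BreakParams types, the countWords function and all threshold values are hypothetical and only model the idea. A glyph with an empty mapping never triggers a break, and the allowed overlap is a tunable fraction of the font size.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified glyph record (not Poppler's actual data structure).
struct Glyph {
    double xMin;      // left edge of the bounding box
    double xMax;      // right edge of the bounding box
    double baseline;  // baseline position
    int mappedLen;    // number of Unicode characters the glyph maps to
};

// Assumed thresholds, all expressed as fractions of the font size.
struct BreakParams {
    double maxOverlap;        // allowed overlap with the current word
    double maxGap;            // gap that starts a new word
    double maxBaselineDelta;  // baseline shift that starts a new word
};

// Counts the words produced from a left-to-right run of glyphs.
int countWords(const std::vector<Glyph> &glyphs, double fontSize,
               const BreakParams &p) {
    assert(fontSize > 0.0);
    int words = 0;
    double wordXMax = 0.0, wordBaseline = 0.0;
    for (const Glyph &g : glyphs) {
        bool brk = (words == 0);  // the first glyph always starts a word
        // Change 1: a glyph with an empty mapping never breaks the word.
        if (!brk && g.mappedLen > 0) {
            double gap = g.xMin - wordXMax;  // negative means overlap
            // Change 2: maxOverlap is the tolerance that the patch raises.
            brk = gap < -p.maxOverlap * fontSize ||
                  gap > p.maxGap * fontSize ||
                  std::fabs(g.baseline - wordBaseline) >
                      p.maxBaselineDelta * fontSize;
        }
        if (brk) {
            ++words;
            wordXMax = g.xMax;
            wordBaseline = g.baseline;
        } else {
            wordXMax = std::max(wordXMax, g.xMax);
        }
    }
    return words;
}
```

In this sketch an off-baseline glyph with mappedLen == 0 stays inside the current word, while the same glyph with a non-empty mapping would split it.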

With the attached PDF, the text output is still different from before, but on checking the differences it is actually an improvement.

However, I suspect the changes could break other PDFs. If this patch causes problems, plan B is to change TextOutputDev to ignore the glyphs with a zero-length mapping when determining the text layout, while still adding these glyphs to the words so that text selection works correctly. This should emulate the old behavior as closely as possible.
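
Plan B could look roughly like the sketch below. It is only an illustration under assumed names: the Char and Word types here are hypothetical and much simpler than the real TextOutputDev classes. Every glyph is stored in the word so that selection can still reach it, but only glyphs with a non-empty mapping contribute to the extent used for layout.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified types (not Poppler's actual classes).
struct Char {
    double xMin;    // left edge of the bounding box
    double xMax;    // right edge of the bounding box
    int mappedLen;  // number of Unicode characters the glyph maps to
};

struct Word {
    std::vector<Char> chars;  // every glyph, kept for text selection
    double xMin = 0.0;        // layout extent, from mapped glyphs only
    double xMax = 0.0;
    bool hasExtent = false;   // false until a mapped glyph is added

    void add(const Char &c) {
        assert(c.xMin <= c.xMax);
        chars.push_back(c);  // always stored in the word
        if (c.mappedLen == 0) {
            return;  // plan B: invisible to the layout computation
        }
        if (!hasExtent) {
            xMin = c.xMin;
            xMax = c.xMax;
            hasExtent = true;
        } else {
            xMin = std::min(xMin, c.xMin);
            xMax = std::max(xMax, c.xMax);
        }
    }
};
```

With this split, a glyph with an empty mapping and an oversized bounding box cannot stretch the extent that layout analysis sees, yet it remains part of the word.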