Comment 26 for bug 623438

Martin Wildam (mwildam) wrote : Re: Font size not correct in merged sandvich PDF

I have discussed this with a PDF expert, and my current understanding is this: for the hidden text behind the displayed image to line up correctly in the viewer, the PDF has to carry font size, spacing and similar information for that text.

I noticed that it is not only selection in the viewer that does not work correctly. Many words are also not found by the viewers' built-in search (tested with Evince and Adobe Acrobat Reader).

Side note: If I extract the full text using a PDF library, I get correct-looking text (words separated by spaces, no spaces inside words).

I think that creating a correct sandwich PDF is crucial, and I wonder why more people are not interested in this. But I also think that it is not easy. It would probably take experts in OCR, in PDF and in fonts working together to solve this. The key missing piece, IMHO, is deriving font metric information (font name, size, spacing, ...) when all you have is the bounding boxes and the text they contain. That is also why I posted the link above, which I find important.
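
To make the idea concrete, here is a rough sketch of how font size and horizontal scaling could be guessed from a bounding box alone. This is not what any existing tool does, as far as I know. The function names, the `F1` font resource and the average glyph width of 0.5 em are assumptions made up for illustration; a real solution would read the actual glyph widths from the font. The only PDF facts used are the standard text operators: `Tf` (font and size), `Tz` (horizontal scaling in percent) and `3 Tr` (invisible text).

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def fit_text_to_box(text, box_width, box_height, avg_char_width_em=0.5):
    """Guess a font size and horizontal scale (percent) for one OCR word.

    Assumption: every glyph is avg_char_width_em wide (in units of the font
    size). This stands in for real font metrics, which are not available here.
    """
    if not text:
        raise ValueError("text must not be empty")
    if box_width <= 0 or box_height <= 0:
        raise ValueError("bounding box must have positive size")

    # Assumption: the box height is a usable stand-in for the font size.
    font_size = float(box_height)

    # Width the text would have at that size without any scaling.
    natural_width = len(text) * avg_char_width_em * font_size

    # Stretch or squeeze the text so that it spans the box exactly.
    h_scale = 100.0 * box_width / natural_width
    return font_size, h_scale


def text_layer_ops(text, x, y, box_width, box_height, font_resource="F1"):
    """Build the content stream operators for one invisible word at (x, y)."""
    font_size, h_scale = fit_text_to_box(text, box_width, box_height)
    return (
        f"BT 3 Tr /{font_resource} {font_size:.2f} Tf {h_scale:.2f} Tz "
        f"{x:.2f} {y:.2f} Td ({escape_pdf_string(text)}) Tj ET"
    )


if __name__ == "__main__":
    # A four-letter word in a box 40 units wide and 10 units high.
    print(text_layer_ops("word", 72, 700, 40, 10))
```

With a fixed average glyph width the scaling is only approximate, so selection highlights would still drift inside a word. That is exactly where real font metrics would be needed.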