Cuneiform for Linux

Comment 60 for bug 623438

Rudolf (rk-com) wrote on 2015-07-22: Re: Font size not correct in merged sandwich PDF

Attachment: use on hocr file to fix for hocr2pdf 0.8.9 textbox placement (654 bytes, application/xml)

Many thanks to George Chriss! (see above)

My workaround, based on his description:
Modify the hOCR file that tesseract creates with an XSLT stylesheet (see below), then run hocr2pdf 0.8.9 on the result. The text boxes are then placed (almost) correctly.

$ tesseract image.tif ocr_file hocr
$ xsltproc -html -nonet -novalid -o ocr_fixed.hocr fix-hocr.xsl ocr_file.hocr
$ hocr2pdf -i image.tif -o searchable.pdf <ocr_fixed.hocr

See attached file fix-hocr.xsl.
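
If you run this often, the three commands above could be wrapped in a small shell function. This is only a sketch: the function name make_searchable_pdf and the derived intermediate file names are my own assumptions, not part of tesseract, xsltproc or hocr2pdf. It also assumes a tesseract version that writes its output with a .hocr extension, as in the commands above.

```shell
#!/bin/sh
# Sketch only: make_searchable_pdf is a hypothetical helper name.
# Usage: make_searchable_pdf IMAGE STYLESHEET OUTPUT_PDF
make_searchable_pdf() {
    image=$1        # scanned page, e.g. image.tif
    stylesheet=$2   # the attached fix-hocr.xsl
    output=$3       # resulting PDF, e.g. searchable.pdf
    base=${output%.pdf}   # prefix for the intermediate hOCR files (assumption)

    # 1. OCR the image; tesseract appends the .hocr extension itself.
    tesseract "$image" "$base" hocr || return 1

    # 2. Apply the stylesheet to the raw hOCR output.
    xsltproc -html -nonet -novalid -o "${base}_fixed.hocr" \
        "$stylesheet" "$base.hocr" || return 1

    # 3. Merge the image and the fixed hOCR into a searchable PDF.
    hocr2pdf -i "$image" -o "$output" < "${base}_fixed.hocr" || return 1
}
```

Example call: make_searchable_pdf image.tif fix-hocr.xsl searchable.pdf. The function stops at the first failing step, so a failed OCR run does not leave a half-built PDF behind.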