Many thanks to George Chriss! (see above)
My workaround, based on his description:
Modify the hOCR file created by tesseract with an XSLT stylesheet (see below), then run hocr2pdf 0.8.9 on the result. The text boxes are then placed (almost) correctly.
$ tesseract image.tif ocr_file hocr
$ xsltproc -html -nonet -novalid -o ocr_fixed.hocr fix-hocr.xsl ocr_file.hocr
$ hocr2pdf -i image.tif -o searchable.pdf <ocr_fixed.hocr
See attached file fix-hocr.xsl.
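For more than one image, the three commands above could be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name `make_searchable_pdf` and the variable `FIX_XSL` are made up for illustration, the stylesheet is expected in the current directory unless `FIX_XSL` says otherwise, and tesseract is assumed to write `<base>.hocr` as in the commands above.

```shell
#!/bin/sh
# Sketch only: wraps the three commands from this post in a function.
# FIX_XSL is a hypothetical variable; it defaults to fix-hocr.xsl
# in the current directory.
FIX_XSL=${FIX_XSL:-fix-hocr.xsl}

make_searchable_pdf() {
    img=$1
    base=${img%.*}    # image.tif -> image

    # 1. OCR: tesseract is assumed to write "$base.hocr"
    tesseract "$img" "$base" hocr || return 1

    # 2. Apply the XSLT fix to the hOCR output
    xsltproc -html -nonet -novalid -o "$base.fixed.hocr" "$FIX_XSL" "$base.hocr" || return 1

    # 3. Build the searchable PDF; hocr2pdf reads the hOCR from stdin
    hocr2pdf -i "$img" -o "$base.pdf" < "$base.fixed.hocr"
}

# Process every image given on the command line.
for f in "$@"; do
    make_searchable_pdf "$f" || echo "failed: $f" >&2
done
```

Saved as, say, `ocr2pdf.sh` (a hypothetical name), it would be called as `sh ocr2pdf.sh page1.tif page2.tif` and should leave `page1.pdf` and `page2.pdf` next to the images.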