I looked at the HTML; indeed, there is no font height or size information in it. So I assume that either the coordinates of the boxes are simply inaccurate, or hocr2pdf is doing something wrong when merging the HTML with the image...
When I select text in the resulting PDF, the box looks a little too small (a piece is missing at the top), but for Test10pages.pdf the effect is far more extreme. See here: http://www.youtube.com/watch?v=0d8_T-vV_Ak
In that case the selection actually lands on the line above the one I want to select (the "für" is recognized as "ii", but that is a different story).
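
To check whether the box coordinates themselves are off, one option is to dump the bounding boxes from the HTML and compare them with the image by hand. Below is a minimal sketch, assuming the HTML carries the usual hOCR `bbox x0 y0 x1 y1` property in the `title` attribute. The helper name `bbox_heights` and the sample line are made up for illustration and are not taken from the actual file.

```python
import re

# Matches the "bbox x0 y0 x1 y1" property used by the hOCR format.
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

def bbox_heights(hocr_text):
    """Return (x0, y0, x1, y1, height) for every bbox found in the text."""
    boxes = []
    for match in BBOX_RE.finditer(hocr_text):
        x0, y0, x1, y1 = map(int, match.groups())
        boxes.append((x0, y0, x1, y1, y1 - y0))
    return boxes

# Hypothetical sample line (assumption), only to show the expected input shape.
sample = '<span class="ocr_line" title="bbox 100 200 900 240">some text</span>'

for box in bbox_heights(sample):
    print(box)
```

If the heights and vertical positions printed this way match the text lines in the image, the HTML is probably fine and the offset is more likely introduced when the HTML is merged into the PDF.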