whitespace bbox incorrect in hocr output

Bug #662118 reported by julien
This bug affects 1 person
Affects: Cuneiform for Linux
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

<p><span class='ocr_line' id='line_18' title="bbox 363 1253 581 1289"><b>BYGGNADER </b><span class='ocr_cinfo' title="x_bboxes 363 1253 382 1279 383 1254 407 1281 409 1255 431 1283 434 1256 458 1284 460 1258 485 1285 486 1260 511 1286 514 1261 538 1287 541 1260 560 1289 561 1261 581 1289 -1 -1 -1 -1 "></span></span>

Note the trailing whitespace in "BYGGNADER " and its bbox "-1 -1 -1 -1", the last entry in x_bboxes.
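
To make the mismatch concrete, here is a minimal Python sketch that splits the x_bboxes property of the ocr_cinfo span above into (x0, y0, x1, y1) boxes and pairs them with the characters of the line text. The helper names (parse_x_bboxes, pair_chars_with_boxes) are hypothetical and not part of Cuneiform; the sketch only assumes that x_bboxes lists one box per character, in order, as it does in the snippet above. It shows that the tenth box, belonging to the trailing space, is the placeholder.

```python
import re

# The line text and the ocr_cinfo title from the report, copied verbatim.
TEXT = "BYGGNADER "
TITLE = (
    "x_bboxes 363 1253 382 1279 383 1254 407 1281 409 1255 431 1283 "
    "434 1256 458 1284 460 1258 485 1285 486 1260 511 1286 "
    "514 1261 538 1287 541 1260 560 1289 561 1261 581 1289 -1 -1 -1 -1 "
)


def parse_x_bboxes(title: str) -> list[tuple[int, ...]]:
    """Return the boxes listed after 'x_bboxes' as (x0, y0, x1, y1) tuples."""
    match = re.search(r"x_bboxes\s+([-\d\s]+)", title)
    if match is None:
        return []
    numbers = [int(n) for n in match.group(1).split()]
    # Group the flat integer list into 4-tuples, ignoring any incomplete tail.
    usable = len(numbers) - len(numbers) % 4
    return [tuple(numbers[i:i + 4]) for i in range(0, usable, 4)]


def pair_chars_with_boxes(text: str, title: str) -> list[tuple[str, tuple[int, ...]]]:
    """Pair each character of the line text with its box, in order."""
    boxes = parse_x_bboxes(title)
    if len(boxes) != len(text):
        raise ValueError(f"{len(text)} characters but {len(boxes)} boxes")
    return list(zip(text, boxes))


for char, box in pair_chars_with_boxes(TEXT, TITLE):
    marker = "  <- placeholder, no real position" if box == (-1, -1, -1, -1) else ""
    print(repr(char), box, marker)
```

The character count (10, including the space) matches the box count, so the space is clearly being emitted as a character without a position.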

If the whitespace is a glyph, it should have a bbox covering the glyph's area at its correct position. If it is not a glyph, it should not be part of the ocr_line.
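
Until the output is fixed, a consumer of the hOCR could apply the rule above as a post-processing pass. The following Python sketch is a hypothetical workaround, not Cuneiform code: it assumes that an interior space (positioned glyphs on both sides) should get a box spanning the gap between its neighbors, and that a space with no positioned neighbor on one side (such as the trailing space here) should simply be dropped. Both choices are assumptions made for illustration.

```python
Box = tuple[int, int, int, int]
PLACEHOLDER: Box = (-1, -1, -1, -1)

# Per-character boxes for "BYGGNADER " as reported (the last one is the space).
TEXT = "BYGGNADER "
BOXES: list[Box] = [
    (363, 1253, 382, 1279), (383, 1254, 407, 1281), (409, 1255, 431, 1283),
    (434, 1256, 458, 1284), (460, 1258, 485, 1285), (486, 1260, 511, 1286),
    (514, 1261, 538, 1287), (541, 1260, 560, 1289), (561, 1261, 581, 1289),
    PLACEHOLDER,
]


def fix_whitespace_boxes(text: str, boxes: list[Box]) -> tuple[str, list[Box]]:
    """Hypothetical post-processing pass (not part of Cuneiform).

    Whitespace with a placeholder box either gets a box spanning the gap
    between its neighboring glyphs or, if it lacks a positioned neighbor
    on one side (line start or end), is dropped from the line.
    """
    if len(text) != len(boxes):
        raise ValueError("text and boxes must have the same length")
    out_text: list[str] = []
    out_boxes: list[Box] = []
    for i, (char, box) in enumerate(zip(text, boxes)):
        if box != PLACEHOLDER or not char.isspace():
            out_text.append(char)
            out_boxes.append(box)
            continue
        prev_box = boxes[i - 1] if i > 0 else PLACEHOLDER
        next_box = boxes[i + 1] if i + 1 < len(boxes) else PLACEHOLDER
        if prev_box != PLACEHOLDER and next_box != PLACEHOLDER:
            # Interior space: cover the horizontal gap between the two glyphs.
            x0, x1 = sorted((prev_box[2], next_box[0]))
            y0 = min(prev_box[1], next_box[1])
            y1 = max(prev_box[3], next_box[3])
            out_text.append(char)
            out_boxes.append((x0, y0, x1, y1))
        # Otherwise the space has no position (e.g. trailing): leave it out.
    return "".join(out_text), out_boxes


text, boxes = fix_whitespace_boxes(TEXT, BOXES)
print(repr(text), len(boxes))  # prints: 'BYGGNADER' 9
```

Note that this sketch only looks at the original neighbors, so a run of several placeholder spaces in a row would be dropped entirely rather than split across the gap.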

This is based on my understanding of the official hOCR spec, found here:
https://docs.google.com/View?docid=dfxcv4vc_67g844kf
