tesseract fails to train/OCR with certain numbers 6, 8, 9, 0

Bug #1010577 reported by Peter Edmond
This bug affects 1 person
Affects: tesseract (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

There is a fault with training certain numbers. These are:

0,6,8,9

The problem is that if a TIFF or other image contains ONLY the aforementioned digits (on a single line or on multiple lines), Tesseract will not train with the image and produces an 'Empty Page' response. No box file is created.

Changing the page segmentation mode (psm) does not alter this.

However, adding, say, a 3 to the line makes the whole line immediately recognisable to the OCR engine.

I have attached a sample 0 to 9 tiff for working with.

The error can be demonstrated by making an image containing only the aforementioned digits.

When recognising a number such as 8869860, the OCR engine returns nothing, even though the same digits are recognised 100% of the time as single digits, or when extra digits are added to the end of the image to be OCRed.

The workaround is to make sure that the aforementioned digits are never seen in isolation, by artificially adding extra digits (in my case I add 53 to every image before OCRing it, and then strip off the 53). Alternatively, you can break the image up into individual digits and OCR each digit individually using -psm 10.
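For anyone wanting to script the two workarounds, here is a minimal Python sketch. It is not part of the original report: the function names (strip_pad, single_char_command, ocr_digit_images) and the digit_N output names are hypothetical, and the image editing itself (drawing the 53 onto the image, or cropping it into single digits) is assumed to be done elsewhere. The -psm 10 option is the Tesseract 3.x spelling used in this report; newer releases may spell it --psm.

```python
import subprocess
from pathlib import Path

PAD = "53"  # padding digits added to every image before OCR, as in the report


def strip_pad(ocr_text, pad=PAD):
    """Remove the artificial padding digits from the end of an OCR result."""
    digits = "".join(ocr_text.split())  # drop spaces and newlines
    if not digits.endswith(pad):
        raise ValueError(f"expected result to end with {pad!r}, got {digits!r}")
    return digits[: -len(pad)]


def single_char_command(image_path, output_base):
    """Build a Tesseract 3.x command line that treats the image as one character."""
    return ["tesseract", str(image_path), str(output_base), "-psm", "10"]


def ocr_digit_images(image_paths, work_dir):
    """OCR pre-cropped single-digit images one by one and join the results."""
    result = []
    for i, image_path in enumerate(image_paths):
        output_base = Path(work_dir) / f"digit_{i}"  # hypothetical naming scheme
        subprocess.run(single_char_command(image_path, output_base), check=True)
        # Tesseract writes its text output to <output_base>.txt
        result.append(output_base.with_suffix(".txt").read_text().strip())
    return "".join(result)
```

With the first approach, the text read back from the padded image is passed through strip_pad. With the second, ocr_digit_images is given the list of cropped digit images in reading order and needs the tesseract binary on the PATH.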

More example images available on request.

Peter Edmond (tesseract-6) wrote :

This image trains/OCRs perfectly well

Peter Edmond (tesseract-6) wrote :

These digits don't work!

I ought to add that this training is with Tesseract 3.01

Peter Edmond (tesseract-6) wrote :

I also ought to mention that this appears to be a Tesseract bug rather than an Ubuntu Tesseract bug as I get the same issue with the Windows version. I have posted a bug report on the Tesseract SourceForge site as well.

Jeff Breidenbach (jeff-jab) wrote :

upstream bug appears to still be open
