Comment 4 for bug 710586

RaiMan (raimund-hocke) wrote :

***** from a post on the mailing list sikuli-dev by macs

Has the latest Sikuli been migrated to Tesseract 3? I see a branch named tesseract3 on GitHub, and I see many OCR issues being discussed on Launchpad.

As I understand it, OCR results can be improved by preprocessing the image:
1. Convert the image to grayscale.
2. Improve contrast or apply edge-detection filters.
3. Invert the colors (negative).
4. Reduce the color depth.
5. Apply image-smoothing filters.
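
The preprocessing steps listed above could be sketched roughly as follows. This is only an illustration using plain java.awt.image; the class name OcrPreprocess and all method names are hypothetical and are not part of Sikuli or Tesseract. Edge detection is left out, and the contrast and threshold parameters are assumptions that would need tuning per image.

```java
import java.awt.image.BufferedImage;
import java.util.function.IntUnaryOperator;

public class OcrPreprocess {

    /** Step 1: grayscale via the common Rec. 601 luma weights. */
    static BufferedImage toGray(BufferedImage src) {
        int w = src.getWidth(), h = src.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                out.getRaster().setSample(x, y, 0, (299 * r + 587 * g + 114 * b) / 1000);
            }
        }
        return out;
    }

    /** Applies f to every pixel of a grayscale image (band 0), clamped to 0..255. */
    static BufferedImage map(BufferedImage gray, IntUnaryOperator f) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = f.applyAsInt(gray.getRaster().getSample(x, y, 0));
                out.getRaster().setSample(x, y, 0, Math.max(0, Math.min(255, v)));
            }
        }
        return out;
    }

    /** Step 2: stretch contrast around mid-gray; factor > 1 increases contrast. */
    static BufferedImage contrast(BufferedImage gray, double factor) {
        return map(gray, v -> (int) Math.round((v - 128) * factor + 128));
    }

    /** Step 3: negative image. */
    static BufferedImage invert(BufferedImage gray) {
        return map(gray, v -> 255 - v);
    }

    /** Step 4: reduce color depth to two levels (black and white). */
    static BufferedImage threshold(BufferedImage gray, int t) {
        return map(gray, v -> v >= t ? 255 : 0);
    }

    /** Step 5: 3x3 mean filter; border pixels average only in-bounds neighbours. */
    static BufferedImage smooth(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int sum = 0, n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int nx = x + dx, ny = y + dy;
                        if (nx >= 0 && nx < w && ny >= 0 && ny < h) {
                            sum += gray.getRaster().getSample(nx, ny, 0);
                            n++;
                        }
                    }
                }
                out.getRaster().setSample(x, y, 0, sum / n);
            }
        }
        return out;
    }
}
```

Because each step takes and returns an image, a user-selected combination is just a chain of calls, for example threshold(contrast(toGray(img), 1.5), 128). Which chain helps will depend on the image.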

Not every filter is applicable to every type of image. A user might want to adjust a filter, or a combination of filters, to achieve better results. Can we give the user this option?

I was not sure whether any preprocessing was done in the RC2 release. I tried modifying the function "doFind(PSC ptn)" in region.java to convert the image to grayscale before OCR processing, but I could not see any improvement in the OCR results. I did not try further because my Eclipse environment is not completely set up. Does Sikuli do any preprocessing of the image before calling the OCR?

It would be nice to have the following OCR support in Sikuli:

1. An option for the user to select the language (already requested).
2. Tesseract supports training and the creation of box files. We should have an option to select user-trained files.
3. There are many commercial OCR tools that have higher accuracy and better support for other languages. If the Sikuli OCR design is modular (as defined in the blueprint), the user should be able to use another OCR engine.
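
To illustrate what a modular design in point 3 might look like, here is a minimal sketch of a pluggable engine interface with a small registry. Everything in it (OcrEngines, Engine, recognize, register, get) is hypothetical and invented for this example; it does not reflect the blueprint or any existing Sikuli API. The language argument stands in for the language or trained-data selection asked for in points 1 and 2.

```java
import java.awt.image.BufferedImage;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OcrEngines {

    /** Hypothetical contract every OCR backend would implement. */
    public interface Engine {
        /** Returns the text found in image, using the given language or trained-data name. */
        String recognize(BufferedImage image, String language);
    }

    private static final Map<String, Engine> ENGINES = new ConcurrentHashMap<>();

    /** Makes a backend available under a user-visible name. */
    public static void register(String name, Engine engine) {
        ENGINES.put(name, engine);
    }

    /** Looks up a backend by name; fails loudly if none was registered. */
    public static Engine get(String name) {
        Engine e = ENGINES.get(name);
        if (e == null) {
            throw new IllegalArgumentException("No OCR engine registered as: " + name);
        }
        return e;
    }
}
```

With something like this, a Tesseract-backed engine and a commercial one would be registered under different names, and the user would pick one by name at run time.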

Other observations on the current OCR:

1. The OCR can recognize the text, but the click fails.
    If a screen has the text "Search" and I try click("Search"), the click fails. But when I get the text on the screen using the text() API and print it, it prints all the strings, including "Search".
    I think we need some improvement in how the text returned by OCR is searched.
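
One possible direction for that improvement is a tolerant word search over the OCR output: split the text on whitespace, ignore case, and accept words within a small edit distance of the target. The sketch below is only an assumption about what could help; it is not how Sikuli currently searches, and OcrTextSearch, findWord and editDistance are hypothetical names.

```java
import java.util.Locale;

public class OcrTextSearch {

    /** Classic Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the index of the first whitespace-separated word in ocrText that is
     * within maxDistance edits of target (case-insensitive), or -1 if none matches.
     */
    static int findWord(String ocrText, String target, int maxDistance) {
        String wanted = target.toLowerCase(Locale.ROOT);
        String[] words = ocrText.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (editDistance(words[i].toLowerCase(Locale.ROOT), wanted) <= maxDistance) {
                return i;
            }
        }
        return -1;
    }
}
```

For a click to work, the matched word index would still have to be mapped back to the word's bounding box on screen, which this sketch does not attempt.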