Installing tesseract-ocr should also install tesseract-ocr-eng
Bug #224264 reported by Yesudeep J Mangalapilly
This bug affects 9 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tesseract (Debian) | Fix Released | Unknown | |
tesseract (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Problem Description:
-------
$ tesseract foo.tiff foo.text
Unable to load unicharset file /usr/share/
When tesseract is called without a language parameter, it defaults to English.
The tesseract-ocr package does not install the English language data by default,
so the unicharset file named in the error message does not exist at that location
and tesseract fails with the error shown above.
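One way to confirm the diagnosis is to check whether the language data file is actually present. The following is only a minimal sketch: the helper name `has_lang_data` and the `DATA_DIR` variable are made up for illustration, and the file name `<lang>.unicharset` is an assumption based on the error message. The directory to check is the one printed in your own error output; it is not filled in here.

```shell
#!/bin/sh
# has_lang_data DIR LANG
# Hypothetical helper: succeeds if DIR contains a file named LANG.unicharset.
# DIR should be the data directory named in the tesseract error message.
has_lang_data() {
    [ -f "$1/$2.unicharset" ]
}

# Usage sketch: DATA_DIR is an illustrative variable the caller must set.
if [ -n "${DATA_DIR:-}" ]; then
    if has_lang_data "$DATA_DIR" eng; then
        echo "English language data found"
    else
        echo "English language data missing; install tesseract-ocr-eng"
    fi
fi
```

If the check reports the data as missing, that matches the behaviour described in this report.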
Suggested Solution:
-------
The tesseract-ocr package should include tesseract-ocr-eng as a dependency.
Changed in tesseract:
assignee: nobody → dcordero
status: New → In Progress
Changed in tesseract:
assignee: dcordero → nobody
status: In Progress → Confirmed
Changed in tesseract (Debian):
status: Unknown → New
Changed in tesseract (Debian):
status: New → Fix Released
Changed in tesseract (Debian):
status: Fix Released → New
Changed in tesseract (Debian):
status: New → Fix Committed
Changed in tesseract (Debian):
status: Fix Committed → Fix Released
I can confirm this bug as well. It seems that the English unicharset was not included in the package.
I am using Ubuntu 8.04.1 and tesseract-ocr 2.01-3. The workaround is to install the English language package manually. Open a terminal and run:
$ sudo apt-get install tesseract-ocr-eng
I also found that you should use a high-quality image when converting to text through OCR, or you are likely to run into spelling errors. Please make English part of the default package (instead of German) or make it a dependency when packaging.