Ubuntu
tesseract package

Installing tesseract-ocr should also install tesseract-ocr-eng

Bug #224264 reported by Yesudeep J Mangalapilly on 2008-04-29

This bug affects 9 people

Affects		Status	Importance	Assigned to	Milestone
	tesseract (Debian)	Fix Released	Unknown	debbugs #558254
	tesseract (Ubuntu)	Fix Released	Undecided	Unassigned

Bug Description

Problem Description:
---------------------------
$ tesseract foo.tiff foo.text
Unable to load unicharset file /usr/share/tesseract-ocr/tessdata/eng.unicharset

When tesseract is called without specifying the language parameter, it defaults to using English.
The tesseract-ocr package does not install the English language data by default, which causes
tesseract-ocr to output this error message.

The file in question does not exist at this particular location.

Suggested Solution:
--------------------------
The tesseract-ocr package should include tesseract-ocr-eng as a dependency.
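
A quick way to confirm the diagnosis above is to test for the file named in the error message. The following is only a minimal sketch: `has_lang_data` is a hypothetical helper, not part of tesseract, and its default directory is an assumption taken from the path in the error message.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): succeeds if the unicharset
# file for the given language code exists in the tessdata directory.
# The default directory is assumed from the error message in this report.
has_lang_data() {
    lang="$1"
    dir="${2:-/usr/share/tesseract-ocr/tessdata}"
    [ -f "$dir/$lang.unicharset" ]
}

# Check the default language (English) and report the result.
if has_lang_data eng; then
    echo "English data found"
else
    echo "English data missing; the tesseract-ocr-eng package may not be installed"
fi
```

If the check reports the data as missing, installing tesseract-ocr-eng by hand should supply it until the dependency is fixed.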

David Cordero (dcordero-deactivatedaccount-merged) on 2008-04-29

Changed in tesseract:
assignee:	nobody → dcordero
status:	New → In Progress

David Cordero (dcordero-deactivatedaccount-merged) on 2008-07-07

Changed in tesseract:
assignee:	dcordero → nobody
status:	In Progress → Confirmed

komputes (komputes) wrote on 2008-10-01:

I can confirm this bug as well. It seems that the English unicharset was not included in the package.

I am using Ubuntu 8.04.1 and tesseract-ocr 2.01-3. The workaround is to install the package manually. Open a terminal and run:
$ sudo apt-get install tesseract-ocr-eng

I found that you should use a high-quality image when converting to text through OCR, or you are likely to run into spelling errors. Please make English part of the default package (instead of German) or make it a dependency when packaging.

CSkau (clementskau-gmail) wrote on 2009-10-17:

This is also a problem on Ubuntu 9.10 beta (fully updated 2009-10-18).

Bug Watch Updater (bug-watch-updater) on 2009-11-28

Changed in tesseract (Debian):
status:	Unknown → New

SabreWolfy (sabrewolfy) wrote on 2009-12-06:

Confirmed in fully patched Karmic.

SabreWolfy (sabrewolfy) wrote on 2009-12-06:

Also, the filename extension MUST be "tif", not "tiff".

Bug Watch Updater (bug-watch-updater) on 2010-05-20

Changed in tesseract (Debian):
status:	New → Fix Released

Bug Watch Updater (bug-watch-updater) on 2010-10-12

Changed in tesseract (Debian):
status:	Fix Released → New

Damiön la Bagh (kat-amsterdam) wrote on 2010-12-03:

This also happens with the Dutch version:

kat@tab:~/Bureaublad$ tesseract DOC178.tif sint.txt -l nl
Unable to load unicharset file /usr/share/tesseract-ocr/tessdata/nl.unicharset of tesseract.

neuromancer (neuromancer) wrote on 2011-02-25:

In Ubuntu 10.10 (Maverick Meerkat), the tesseract-ocr-eng package is installed automatically along with tesseract-ocr, so this bug is FIXED.

@Kat Amsterdam: I think your problem is a different one and not related to this bug.
However, check whether the right file is present in the directory named in your error message:
cd /usr/share/tesseract-ocr/tessdata/
ls -al

In my case, running tesseract file.tif file.txt -l it
gave me the same error.
So I checked whether it.unicharset was present in the /usr/share/tesseract-ocr/tessdata/ folder and found that the file is named ita.unicharset.
Running tesseract file.tif file.txt -l ita works great :)
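
The manual check described above can be sketched as a small shell function that prints the language codes actually present, so the right value for -l can be read off directly. `list_lang_codes` is a hypothetical name, and the default directory and the `.unicharset` naming are assumptions taken from the error messages in this report; other tesseract versions may lay out their data differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): prints one language code
# per line, derived from the *.unicharset file names in a tessdata
# directory. The default path is assumed from the error messages above.
list_lang_codes() {
    dir="${1:-/usr/share/tesseract-ocr/tessdata}"
    for f in "$dir"/*.unicharset; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        basename "$f" .unicharset
    done
}

list_lang_codes
```

Whatever codes this prints (for example ita rather than it) would be the values to pass to -l.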

Changed in tesseract (Ubuntu):
status:	Confirmed → Fix Released

Bug Watch Updater (bug-watch-updater) on 2012-02-02

Changed in tesseract (Debian):
status:	New → Fix Committed

Bug Watch Updater (bug-watch-updater) on 2012-02-03

Changed in tesseract (Debian):
status:	Fix Committed → Fix Released
