OCR using cuneiform does not work

Bug #654771 reported by FriedChicken on 2010-10-04
This bug affects 2 people
Affects: gscan2pdf (Ubuntu)

Bug Description

Binary package hint: gscan2pdf

The cuneiform version in Ubuntu is built without libmagick++ support (Bug #654767). Therefore it can only process uncompressed BMP v3 images.
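
For illustration, here is a minimal sketch of how a file could be checked for being an uncompressed BMP v3 before it is handed to cuneiform. It relies only on the standard BMP layout (a "BM" magic, a 40-byte BITMAPINFOHEADER, and a compression field of 0). The function name is hypothetical and is not part of gscan2pdf or cuneiform.

```python
import struct


def is_uncompressed_bmp_v3(data: bytes) -> bool:
    """Return True if `data` starts with an uncompressed BMP v3 header.

    Hypothetical helper for illustration only.
    """
    # 14-byte file header + the first 20 bytes of the DIB header are needed.
    if len(data) < 34 or data[:2] != b"BM":
        return False
    # DIB header size is at byte offset 14; 40 means BITMAPINFOHEADER ("v3").
    (dib_size,) = struct.unpack_from("<I", data, 14)
    # Compression field is at byte offset 30; 0 means BI_RGB (uncompressed).
    (compression,) = struct.unpack_from("<I", data, 30)
    return dib_size == 40 and compression == 0
```

A PNM file such as the one in the console output below starts with "P" followed by a digit rather than "BM", so this check would reject it.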

Trying anything else leads to OCR being cancelled. On the console this is printed:
> /tmp/lD1hIbHasU/ThgzjU3Pqw.pnm is not a BMP file.
> *** unhandled exception in callback:
> *** Error: cannot open /tmp/lD1hIbHasU/DWr2n3tweG.txt
> *** ignoring at /usr/bin/gscan2pdf line 12513.

As a workaround, gscan2pdf should convert the images to uncompressed BMP v3 before passing them to cuneiform.
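
The conversion step could be sketched roughly as follows. This is not the attached patch (gscan2pdf itself is written in Perl); it is only an illustration that assumes ImageMagick's `convert` is installed. The `BMP3:` output prefix requests BMP version 3 and `-compress None` turns compression off. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def bmp3_convert_command(src: str, dst: str) -> List[str]:
    """Build an ImageMagick command line that writes an uncompressed BMP v3."""
    return ["convert", src, "-compress", "None", f"BMP3:{dst}"]


def convert_for_cuneiform(src: str) -> str:
    """Convert `src` next to itself as .bmp and return the new path.

    Assumes the `convert` binary is on PATH; raises CalledProcessError
    if the conversion fails.
    """
    dst = str(Path(src).with_suffix(".bmp"))
    subprocess.run(bmp3_convert_command(src, dst), check=True)
    return dst
```

Keeping the command construction separate from the call makes it easy to inspect what would be run without invoking ImageMagick.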

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: gscan2pdf 0.9.31-2
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelModules: fglrx
Architecture: amd64
Date: Mon Oct 4 20:51:24 2010
InstallationMedia: Kubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
PackageArchitecture: all
SourcePackage: gscan2pdf

FriedChicken (domlyons) wrote :

This patch fixes things for me

Thank you! Yes, this should work.

FriedChicken (domlyons) wrote :

I'm not sure if I should file a new bug or append it to this one...

Now cuneiform is built with libmagick++ support (Bug #654767 is fixed for Maverick; the fix for Lucid is in the proposed repository), so cuneiform can process nearly any image format. But OCR with cuneiform still doesn't work: the OCR tab simply stays empty.

Manually running cuneiform on a scanned image in /tmp/CrazyFolderName/RandomImageName.pnm works and gives a fairly accurate result. So the problem lies neither in cuneiform itself nor in an unusable scanned document.

FriedChicken (domlyons) wrote :

Replacing the "-f hocr" option for cuneiform with "-f smarttext" solved this. ("text" should work instead of "smarttext", too, but smarttext is probably better in most cases.)
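
As a rough sketch of what the changed invocation amounts to, the snippet below builds a cuneiform command line with a selectable output format. The `-f` values are the ones mentioned in this report. `-o` is cuneiform's output-file option. The function name and the choice of "smarttext" as the default are assumptions for illustration, not gscan2pdf's actual code.

```python
from typing import List

# Output formats discussed in this report.
KNOWN_FORMATS = ("smarttext", "text", "hocr")


def cuneiform_command(image: str, output: str, fmt: str = "smarttext") -> List[str]:
    """Build a cuneiform command line writing `output` in format `fmt`."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported output format: {fmt}")
    return ["cuneiform", "-f", fmt, "-o", output, image]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a system where cuneiform is installed.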

Did "-f hocr" work for anyone at all?

tags: added: patch

I need to do some more work on that patch.

It seems that the hocr output from this version of cuneiform is one box
per letter, which puts the letters in the right place but is useless
for searching.

Can anyone check cuneiform 1.0.0 to see if it is the same there?
Otherwise, I'll probably switch to plain text.

tags: added: patch-needswork
Daniel T Chen (crimsun) on 2011-07-28
Changed in gscan2pdf (Ubuntu):
status: New → Incomplete
importance: Undecided → Low
summary: - Pass images as uncompressed BMP v3 to cuneiform
+ OCR using cuneiform does not work

It works fine with cuneiform 1.0.0

Launchpad Janitor (janitor) wrote :

[Expired for gscan2pdf (Ubuntu) because there has been no activity for 60 days.]

Changed in gscan2pdf (Ubuntu):
status: Incomplete → Expired
Marja Erwin (marja-e) wrote :

I am running into the same bug.

Marja Erwin (marja-e) wrote :

I marked this as new, because it wasn't showing up when I searched for gscan2pdf.

Changed in gscan2pdf (Ubuntu):
status: Expired → New

This is fixed in gscan2pdf 1.0.0

Changed in gscan2pdf (Ubuntu):
status: New → Fix Released