Ubuntu
gscan2pdf package

OCR using cuneiform does not work

Bug #654771 reported by FriedChicken on 2010-10-04

This bug affects 2 people

Affects             Status        Importance  Assigned to  Milestone
gscan2pdf (Ubuntu)  Fix Released  Low         Unassigned

Bug Description

Binary package hint: gscan2pdf

The cuneiform version in Ubuntu is built without libmagick++ support (Bug #654767). Therefore it can only process uncompressed BMP v3 images.

Trying anything else leads to OCR being cancelled. On the console this is printed:
> /tmp/lD1hIbHasU/ThgzjU3Pqw.pnm is not a BMP file.
> *** unhandled exception in callback:
> *** Error: cannot open /tmp/lD1hIbHasU/DWr2n3tweG.txt
> *** ignoring at /usr/bin/gscan2pdf line 12513.

As a workaround, gscan2pdf should convert the images to uncompressed BMP v3 before passing them to cuneiform.
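
The proposed workaround can be sketched roughly as follows. This is only an illustration, not the code gscan2pdf uses: it assumes ImageMagick's `convert` and the `cuneiform` command-line tool are installed and on the PATH, and the function names are hypothetical. The `BMP3:` prefix and `-compress None` ask ImageMagick for an uncompressed version 3 BMP.

```python
import subprocess
from pathlib import Path


def bmp3_convert_command(src: Path, dst: Path) -> list[str]:
    """Build an ImageMagick command that writes an uncompressed BMP v3 file."""
    # "BMP3:" selects the version 3 BMP writer; "-compress None" disables RLE.
    # Assumption: ImageMagick 6 naming ("convert"); version 7 calls it "magick".
    return ["convert", str(src), "-compress", "None", f"BMP3:{dst}"]


def cuneiform_command(image: Path, out: Path, fmt: str = "text") -> list[str]:
    """Build a cuneiform call that writes its result to `out`."""
    return ["cuneiform", "-f", fmt, "-o", str(out), str(image)]


def ocr_via_bmp(src: Path, workdir: Path) -> str:
    """Convert `src` to BMP v3 in `workdir`, run cuneiform, return the text."""
    bmp = workdir / (src.stem + ".bmp")
    txt = workdir / (src.stem + ".txt")
    # check=True raises CalledProcessError if either tool fails.
    subprocess.run(bmp3_convert_command(src, bmp), check=True)
    subprocess.run(cuneiform_command(bmp, txt), check=True)
    return txt.read_text(encoding="utf-8", errors="replace")
```

Keeping the command construction separate from the `subprocess.run` calls makes it easy to inspect the exact arguments before running anything.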

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: gscan2pdf 0.9.31-2
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelModules: fglrx
Architecture: amd64
Date: Mon Oct 4 20:51:24 2010
InstallationMedia: Kubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
PackageArchitecture: all
ProcEnviron:
LANGUAGE=
LANG=de_DE.utf8
SHELL=/bin/bash
SourcePackage: gscan2pdf

FriedChicken (domlyons) wrote on 2010-10-04:

Dependencies.txt (4.5 KiB, text/plain; charset="utf-8")

Jeffrey Ratcliffe (jeffreyratcliffe) wrote on 2010-10-04: Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform

0001-Support-cuneiform-better-by-converting-first-to-bmp-.patch (1.6 KiB, text/x-patch; charset=US-ASCII; name="0001-Support-cuneiform-better-by-converting-first-to-bmp-.patch")

This patch fixes things for me

FriedChicken (domlyons) wrote on 2010-10-04: Re: Pass images as uncompressed BMP v3 to cuneiform

Thank you! Yes, this should work.

FriedChicken (domlyons) wrote on 2010-10-06:

I'm not sure if I should file a new bug or append it to this one...

cuneiform is now built with libmagick++ support (Bug #654767 is fixed for Maverick; the fix for Lucid is in the proposed repository), so it can process nearly any image format. But OCR with cuneiform still doesn't work: the OCR tab simply stays empty.

Manually running cuneiform on a scanned image in /tmp/CrazyFolderName/RandomImageName.pnm works and gives a fairly accurate result. So the problem is neither cuneiform itself nor an unusable scanned document.

FriedChicken (domlyons) wrote on 2010-10-06:

Replacing the "-f hocr" option for cuneiform by "-f smarttext" solved this. (Instead of "smarttext" "text" should work, too. But smarttext is probably better in most cases.)
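
For anyone who wants to try the format switch outside gscan2pdf, here is a minimal sketch. It assumes the `cuneiform` binary is on the PATH and accepts `-f` and `-o` as used in this report; the helper names are made up for illustration. It tries the formats mentioned above in order and reports the first one that produces a non-empty output file.

```python
import subprocess
from pathlib import Path

# Formats named in this report, in order of preference.
PREFERRED_FORMATS = ("smarttext", "text")


def build_cuneiform_args(image, output, fmt):
    """Return the argument list for one cuneiform run."""
    return ["cuneiform", "-f", fmt, "-o", str(output), str(image)]


def run_first_working_format(image, output, formats=PREFERRED_FORMATS):
    """Try each format in turn; return the first that yields non-empty output."""
    output = Path(output)
    for fmt in formats:
        # Remove a stale result so an earlier attempt cannot mask a failure.
        output.unlink(missing_ok=True)
        result = subprocess.run(build_cuneiform_args(image, output, fmt))
        if result.returncode == 0 and output.exists() and output.stat().st_size > 0:
            return fmt
    return None
```

Calling `run_first_working_format("page.pnm", "page.txt")` would return the name of the format that worked, or `None` if none of them did.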

Did "-f hocr" work for anyone at all?

Brian Murray (brian-murray) on 2010-10-07

tags:

added: patch

Jeffrey Ratcliffe (jeffreyratcliffe) wrote on 2010-10-07: Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform

I need to do some more work on that patch.

It seems that the hocr output from this version of cuneiform is a box
per letter, which gets the letters in the right place, but is useless
for searching.
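
One way to check a given hOCR file for this box-per-letter pattern is sketched below. It is an illustration under an assumption: that each box is an element whose `title` attribute contains `bbox x0 y0 x1 y1`, as in the hOCR format. The exact markup this cuneiform version emits is not shown in this report, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BBoxTextCollector(HTMLParser):
    """Collect the text directly enclosed by each bbox-carrying element."""

    def __init__(self):
        super().__init__()
        self._stack = []  # (tag, index into self.chunks or None) per open element
        self.chunks = []  # text gathered for each bbox element

    def handle_starttag(self, tag, attrs):
        title = dict(attrs).get("title") or ""
        if "bbox" in title:
            self.chunks.append("")
            self._stack.append((tag, len(self.chunks) - 1))
        else:
            self._stack.append((tag, None))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards unclosed
        # void elements such as <meta> or <br>.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Attribute text to the nearest enclosing bbox element, if any.
        for _, idx in reversed(self._stack):
            if idx is not None:
                self.chunks[idx] += data
                break


def single_char_box_ratio(hocr):
    """Fraction of non-empty bbox elements that hold exactly one character."""
    parser = BBoxTextCollector()
    parser.feed(hocr)
    parser.close()
    texts = [c.strip() for c in parser.chunks if c.strip()]
    if not texts:
        return 0.0
    return sum(len(t) == 1 for t in texts) / len(texts)
```

A ratio close to 1.0 would suggest one box per letter, while word-level or line-level output should give a ratio near 0.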

Can anyone check cuneiform 1.0.0 to see if it is the same there?
Otherwise, I'll probably switch to plain text.

Daniel Holbach (dholbach) on 2011-03-10

tags:

added: patch-needswork

Daniel T Chen (crimsun) on 2011-07-28

Changed in gscan2pdf (Ubuntu):
status:	New → Incomplete
importance:	Undecided → Low
summary:	- Pass images as uncompressed BMP v3 to cuneiform
	+ OCR using cuneiform does not work

Jeffrey Ratcliffe (jeffreyratcliffe) wrote on 2011-07-28:

It works fine with cuneiform 1.0.0

Launchpad Janitor (janitor) wrote on 2011-09-27:

[Expired for gscan2pdf (Ubuntu) because there has been no activity for 60 days.]

Changed in gscan2pdf (Ubuntu):
status:	Incomplete → Expired

Marja Erwin (marja-e) wrote on 2011-10-03:

I am running into the same bug.

Marja Erwin (marja-e) wrote on 2011-10-03:

I marked this as new, because it wasn't showing up when I searched for gscan2pdf.

Changed in gscan2pdf (Ubuntu):
status:	Expired → New

Jeffrey Ratcliffe (jeffreyratcliffe) wrote on 2011-10-03:

This is fixed in gscan2pdf 1.0.0

Changed in gscan2pdf (Ubuntu):
status:	New → Fix Released
