OCR using cuneiform does not work
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
gscan2pdf (Ubuntu) | | Low | Unassigned |
Bug Description
Binary package hint: gscan2pdf
The cuneiform version in Ubuntu is built without libmagick++ support (Bug #654767), so it can only process uncompressed BMP v3 images.
Passing it any other format causes OCR to be cancelled, and the following is printed on the console:
> /tmp/lD1hIbHasU
> *** unhandled exception in callback:
> *** Error: cannot open /tmp/lD1hIbHasU
> *** ignoring at /usr/bin/gscan2pdf line 12513.
As a workaround, gscan2pdf should convert the images to uncompressed BMP v3 before passing them to cuneiform.
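The conversion step could look roughly like the sketch below. It assumes ImageMagick's `convert` is installed and relies on its `BMP3:` output prefix to force a BMP v3 file. The helper names are hypothetical and Python is used only for illustration (gscan2pdf itself is written in Perl), so this is not the actual patch.

```python
import subprocess
import tempfile
from pathlib import Path


def bmp3_convert_command(src, dst):
    """Build the ImageMagick command that writes `src` as an uncompressed BMP v3."""
    # "BMP3:" selects the BMP version 3 writer; "-compress None" disables compression.
    return ["convert", str(src), "-compress", "None", f"BMP3:{dst}"]


def convert_for_cuneiform(src, workdir=None):
    """Convert `src` to BMP v3 inside `workdir` and return the new path.

    Hypothetical helper: requires ImageMagick's `convert` on PATH.
    """
    workdir = Path(workdir or tempfile.mkdtemp())
    dst = workdir / (Path(src).stem + ".bmp")
    subprocess.run(bmp3_convert_command(src, dst), check=True)
    return dst
```

The command is built separately from the call that runs it, so the arguments can be inspected or logged without ImageMagick being present.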
ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: gscan2pdf 0.9.31-2
ProcVersionSign
Uname: Linux 2.6.35-22-generic x86_64
NonfreeKernelMo
Architecture: amd64
Date: Mon Oct 4 20:51:24 2010
InstallationMedia: Kubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
PackageArchitec
ProcEnviron:
LANGUAGE=
LANG=de_DE.utf8
SHELL=/bin/bash
SourcePackage: gscan2pdf
FriedChicken (domlyons) wrote : | #1 |
Jeffrey Ratcliffe (jeffreyratcliffe) wrote : Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform | #2 |
Thank you! Yes, this should work.
FriedChicken (domlyons) wrote : | #4 |
I'm not sure if I should file a new bug or append it to this one...
Cuneiform is now built with libmagick++ support (Bug #654767 is fixed for Maverick; the fix for Lucid is in the proposed repository), so it can process nearly any image format. But OCR with cuneiform still doesn't work: the OCR tab simply stays empty.
Manually starting cuneiform on a scanned image in /tmp/CrazyFolde
FriedChicken (domlyons) wrote : | #5 |
Replacing the "-f hocr" option for cuneiform with "-f smarttext" solved this. ("text" should work instead of "smarttext", too, but smarttext is probably better in most cases.)
Did "-f hocr" work for anyone at all?
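For anyone wanting to compare the output formats by hand, the change amounts to swapping the value passed to cuneiform's `-f` option. A minimal sketch, assuming the `cuneiform` binary is on PATH; the helper names, the `eng` default language and the UTF-8 output encoding are assumptions for illustration, not taken from gscan2pdf:

```python
import subprocess


def cuneiform_command(image, output, fmt="smarttext", lang="eng"):
    """Build a cuneiform invocation; `fmt` goes to -f (e.g. smarttext, text, hocr)."""
    return ["cuneiform", "-l", lang, "-f", fmt, "-o", str(output), str(image)]


def run_ocr(image, output, fmt="smarttext", lang="eng"):
    """Run cuneiform and return the recognised text (hypothetical helper)."""
    subprocess.run(cuneiform_command(image, output, fmt, lang), check=True)
    # Assumption: the output file is UTF-8 encoded.
    with open(output, encoding="utf-8") as handle:
        return handle.read()
```

Calling `run_ocr("scan.bmp", "out.txt", fmt="hocr")` and then again with the default format would show whether the two formats behave differently on a given cuneiform build.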
tags: | added: patch |
Jeffrey Ratcliffe (jeffreyratcliffe) wrote : Re: [Bug 654771] Re: Pass images as uncompressed BMP v3 to cuneiform | #6 |
I need to do some more work on that patch.
It seems that the hocr output from this version of cuneiform is a box
per letter, which gets the letters in the right place, but is useless
for searching.
Can anyone check cuneiform 1.0.0 to see if it is the same there?
Otherwise, I'll probably switch to plain text.
tags: | added: patch-needswork |
Changed in gscan2pdf (Ubuntu): | |
status: | New → Incomplete |
importance: | Undecided → Low |
summary: |
- Pass images as uncompressed BMP v3 to cuneiform
+ OCR using cuneiform does not work |
It works fine with cuneiform 1.0.0
Launchpad Janitor (janitor) wrote : | #8 |
[Expired for gscan2pdf (Ubuntu) because there has been no activity for 60 days.]
Changed in gscan2pdf (Ubuntu): | |
status: | Incomplete → Expired |
Marja Erwin (marja-e) wrote : | #9 |
I am running into the same bug.
Marja Erwin (marja-e) wrote : | #10 |
I marked this as new, because it wasn't showing up when I searched for gscan2pdf.
Changed in gscan2pdf (Ubuntu): | |
status: | Expired → New |
This is fixed in gscan2pdf 1.0.0
Changed in gscan2pdf (Ubuntu): | |
status: | New → Fix Released |
This patch fixes things for me