Add scan options for text mode scan (lineart or grayscale)

Bug #658792 reported by Robert Ancell
This bug affects 11 people
Affects               Status        Importance  Assigned to  Milestone
Simple Scan           Triaged       Wishlist    Unassigned
simple-scan (Debian)  Fix Released  Unknown
simple-scan (Ubuntu)  Triaged       Wishlist    Unassigned

Bug Description

Changes in bug #521323 mean that text scans are now done in 2-bit grayscale. This was done to improve readability, and because some scanners seemed to have a very poor lineart mode. It may be useful to have an option to choose which mode text is scanned in (lineart, 2-bit grayscale, 8-bit grayscale).
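The three modes differ only in how many grey levels each pixel keeps: lineart keeps 2, 2-bit grayscale keeps 4, and 8-bit grayscale keeps 256. A minimal sketch of reducing 8-bit grayscale to fewer, evenly spaced levels in software, assuming 0 = black and 255 = white; `quantize` is a hypothetical helper, not simple-scan's code, and simple-scan may instead ask the scanner driver for the depth directly.

```python
def quantize(pixels: list[int], bits: int) -> list[int]:
    """Snap 8-bit grayscale values (0-255) to the nearest of 2**bits evenly
    spaced levels. Results stay on the 0-255 scale so they can be displayed
    or saved as ordinary 8-bit images."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = 2 ** bits
    step = 255 / (levels - 1)  # spacing between adjacent output levels
    out = []
    for p in pixels:
        index = round(p / step)        # which level is nearest
        out.append(round(index * step))  # that level on the 0-255 scale
    return out
```

For example, `quantize([0, 42, 43, 100, 200, 255], 2)` gives `[0, 0, 85, 85, 170, 255]`. The four 2-bit levels 0, 85, 170 and 255 correspond to 100%, 67%, 33% and 0% grey, and `bits=1` reduces to a plain threshold at mid-grey.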

Robert Ancell (robert-ancell) wrote :

Reports indicate the 2 bit grayscale does not work well in OCR engines.

Changed in simple-scan:
status: New → Triaged
importance: Undecided → Wishlist
Mark Edgington (edgimar) wrote :

Not sure if it's relevant, but apparently the Eikazo program (also Python, using SANE) has a good gray-to-1-bit algorithm, which might be worth checking out or adapting for use in simple-scan. The algorithm thresholds pixels based on their local context.

(see http://eikazo.berlios.de/doc/postproc.html)
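Thresholding each pixel against its local context can be sketched as follows. This is a generic local-mean (adaptive) threshold, not a reproduction of Eikazo's actual algorithm; `adaptive_threshold`, `radius` and `offset` are hypothetical names with illustrative defaults.

```python
def adaptive_threshold(img: list[list[int]], radius: int = 7,
                       offset: int = 10) -> list[list[int]]:
    """Binarize an 8-bit grayscale image (0 = black, 255 = white).
    A pixel becomes black (0) if it is darker than the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood minus `offset`, otherwise
    white (255). Neighbourhoods are clipped at the image edges."""
    h = len(img)
    w = len(img[0]) if h else 0
    # Summed-area table: sat[y][x] = sum of img over rows < y, cols < x,
    # so every neighbourhood sum costs O(1).
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out
```

Because the cutoff follows the local background, an evenly shaded area (for example a uniform value of 100 from recycled or shadowed paper) comes out white rather than black, which a single global cutoff at 128 would not do.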

Hans Spaans (hspaans) wrote :

For a lot of scans, color is just too much and black and white gives too little detail. If 8-bit grayscale at 150 dpi could be added, it would make my day at least.

Helmut Fedder (helmut-fedder) wrote :

+1 for a lineart option in the settings

Here the application is to scan a form that was printed on recycled paper. Then, the form shall be filled out electronically and be printed afterwards.

1. The readability of the scan is acceptable, but much better after applying a threshold in GIMP.

2. Printing the document gives a very ugly result, with a huge amount of grey shading in the background.

Bottom line:

You definitely want lineart for scans on recycled paper.
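The GIMP fix described above is a global threshold: every pixel at or below one cutoff becomes black and everything else white. A minimal sketch of that, with the cutoff picked automatically by Otsu's method (a standard histogram technique swapped in here for illustration; nobody in the thread proposed it). `otsu_threshold` and `binarize` are hypothetical names, and 0 = black, 255 = white is assumed.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the cutoff t (0-255) that maximises the between-class variance
    of the 8-bit histogram; pixels <= t form the dark (ink) class."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0    # number of pixels with value <= t
    sum_dark = 0  # sum of those pixel values
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue  # one class is empty, so this split is meaningless
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: list[int], t: int) -> list[int]:
    """Map pixels <= t to black (0) and the rest to white (255)."""
    return [0 if p <= t else 255 for p in pixels]
```

With dark ink (value 40) on a grey recycled-paper background (value 180), the chosen cutoff falls between the two, so the background prints as clean white instead of grey shading. A single global cutoff works well when the page is evenly lit; uneven shading is where a local threshold does better.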

Helmut Fedder (helmut-fedder) wrote :

another +1 for a lineart option

Ran into the same problem on a different occasion. I wanted to scan a document and send it as a fax through a free online fax service (a very useful feature: it saves sending official documents around by snail mail).

However, when the fax service converted the PDF, it came out almost completely black.

I assume monochrome graphics would solve that problem.

Ian (superian) wrote :

Firstly, thank you for this program - it really does make scanning simpler. However... :)

With my old scanner (CanonScan LiDE 20) both text and photo modes work very well. The shift to 2-bit greyscale works well and it does look better than the older 1-bit b/w.

With my new scanner (CanonScan LiDE 210) the combination of the settings, the hardware and the material I am scanning is currently too sensitive: it picks up too much light grey from the page. Altering brightness and contrast settings would be one cure, except that this is not currently an option in Simple Scan, so I have to do it in GIMP or similar.
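Brightness and contrast can be applied as a simple linear mapping on 8-bit grayscale values before the text-mode reduction. A minimal sketch, assuming 0 = black, 255 = white and contrast expressed as a multiplier around mid-grey (128); this is one common convention, not necessarily how GIMP or any scanner backend defines these controls, and `adjust` is a hypothetical name.

```python
def adjust(pixels: list[int], brightness: int = 0,
           contrast: float = 1.0) -> list[int]:
    """Linear brightness/contrast on 8-bit grayscale: scale each pixel's
    distance from mid-grey (128) by `contrast`, shift by `brightness`,
    then round and clamp to 0-255."""
    out = []
    for p in pixels:
        v = (p - 128) * contrast + 128 + brightness
        out.append(max(0, min(255, round(v))))
    return out
```

For example, `adjust([220, 40], brightness=20, contrast=1.5)` returns `[255, 16]`: the light-grey page background is pushed to white while dark text gets darker, which is the effect needed before reducing to a few grey levels.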

Looking at the output files, the four level greyscale is currently '0% grey (i.e. white), 33% grey, 67% grey and 100% grey (i.e. black)'.

Given what text mode is for, I think it would be better to scan at a higher grey depth (four bit?) and to do some simple post-processing so that the greyscale becomes '0%, 50%, 75%, 100%' or even '0%, 60%, 80%, 100%' (i.e. concentrate the output resolution on the darker bits).

Changed in simple-scan (Ubuntu):
status: New → Triaged
importance: Undecided → Wishlist
Changed in simple-scan (Debian):
status: Unknown → Confirmed
Jack Wasey (jackwasey) wrote :

There are many comments about what is "best" for OCR (which is assumed to be the goal of text-mode scanning). Some points:

1. OCR is so terrible on Linux that it's hardly worth bothering with OCR tuning.
2. Even so, of the various tips I've read on the best scanning method for OCR, the majority now seem to recommend grayscale.
3. Disk is cheap, and in the hope of future OCR that actually works, erring on the side of more information (that is, 8-bit grayscale) would be preferable.
3b. I would also make the default resolutions higher, which also future-proofs for better OCR.
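To put the storage argument in numbers: the raw, uncompressed size of a page grows linearly with bit depth and with the square of the resolution. A back-of-the-envelope sketch, assuming an A4 page (210 × 297 mm) and pixel rows padded to whole bytes; real PNG, TIFF or PDF files are usually much smaller after compression, and `raw_page_bytes` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def raw_page_bytes(width_mm: float, height_mm: float, dpi: int,
                   bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page, with each pixel row
    padded up to a whole number of bytes."""
    w_px = round(width_mm / MM_PER_INCH * dpi)
    h_px = round(height_mm / MM_PER_INCH * dpi)
    bytes_per_row = (w_px * bits_per_pixel + 7) // 8  # round up to bytes
    return bytes_per_row * h_px
```

At 300 dpi an A4 page is 2480 × 3508 pixels, which comes to about 1.1 MB at 1 bit, 2.2 MB at 2 bits and 8.7 MB at 8 bits per pixel before compression.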

If anyone actually wants 1-bit line art, this could/should be a simple post-processing step, if not an alternative option in simple-scan.
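Producing 1-bit line art as a post-processing step amounts to thresholding each pixel and packing eight pixels per byte. A minimal sketch that writes the binary PBM (P4) format from the Netpbm family, chosen here only because its layout is simple: a text header, then rows packed most-significant bit first with 1 = black, each row padded to a whole byte. The fixed `threshold` of 128 and the name `to_pbm` are assumptions for illustration.

```python
def to_pbm(img: list[list[int]], threshold: int = 128) -> bytes:
    """Encode an 8-bit grayscale image (0 = black, 255 = white) as binary
    PBM (P4). Pixels darker than `threshold` become black (bit 1)."""
    h = len(img)
    w = len(img[0]) if h else 0
    data = bytearray(f"P4\n{w} {h}\n".encode("ascii"))
    for row in img:
        byte, nbits = 0, 0
        for p in row:
            byte = (byte << 1) | (1 if p < threshold else 0)
            nbits += 1
            if nbits == 8:  # a full byte of 8 pixels is ready
                data.append(byte)
                byte, nbits = 0, 0
        if nbits:  # pad the row's last partial byte with white (0) bits
            data.append(byte << (8 - nbits))
    return bytes(data)
```

Usage: `with open("page.pbm", "wb") as f: f.write(to_pbm(img))`. A fixed cutoff is the simplest choice; a local or histogram-based threshold can be substituted for the `p < threshold` test without changing the packing.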

Changed in simple-scan (Debian):
status: Confirmed → Fix Released