pdf output file is 6 times bigger than if created by simple-scan

Bug #922162 reported by tranchais
This bug affects 1 person
Affects             Status   Importance  Assigned to  Milestone
gscan2pdf (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Under exactly the same conditions (i.e. one single page of text, without pictures) and with the same scanning options (gray, 150 dpi), the PDF file output by gscan2pdf is 437.5 kB, whereas it is only 81.4 kB when produced with simple-scan.

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: gscan2pdf 0.9.32-1
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic-pae 3.0.4
Uname: Linux 3.0.0-12-generic-pae i686
ApportVersion: 1.23-0ubuntu3
Architecture: i386
Date: Thu Jan 26 16:48:22 2012
InstallationMedia: Ubuntu 11.10 "Oneiric" - Build i386 LIVE Binary 20111013-11:02
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, user)
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
SourcePackage: gscan2pdf
UpgradeStatus: No upgrade log present (probably fresh install)

tranchais (pmathe) wrote :

Jeffrey Ratcliffe (jeffreyratcliffe) wrote : Re: [Bug 922162] Re: pdf output file is 6 times bigger than if created by simple-scan

This depends heavily on the compression options you chose when saving
the PDF, and whether you embedded OCR output. Please start with

gscan2pdf --log=log

save a PDF, close gscan2pdf, and post the log file.

tranchais (pmathe) wrote :

On 26/01/2012 17:36, Jeffrey Ratcliffe wrote:
> This depends heavily on the compression options you chose when saving
> the PDF, and whether you embedded OCR output. Please start with
>
> gscan2pdf --log=log
>
> save a PDF, close gscan2pdf, and post the log file.
>
Thank you for your quick answer.
In my scanning options I use neither OCR output nor file cleaning (I use
a French user interface, so it might be called scan cleaning instead of
file cleaning).
Attached you will find:
1. the log output (log_gscan.txt)
2. the file as output by gscan2pdf (test_gscan2pdf_1_500-12.pdf): 500
means white threshold=0.500 and 12 means black threshold=0.12
3. the file as output by simple-scan (test_simplescan_1.pdf)

You will notice that, in addition to the file size problem, the quality
of the output file is much better with simple-scan.

When the same sheet is scanned with the default threshold
values (white=0.005 and black=0.12), the file size goes down to 317.2 kB,
which is still about 4 times the size of the simple-scan file, with no
visible quality improvement.
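
As a quick check of the figures quoted in this thread, the ratios can be worked out as follows. This is only a sketch of the arithmetic; the labels and the function name are illustrative, and the sizes are the ones reported above.

```python
# File sizes as reported in this thread, in kB.
SIMPLE_SCAN_KB = 81.4
GSCAN2PDF_KB = {
    "as first reported": 437.5,
    "default threshold values": 317.2,
}


def size_ratio(size_kb: float, reference_kb: float = SIMPLE_SCAN_KB) -> float:
    """Return how many times larger size_kb is than reference_kb."""
    return size_kb / reference_kb


for label, size_kb in GSCAN2PDF_KB.items():
    print(f"{label}: {size_ratio(size_kb):.1f}x the simple-scan file")
```

With these numbers the first file comes out at roughly 5.4 times the simple-scan size and the second at roughly 3.9 times.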

Jeffrey Ratcliffe (jeffreyratcliffe) wrote :

As I said, the file size depends heavily on the compression.

You are using JPG compression, which is good for photo-type images, but not for scans of text.

The image is 8-bit, when 1-bit would have done.

You say that you are using thresholding, but the log file doesn't show this.

Try scanning with scan mode = lineart, and save the PDF with compression = auto.
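
To give a rough idea of why bit depth matters, the sketch below computes the uncompressed size of one page at 8 bits and at 1 bit per pixel. The 150 dpi figure comes from the report; the A4 page size is an assumption, since the report does not state the paper size, and the function name is purely illustrative.

```python
import math

DPI = 150             # resolution stated in the report
PAGE_MM = (210, 297)  # assumption: A4; the report does not give the page size
MM_PER_INCH = 25.4


def raw_size_bytes(bits_per_pixel: int, dpi: int = DPI, page_mm=PAGE_MM) -> int:
    """Uncompressed size of one page, with each row padded to whole bytes."""
    width_px = round(page_mm[0] / MM_PER_INCH * dpi)
    height_px = round(page_mm[1] / MM_PER_INCH * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


gray = raw_size_bytes(8)     # 8-bit grayscale scan
lineart = raw_size_bytes(1)  # 1-bit lineart scan
print(f"8-bit: {gray} bytes, 1-bit: {lineart} bytes, ratio: {gray / lineart:.1f}")
```

Under these assumptions the 1-bit page holds one eighth of the raw data of the 8-bit page, before any compression is applied.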

tranchais (pmathe) wrote :

On 30/01/2012 08:01, Jeffrey Ratcliffe wrote:
> As I said, the file size depends heavily on the compression.
>
> You are using JPG compression, which is good for photo-type images, but
> not for scans of text.
>
> The image is 8-bit, when 1-bit would have done.
>
> You say that you are using thresholding, but the log file doesn't show
> this.
Yes, it does, on lines 76 and 80
>
> Try scanning with scan mode = lineart, and save the PDF with compression
> = auto.
>
OK, it works fine this way; thank you for your help. It would be nice
to document that somewhere because, in my opinion, many scans are done
just for plain-text documents.
I am really pleased with your help, fast and efficient, and with this
program.
Cheers

Jeffrey Ratcliffe (jeffreyratcliffe) wrote :

On 30 January 2012 10:58, tranchais <email address hidden> wrote:
>> You say that you are using thresholding, but the log file doesn't show
>> this.

> Yes, it does, on lines 76 and 80

No. Those just tell you the default settings, should you use the
threshold tool. To actually use it, you have to go to Tools/Threshold.
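
For illustration only, thresholding in general maps each gray value to black or white, after which eight pixels fit in one byte. The sketch below is not gscan2pdf's implementation; the function names, the 0.5 cutoff, and the sample row are all made up for the example.

```python
def threshold_row(gray_row, threshold=0.5):
    """Map 8-bit gray values (0-255) to 1-bit values: 1 = white, 0 = black."""
    cutoff = threshold * 255
    return [1 if value > cutoff else 0 for value in gray_row]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte with zeros
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


row = [250, 245, 30, 20, 240, 10, 255, 128]  # hypothetical gray values
bits = threshold_row(row)
print(bits, pack_bits(bits).hex())
```

Eight gray pixels that took eight bytes end up in a single byte, which is the saving a 1-bit image offers over an 8-bit one.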

Changed in gscan2pdf (Ubuntu):
status: New → Invalid