[SOLVED] postscript printer hideously slow in some cases (pdftops)

Bug #1476705 reported by bruno
This bug affects 3 people
Affects                         Status     Importance  Assigned to  Milestone
poppler (Ubuntu)                Confirmed  Undecided   Unassigned
system-config-printer (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

With my (old) Postscript printer, printing a single page can take many minutes in some situations. It happens with some PDF files (not all) and when printing Google Maps from Firefox, for example. When this happens, I observe in the system monitor that pdftops is running continuously. After some manual PDF -> PS conversions, I see that pdftops inflates the file size for the problematic cases, but is OK for other files (size similar to the original file, or even smaller). I don't know if modern Postscript printers can handle this quickly, but it's unacceptable here and certainly not an efficient way to print those files.

So I suspect that pdftops should be fixed.

For example, I attach a problematic PDF produced by Google Maps in Firefox. I tried many conversions. As you can see, I get a much larger file (36 times larger) with pdftops. It is worse with pdf2ps (and it takes longer to process), so replacing pdftops with pdf2ps is not an option for me. However, pdftocairo quickly produces an efficient file. I have the same success if I open the PDF file with Evince and print it as a Postscript file. I get a similar file if I print to PS directly from Google Maps (Firefox). Of course, these small PS files produced by pdftocairo, Evince or Firefox print flawlessly on my printer.
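A minimal sketch of the kind of manual comparison described above: it converts one PDF with each converter that happens to be installed and prints the resulting sizes. The function names are hypothetical helpers, and gmap.pdf in the usage line stands for the attached sample; any PDF path works.

```shell
#!/usr/bin/env bash
# Sketch: convert one PDF to Postscript with each installed converter and
# print the size of every result next to the size of the original.

# Print "LABEL  SIZE bytes" for one file.
report_size() {
    printf '%-12s %s bytes\n' "$1" "$(wc -c < "$2" | tr -d ' ')"
}

# Run one converter (if installed) and report the size of its output.
# Usage: try_converter LABEL OUTPUT_FILE COMMAND [ARGS...]
try_converter() {
    local label=$1 out=$2
    shift 2
    if ! command -v "$1" >/dev/null 2>&1; then
        printf '%-12s not installed\n' "$label"
    elif "$@" >/dev/null 2>&1 && [ -s "$out" ]; then
        report_size "$label" "$out"
    else
        printf '%-12s conversion failed\n' "$label"
    fi
}

# Compare the three converters mentioned in this report on one PDF.
compare_ps_sizes() {
    local pdf=$1 tmp
    tmp=$(mktemp -d) || return 1
    report_size original "$pdf"
    try_converter pdftops    "$tmp/pdftops.ps"    pdftops "$pdf" "$tmp/pdftops.ps"
    try_converter pdftocairo "$tmp/pdftocairo.ps" pdftocairo -ps "$pdf" "$tmp/pdftocairo.ps"
    try_converter pdf2ps     "$tmp/pdf2ps.ps"     pdf2ps "$pdf" "$tmp/pdf2ps.ps"
    rm -rf "$tmp"
}

# Example: compare_ps_sizes gmap.pdf
```

This only measures the standalone converters; the CUPS filter chain may process the file further, so the data actually sent to the printer can differ.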

See also Bug #1095498, which I suspect is the same (old) thing, but I am filing a new one since it doesn't seem to be printer-specific.
Of course, another workaround could be to use a PCL driver, but none is available for my printer (HP Color Laserjet 2605dn).

bruno (brunob) wrote :
affects: system-config-printer (Ubuntu) → poppler (Ubuntu)
bruno (brunob) wrote :

Ubuntu 14.04 (Linux Mint 17.2)
poppler-utils 0.24.5-2ubuntu4.2
system-config-printer-gnome 1.4.3+20140219-0ubuntu2.6
system-config-printer-common 1.4.3+20140219-0ubuntu2.6
system-config-printer-udev 1.4.3+20140219-0ubuntu2.6
HP Color Laserjet 2605dn configured with the HP Color Laserjet 2605 Postscript PPD

description: updated
bruno (brunob) wrote :

I managed to switch to pdftocairo:

$ lpadmin -p HP-Color-Laserjet-2605dn -o pdftops-renderer-default=pdftocairo

I can see in the system monitor that pdftocairo is now called instead of pdftops, but it still takes minutes to print a really simple page (no gain).

I'm lost. It seems some other processing occurs after pdftocairo, so maybe the "big PS file" is not the only issue...

bruno (brunob) wrote :

I can't attach the output of "Getting the data which would go to the printer" since it's a 303 MB file!
Note that I get the same file size if I return to pdftops with lpadmin.

The trouble seems to be there! 303 MB is nonsense; it should be about 1 MB!

I also note that a pstops process is running during this test printout. However, pstops (or psutils) is not installed on my system...

bruno (brunob) wrote :

Correction: I get a 303 MB output (data which would go to the printer) with pdftops (the default) on my test queue. If I switch to pdftocairo, I get a 2 MB output file and the simple PDF prints much more quickly.

I have to do more tests, since THIS sample file prints faster with pdftocairo but others don't...

It seems that setting the LanguageLevel (in the PPD) to "2" also helps.

bruno (brunob)
description: updated
bruno (brunob) wrote :

In summary:
=========
The sample PDF file of this bug prints well if pdftops is replaced by pdftocairo. I still find that printing is slow for my Linux Mint test page. Setting the LanguageLevel to "2" in the PPD turns out to be worse after all, at least for this test page. Maybe I underestimate the complexity of this test page, but it prints quickly with a generic PCL6 Gutenprint PPD.

Time will tell if pdftocairo is an "acceptable" fix, since I have no official PCL6 alternative (for this printer).

bruno (brunob) wrote :

I attach some files (appout and printout) to help debug the slow test page. This is with pdftocairo. I have confirmed that a direct pdftocairo conversion produces a file about the same size as the original, and 10 times smaller than the pdftops output. So again, pdftocairo is better.

Compared to the first sample PDF (Google Maps), we can see here that even if pdftocairo itself is efficient, further processing inflates this file (printout 21 times bigger). This may explain why this test page prints slowly, even with pdftocairo.

Now, how can we reduce/eliminate this further processing? I mean, when the file is in PS format, it is ready to ship...

cliddell (cjl) wrote :

The problem, I suspect, is the way Cairo writes PDF files (the gmap.pdf file you attached above was created by Cairo). Cairo *always* writes the page contents into one or more PDF transparency groups - even when all the contents are really opaque.

The issue is that, due to the way PDF works (and PDF transparency in particular) the only way to be sure that all the page contents are opaque would be to pre-process each page, checking for non-opaque content, and then re-interpret the page using the information gleaned in the first pass - which would, frankly, result in an unacceptable performance drop for the vast majority of PDF files.

Most interpreters I know will pre-scan the quickly accessible elements of a PDF page, and if no transparency constructs are found, will then elide the extra processing transparency requires. Unfortunately, those easily accessible elements don't contain (or, at least, don't reliably contain) the actual opacity information. So, in most cases I know, just the existence of the transparency constructs means that extra processing is enabled, regardless of the actual opacity values.

Now, secondly, Postscript cannot represent PDF transparency in high level (vector etc) operations. So, the only way to get a visually accurate representation of a PDF containing transparency in Postscript is to "flatten" the transparency by rendering it to a sampled image - and clearly, sampled images end up being larger than vector graphics.

Hence we have the result that basically every Cairo produced PDF will convert to Postscript as one or more sampled images per page.

And that explains why the Postscript is so much larger than the PDF.

Now, looking at the Postscript file you posted, it *appears* that the rendering for transparency flattening is being done at 1200dpi which is, frankly, ridiculous for a couple of reasons. First is, your printer has a maximum physical resolution of 600dpi (the ImageRET modes provide enhanced quality, claimed to be equivalent to 2400 and, IIRC, 3600 dpi, but the printer is still a 600 dpi printer). Secondly, our experience with Ghostscript's Postscript output, is that many printers are much, *much* faster at upsampling images than downsampling.

So, my first suggestion would be to poke around the CUPS dialogues and/or the PPD, and see if you can drop the claimed resolution of the printer to at most 600dpi and, frankly, I'd even try 300 dpi. As a rule of thumb, in the printing world, it's generally claimed that dropping continuous tone, sampled image resolution by 50% from the physical resolution results in almost no visible loss of quality. Where that falls down, in cases like this, is because there is text involved, and the small details inherent in text shapes may well suffer visibly.
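A rough piece of arithmetic shows why the flattening resolution matters so much. This is only a sketch under assumptions that are not from the report: a US Letter page (8.5 x 11 in) rasterized in full as an uncompressed 8-bit RGB image (3 bytes per pixel). Real output varies with compression, color depth, and how much of the page is actually rasterized.

```shell
#!/usr/bin/env bash
# Uncompressed size of a full-page raster:
#   width_px * height_px * bytes_per_pixel
# Assumed (not from the report): US Letter page, 3 bytes per pixel.
raster_bytes() {
    local dpi=$1
    # 8.5 in is written as 85/10 to stay within integer arithmetic.
    echo $(( (85 * dpi / 10) * (11 * dpi) * 3 ))
}

for dpi in 300 600 1200; do
    printf '%4d dpi: %d bytes\n' "$dpi" "$(raster_bytes "$dpi")"
done
```

Because the pixel count grows with the square of the resolution, each halving of the dpi cuts the raw image data to a quarter, so 300 dpi needs one sixteenth of the data of 1200 dpi.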

Another thing we've found with the Postscript output from Ghostscript is that many printers are very, very slow at decompressing data, so if you can find an option to avoid compressing image data, that *might* make a difference - but that is highly printer dependent.

I'd like to take this opportunity to rant (again): this kind of thing is the reason that PDF is such poor, poor choice as a print spool format. PDF ha...


bruno (brunob) wrote :

@cliddell Wow, thank you for all this information.

As I said before, my first sample (gmap.pdf) prints quickly now with pdftocairo. For this document, it seems that little extra processing is done.

However, the second sample (see appout of my printer test page, also created by Cairo) was still slow to print, even with pdftocairo (with much extra processing, as you describe). If I choose (as you suggested) 600 dpi instead of ProRes 1200, this sheet is printed in less than a minute. It's not "quick" but acceptable now for my home needs. I see some differences in the intensity of the logo and dashed lines, but overall the quality appears to be good enough.

So I have now set this printer to 600 dpi by default.

I continue to observe that pdftocairo is quicker (big difference for gmap.pdf), so I am keeping this setting as well. Is there any drawback to this? If not, why not set pdftocairo as the default renderer for Postscript printers?

cliddell (cjl) wrote :

Bruno,

To be honest, I don't know what differs between the two tools. As they both come from Cairo, I assumed that pdftops was just a small wrapper around the same code as pdftocairo but with the options pre-set for PS output.

I'm a Ghostscript developer, so I can't really answer specifics about Cairo - I know about the problems with the Cairo PDF output, as we've performance problems in Ghostscript with those, and have had some fairly lengthy (and heated) discussions with Cairo developers on the subject.

And I have helped debug a lot of these problems with the Ghostscript output, so I can give general suggestions as I did above.

If I had to guess, I would say that pdftocairo is possibly spotting that the PDF originated as a Cairo file, and is using "inside" knowledge of how those are constructed to convert it back into Cairo's internal representation, which it then outputs to Postscript - with that level of extra information, it can probably be much, much smarter about when there is real transparency that it has to render, and when everything is opaque and can remain in high level form. Whilst pdftops may be doing a simpler, one step PDF to Postscript conversion.

Ideally, what you'd want to try is (if possible) to keep the "ProRes" (I thought it was "ImageRET") mode, but still tell pdftocairo to use 600 dpi, as you then may get the benefits of the more accurate dot placement, better halftone results, and possibly better color management, whilst keeping the quicker processing of the smaller image data.

It's hard to know without deep inside knowledge, but (again) if I had to guess, I would suggest that the slightly lower quality halftone screen is what's causing the slight intensity shift you mention. HP are pretty tight lipped about these technologies, but I know other such systems tend to allow the halftoning to represent more shades of the color, without losing detail (generally there is a trade off: you can approximate lots of shades, but lose detail, or have great detail, but very few shades).

Chris

James Cloos (launchpad-jhcloos) wrote :

pdftops and pdftocairo are both from poppler.

pdftops is based on the original xpdf code and does not use cairo at all.

modern cairo tries to limit the size of transparency groups to just where it is needed.

bruno (brunob) wrote :

@James Cloos: So pdftocairo seems a better choice, no?

@cliddell: I tried to edit the PPD to force hardware 600 DPI for ProRes 1200 and it produces the same output as 600 DPI (less intensity). All my tests are in monochrome. I rarely print in color, but my ImageRET 2400 setting is already at 600 DPI, as I understand it. I don't know how to force 600 DPI anywhere else, but it may be a good idea. Thank you for your help.

Till Kamppeter (till-kamppeter) wrote :

You do not need to edit the PPD to limit rendering resolution. Keep the PPD in its original state and also choose the printer-internal resolution (by the appropriate PPD option) to your liking. Then set the maximum resolution used for bitmap operations in the pdftops filter via the "pdftops-max-image-resolution" option:

lpadmin -p printer -o pdftops-max-image-resolution-default=600

You can also choose 300 or whatever is most suitable. See also the README file of cups-filters, section "POSTSCRIPT PRINTING RENDERER AND RESOLUTION SELECTION".
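As a sketch, the renderer option used earlier in this report and the resolution limit above can be set in one lpadmin call. The helper name tune_ps_queue is hypothetical, and the queue name and values in the usage line are only examples taken from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the PDF-to-Postscript renderer and the maximum
# image rendering resolution as per-queue defaults.
# Usage: tune_ps_queue QUEUE RENDERER MAX_DPI
tune_ps_queue() {
    local queue=$1 renderer=$2 max_dpi=$3
    lpadmin -p "$queue" \
        -o pdftops-renderer-default="$renderer" \
        -o pdftops-max-image-resolution-default="$max_dpi"
}

# Example (queue name from this report):
#   tune_ps_queue HP-Color-Laserjet-2605dn pdftocairo 600
```

The cups-filters README section mentioned above should also describe per-job forms of these options, without the "-default" suffix; check the README shipped with your version before relying on them.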

bruno (brunob) wrote :

Thank you, Till, for pointing me to this command (and the documentation).

It appears that the pdftocairo default is 600, but the pdftops default is higher. This seems to be the reason why I gain so much when I switch to pdftocairo. Now, if I choose pdftops-max-image-resolution-default=300 (with the ProRes 1200 option), I have the best intensity/quality output and an acceptable speed. With these settings, pdftops and pdftocairo have much less difference in output file size, even if pdftocairo is still a bit better (gs in between, but slower in reality).

May I suggest revising pdftops-max-image-resolution-default, particularly if pdftops is retained as the default renderer.

Even though the README file of cups-filters considers pdftocairo "experimental", I will work with this for a while and time will tell. If necessary, I will return to pdftops or maybe try acroread.

Thank you all.

summary: - postscript printer hideously slow in some cases (pdftops)
+ [SOLVED] postscript printer hideously slow in some cases (pdftops)
James Cloos (launchpad-jhcloos) wrote :

pdftocairo is limited by the fact that cairo only handles sRGB.

It is therefore unsuitable any time color is important, eg if the pdf uses cmyk or named colors or icc profiles, et cetera.

It is fine for printing most web pages, since the browser likely also is limited to sRGB.

And for simple personal or office documents, all monochrome documents and the like.

But pdftops is better for more general documents, and therefore makes better sense for the default filter.

bruno (brunob) wrote :

@James Cloos Thank you for this clarification. It makes more sense now.

So, if I understand correctly, pdftocairo would be a bad choice to print (in color) a document with a JPEG photo or a complex Gimp picture. Am I right?

James Cloos (launchpad-jhcloos) wrote :

Gimp is also, for now, limited to srgb, so anything coming from gimp will work fine with cairo.

And, for a postscript printer, jpegs should be converted directly to ps, not via pdf, so that wouldn't matter either.

But if you used a workflow which used, say, AdobeRGB as the colour working space, and published to pdf, then you would want to avoid cairo.

Or if you used a workflow which ends up with a CMYK pdf.

But if everything you do is in sRGB or grayscale, then pdftocairo is fine for you.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in poppler (Ubuntu):
status: New → Confirmed
Changed in system-config-printer (Ubuntu):
status: New → Confirmed