lzma more efficient than gzip

Bug #446245 reported by Jérôme
This bug affects 1 person
Affects: foomatic-db (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)
Declined for Lucid by Till Kamppeter

Bug Description

Binary package hint: foomatic-db

The /usr/share/ppd directory uses a lot of disk space.

All the ppd files are currently compressed by using gzip.

I think that using lzma would decrease the disk space used as well as improve the decompression speed.

Below is a comparison for the file /usr/share/ppd/openprinting/KONICA_MINOLTA/KO1050UX.ppd.gz:
-----
j@j-desktop:~$ du KO1050UX.ppd*
256 KO1050UX.ppd
24 KO1050UX.ppd.gz
16 KO1050UX.ppd.lzma
j@j-desktop:~$ time { for i in {0..999} ; do unlzma -c KO1050UX.ppd.lzma >/dev/null ; done }

real 0m18.765s
user 0m7.060s
sys 0m11.560s
j@j-desktop:~$ time { for i in {0..999} ; do gunzip -c KO1050UX.ppd.gz >/dev/null ; done }

real 0m23.183s
user 0m6.840s
sys 0m16.170s
j@j-desktop:~$
-----
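The shell loops above start a new process for every decompression, so the timings probably include a good deal of process start-up cost (note the high "sys" values). A minimal sketch of an in-process comparison follows, using Python's standard gzip and lzma modules. The sample data, the compare() function and the number of rounds are assumptions made for illustration; they are not taken from the report, and the results on a real PPD may differ.

```python
import gzip
import lzma
import time


def compare(raw, rounds=200):
    """Compress raw bytes with gzip and legacy .lzma, then time decompression."""
    blobs = {
        "gzip": (gzip.compress(raw), gzip.decompress),
        # FORMAT_ALONE is the legacy .lzma container written by the lzma tool.
        "lzma": (lzma.compress(raw, format=lzma.FORMAT_ALONE), lzma.decompress),
    }
    results = {}
    for label, (blob, decompress) in blobs.items():
        start = time.perf_counter()
        for _ in range(rounds):
            out = decompress(blob)
        elapsed = time.perf_counter() - start
        assert out == raw  # the round trip must be lossless
        results[label] = {"bytes": len(blob), "seconds": elapsed}
    return results


# Hypothetical PPD-like text; replace with the bytes of a real PPD file.
sample = (
    b"*OpenUI *PageSize/Media Size: PickOne\n"
    b'*PageSize A4/A4: "<</PageSize[595 842]>>setpagedevice"\n'
) * 2000

if __name__ == "__main__":
    for label, stats in compare(sample).items():
        print(label, stats["bytes"], "bytes,", round(stats["seconds"], 4), "s")
```

Measuring inside one process separates the cost of the decompression itself from the cost of launching unlzma or gunzip a thousand times.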

ProblemType: Bug
Architecture: amd64
CupsErrorLog:
 E [08/Oct/2009:11:27:00 +0200] Unable to remove temporary file "/var/spool/cups/tmp/.hplip" - Is a directory
 E [08/Oct/2009:12:32:08 +0200] Unable to remove temporary file "/var/spool/cups/tmp/.hplip" - Is a directory
CurrentDmesg:
 [ 18.390080] Clocksource tsc unstable (delta = -449463366 ns)
 [ 18.920203] eth0: no IPv6 routers present
 [ 764.424483] oosplash.bin[1546]: segfault at 7f7e88ef9ae8 ip 0000000000403db1 sp 00007fff38619400 error 6 in oosplash.bin[400000+7000]
Date: Thu Oct 8 13:04:08 2009
DistroRelease: Ubuntu 9.10
Lpstat: Error: command ['lpstat', '-v'] failed with exit code 1: lpstat: No destinations added.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Package: openprinting-ppds 20090825-0ubuntu3
PackageArchitecture: all
Papersize: a4
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-12-generic root=UUID=a4218aa5-7e30-45af-85dd-f9c3b1ce603a ro quiet splash
ProcEnviron:
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.41-generic
SourcePackage: foomatic-db
Uname: Linux 2.6.31-12-generic x86_64
XsessionErrors:
 (gnome-settings-daemon:1182): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (gnome-settings-daemon:1182): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (nautilus:1227): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
 (polkit-gnome-authentication-agent-1:1236): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
dmi.bios.date: 01/01/2007
dmi.bios.vendor: QEMU
dmi.bios.version: QEMU
dmi.chassis.type: 1
dmi.modalias: dmi:bvnQEMU:bvrQEMU:bd01/01/2007:svn:pn:pvr:cvn:ct1:cvr:

Revision history for this message
Jérôme (jerome-bouat) wrote :
Changed in foomatic-db (Ubuntu):
importance: Undecided → Wishlist
status: New → Confirmed
Till Kamppeter (till-kamppeter) wrote :

First, this is a feature request, so I have marked it "Wishlist". Since this request is not about the packaging and installation of foomatic-db but about its core functionality, I recommend reporting it upstream, in this case at http://buga.linuxfoundation.org/, under the product "OpenPrinting" and the component "foomatic-db".

The reason we use gzip compression for all PPDs (including those in the printer driver packages) rather than bz2 or lzma is that CUPS can read ready-made PPD files when they are .gz-compressed, but not when they are .bz2- or .lzma-compressed. For this you should post a feature request at http://www.cups.org/str.php asking CUPS to support more modern compression formats for PPD files.

Another approach, which could be implemented in foomatic-db-engine, is to supply a PPD generator in /usr/lib/cups/driver/ which simply uncompresses lzma-compressed PPDs when CUPS requests them. In this case one could even put all PPDs into one big file (like a .tar.lzma), as this compresses the files even further. When compressing all PPDs into one file, one can make good use of the similarity between PPDs for printers which differ only slightly (something like the HP LaserJet 4100 and 4200). One could divide the PPDs into groups of similar files, take a master PPD in each group which is included completely, and add the rest as diffs to be applied to the master PPD. Naturally, one completes it by lzma-ing the whole thing.

Algorithms for generating such a PPD package and quickly getting the desired PPDs out of it would be an interesting student project on custom data compression technologies.
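The master-plus-diffs idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the JSON container and the copy/insert delta format are all invented here, and the xz container produced by lzma.compress() stands in for whatever format a real tool would choose.

```python
import json
import lzma
from difflib import SequenceMatcher


def make_delta(master, target):
    """Encode target (a list of lines) as copy/insert operations against master."""
    ops = []
    matcher = SequenceMatcher(a=master, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])            # reuse lines from the master
        elif tag in ("replace", "insert"):
            ops.append(["insert", target[j1:j2]])   # lines unique to the target
        # "delete": master lines absent from the target need no operation
    return ops


def apply_delta(master, ops):
    """Rebuild a PPD from the master and its delta."""
    lines = []
    for op in ops:
        if op[0] == "copy":
            lines.extend(master[op[1]:op[2]])
        else:
            lines.extend(op[1])
    return lines


def pack(master_name, ppds):
    """Store one master PPD in full, the others as deltas, then compress it all."""
    master = ppds[master_name]
    bundle = {
        "master_name": master_name,
        "master": master,
        "deltas": {
            name: make_delta(master, lines)
            for name, lines in ppds.items()
            if name != master_name
        },
    }
    return lzma.compress(json.dumps(bundle).encode("utf-8"))


def unpack(blob, name):
    """Return the lines of the requested PPD from a packed bundle."""
    bundle = json.loads(lzma.decompress(blob).decode("utf-8"))
    if name == bundle["master_name"]:
        return bundle["master"]
    return apply_delta(bundle["master"], bundle["deltas"][name])
```

For example, packing a hypothetical "lj4100" master together with a near-identical "lj4200" variant stores the variant as a handful of copy and insert operations, and unpack(blob, "lj4200") reconstructs it line for line.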

Jérôme (jerome-bouat) wrote :

Thanks for your feedback.

I posted a feature request for lzma compression to the cups project:
http://www.cups.org/str.php?L3369

As for the PPD generator solution, I think it amounts to a file system implementation with worse performance, since the PPD generator would have to read the whole tar archive in order to reach a driver file at the end of the archive.

Moreover, it would make the CUPS system heavier and duplicate functionality that the file system already provides.
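The sequential-read concern can be illustrated with Python's standard tarfile module. In this sketch, build_archive() and read_member() are hypothetical helpers, and the xz container is used because tarfile does not handle the legacy .lzma format. Reading in stream mode ("r|xz") makes it explicit that every member before the wanted one has to be passed over first.

```python
import io
import tarfile


def build_archive(files):
    """Pack a dict of name -> bytes into an in-memory .tar.xz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive, wanted):
    """Return (data, members_scanned) for one file in a compressed tar stream."""
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r|xz") as tar:
        for member in tar:          # members arrive strictly in archive order
            scanned += 1
            if member.name == wanted:
                return tar.extractfile(member).read(), scanned
    raise KeyError(wanted)
```

Requesting the last file of a five-file archive scans all five members, while requesting the first scans only one, so lookup cost grows with the position of the file in the archive.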

Jérôme (jerome-bouat) wrote :

Mike from the CUPS team replied to my request: "LZMA as a format has also not been standardized like GZIP/Flate, so we'd want to make sure that issue is covered before we adopt it."

Jérôme (jerome-bouat) wrote :

Maybe a driver cache would spare CUPS from decompressing the PPD each time.
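A minimal sketch of such a cache, assuming the compressed PPDs are available by name. The COMPRESSED_PPDS store, the get_ppd() function and the sample content are hypothetical; nothing here reflects how CUPS actually works.

```python
import lzma
from functools import lru_cache

# Hypothetical store: PPD name -> lzma-compressed bytes.
COMPRESSED_PPDS = {
    "KO1050UX.ppd": lzma.compress(b'*PPD-Adobe: "4.3"\n*ModelName: "Example"\n'),
}


@lru_cache(maxsize=64)
def get_ppd(name):
    """Decompress a PPD on first request; later requests come from the cache."""
    return lzma.decompress(COMPRESSED_PPDS[name])
```

The trade-off is memory: the cache holds uncompressed PPDs, so its size limit decides how much of the disk saving is given back as RAM usage.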

Till Kamppeter (till-kamppeter) wrote :

My suggestion from comment #2 is now implemented. I mentored Vitor Baptista in the Google Summer of Code 2010 on exactly this. He finished it today, and I applied it to the openprinting-ppds and openprinting-ppds-extra packages in Maverick, saving nearly 30 MB on an installed system. I will do the same with HPLIP.

Changed in foomatic-db (Ubuntu):
status: Confirmed → Fix Released
Jérôme (jerome-bouat) wrote :

Great. Thanks to Vitor and Till for your effort.
