[Gutsy SRU request] CUPS fails to start on boot

Bug #183652 reported by M. Spencer Reed
26
Affects Status Importance Assigned to Milestone
avahi (Ubuntu)
Fix Released
High
Loïc Minier
Gutsy
Fix Released
High
Unassigned
Hardy
Fix Released
High
Loïc Minier
cupsys (Ubuntu)
Invalid
Medium
Unassigned
Gutsy
Invalid
Undecided
Unassigned
Hardy
Invalid
Medium
Unassigned

Bug Description

After the most recent set of updates, CUPS fails to start on boot if the "Publish shared printers" checkbox is checked. A /etc/init.d/cupsys restart works just fine, however.

TEST CASE:
1) Reboot your system and do

1a) ps uww -C cupsd
  It should confirm that cupsd is running
1b) sudo apt-get install avahi-utils
avahi-browse -k -t -v -r -a
Does the output contain IPP printer entries representing the queues of your local CUPS daemon?

2) The boot problem may not appear on some machines at startup :
you should probably trigger it with
sudo nice -n 19 sh -c '/etc/init.d/cupsys stop && /etc/init.d/avahi-daemon stop && sleep 5 ; /etc/init.d/avahi-daemon start ; /etc/init.d/cupsys start ; sleep 3 ; ps uww -C cupsd'
that should confirm on the last line that cupsd is running

Revision history for this message
M. Spencer Reed (spenzer4hire) wrote :

BTW, I have an i386 install and an AMD 64 install. This bug only shows up on my i386 box. (Both Ubuntu 7.10, fully updated)

Revision history for this message
M. Spencer Reed (spenzer4hire) wrote :

When the "Publish shared printers" checkbox is unchecked I get no errors in the log.

When the checkbox is checked I get these errors in my log on boot:

E [16/Jan/2008:17:37:41 -0600] cupsdAuthorize: Local authentication certificate not found!
E [16/Jan/2008:17:37:41 -0600] cupsdAuthorize: Local authentication certificate not found!
E [16/Jan/2008:17:37:41 -0600] cupsdAuthorize: Local authentication certificate not found!
E [16/Jan/2008:17:37:41 -0600] cupsdAuthorize: Local authentication certificate not found!
E [16/Jan/2008:17:37:52 -0600] DNSServiceRegister failed with error -65537

But when I run /etc/init.d/cupsys restart while the checkbox is checked I get no errors in the log and CUPS works fine.

Makes me wonder if CUPS is starting before another critical service or something.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Is ir possible that CUPS starts too early after the start of Avahi? Or better to say, is the Avahi daemon not relly ready when the startup script for Avahi exits? This would be a bug of Avahi. Its startup script should wait for the Avahi daemon getting ready before it exits, so that dependent services (like CUPS) start up correctly.

Does Hardy boot correctly (at least my Hardy boxes do)?

Revision history for this message
Vancouverite (sethgilchrist) wrote :

I believe this is the same issue that I have cited in bug # 183457. (https://bugs.launchpad.net/ubuntu/+source/avahi/+bug/183457)

I get different errors, but the behavior is the same.

I was pointing the finger at the avahi update to 0.6.20-2ubuntu3.1, but it could be the CUPS update. Rolling back the avahi update doesn't fix the problem and I can't seem to revert to my old CUPS settings.

Thanks,
vancouverite

Revision history for this message
M. Spencer Reed (spenzer4hire) wrote :

I don't have a Hardy box to test on, but I could set up a VM if you think it would help.

Is there any information we can provide you that could help us get to the bottom of this? Or, is there someone that works with avahi that I could get in touch with?

Not a showstopper bug, but it gets annoying.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

It would be interesting to get your gutsy logs in debug mode.
Can you set cupsd.conf to "LogLevel debug" or even "LogLevel debug2" (more verbose)

Revision history for this message
Vancouverite (sethgilchrist) wrote :

Gentlemen,
Thank you for contacting me about this bug. I am attaching the debug2 log file to this email as well as my cupsd.conf. Lines 1-1134 of the log file are on a fresh restart, I then used "cupsys restart" and it all seems to be working fine, but generated lines 1135 onwards. As noted by another person having the same problem, if I uncheck the "Share published printers connected to this system" under server settings in printer configuration the problem goes away.

I would like to help in any way I can, and if that means setting up a VM, I'm all for it. Is that a difficult thing to do?

System -
OS: ubuntu 7.10 (gusty)
cupsys: 1.3.2-1ubuntu7.3
printer: HP deskjet 5742 (connected via usb and seen using lsusb)

Cheers,
Seth

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

You were right Till, the bug is related not related to any of the recent updates of cups but to the fact that cups is starting to early for avahi to be ready.

The bug is present on only one of my numerous machines (many others have no symptoms so this really seems schedule-related)
On this machine I have reproduced it with all 1.3.2-1ubuntu7* cups versions (from 1.3.2-1ubuntu7 to 1.3.2-1ubuntu7.5), so it is confirmed the regression came with avahi.
The bug disappears either by reversing the effect of the avahi 0.6.20-2ubuntu3.1 update ( with : for l in 2 3 4 5; do old=/etc/rc$l.d/S18avahi-daemon; new=/etc/rc$l.d/S24avahi-daemon; if [ -e $old ] && ! [ -e $new ]; then sudo mv $old $new; fi; done )
Or by introducing a sleep time in the init file of cups.

Before the fix on the avahi side of bug 173470, avahi was starting too late, so cups was working but was opting at each start to not use avahi at all. Now it is using avahi, but it may not start depending of the timing between avahi and cups.

The bug should be either solved in the avahi startup script as Till said, or by introducing in the scheduler of CUPS a regular poll of the avahi-daemon (similar to the poll for new network interfaces that 1.3.2-1ubuntu7.5 has made come back) that would allow CUPS to dynamically change his mode of working (with or without avahi)

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Obviously, as an urgency fix, we could simply revert the effect of the 0.6.20-2ubuntu3.1 until we have a stable solution.
That would disable avahi in cups, but it as been disabled during the last 3 months, until the recent avahi update.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Here is a debdiff for a proposed temporary urgency fix reverting the avahi changes.

As I wrote earlier, you can manually revert these changes by entering in a terminal :

for l in 2 3 4 5; do old=/etc/rc$l.d/S18avahi-daemon; new=/etc/rc$l.d/S24avahi-daemon; if [ -e $old ] && ! [ -e $new ]; then sudo mv $old $new; fi; done

Revision history for this message
M. Spencer Reed (spenzer4hire) wrote :

Thanks Till and Hugues. That was quick!

Revision history for this message
Martin Pitt (pitti) wrote :

not a cups bug

Changed in cupsys:
status: New → Invalid
Changed in avahi:
assignee: nobody → pitti
importance: Undecided → High
status: Incomplete → In Progress
Revision history for this message
Martin Pitt (pitti) wrote :

Thanks, Hugues, I fixed the stop priority (so that it's now the same as in gutsy final) and uploaded to gutsy-updates. I'll reopen bug 173470.

Changed in avahi:
status: In Progress → Fix Released
status: New → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Till, this is also an issue in Hardy. Can you please look into this?

Changed in avahi:
assignee: pitti → till-kamppeter
status: Fix Released → Confirmed
Changed in cupsys:
status: New → Invalid
Revision history for this message
Vancouverite (sethgilchrist) wrote :

Thank you guys, that was incredibly efficient!

Seth

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

pitti, we should assign it to the Avahi maintainer, as it is really a bug in avahi.The end of the avahi startup script should have some loop to check the avahi-daemon which only ends if the daemon is ready or if a 30-sec timeout gets reached.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

These needs to get fixed in avahi, because also other services than CUPS can break on this problem.

Changed in avahi:
assignee: till-kamppeter → lool
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

The CUPS daemon crash needs also to be fixed. Reported the problem upstream:

http://www.cups.org/str.php?L2676

Changed in cupsys:
importance: Undecided → Medium
status: Invalid → Confirmed
Revision history for this message
Loïc Minier (lool) wrote :

I tried reproducing the problematic behavior by:
- Enabling printer sharing
- Sharing the PDF fake printer explicitely
- stopping avahi and cupsys
- starting avahi and cupsys just after one another with "avahi start && cupsys start" and concurrently "avahi start &; cupsys start"
but I couldn't trigger that DNSServiceRegister error.

Does someone have a better recipe than what I tried? Thanks!

Revision history for this message
Jamie Strandboge (jdstrand) wrote : Re: [Bug 183652] Re: CUPS fails to start on boot

On Thu, 17 Jan 2008, Loïc Minier wrote:

> I tried reproducing the problematic behavior by:
> - Enabling printer sharing
> - Sharing the PDF fake printer explicitely
> - stopping avahi and cupsys
> - starting avahi and cupsys just after one another with "avahi start && cupsys start" and concurrently "avahi start &; cupsys start"
> but I couldn't trigger that DNSServiceRegister error.
>
> Does someone have a better recipe than what I tried? Thanks!
>
This triggers the bug for me:

$ sudo sh -c '/etc/init.d/cupsys stop && /etc/init.d/avahi-daemon stop && sleep 5 ; /etc/init.d/avahi-daemon start ; /etc/init.d/cupsys start ; sleep 3 ; ps auxww|grep [c]upsd'
 * Stopping Common Unix Printing System: cupsd [ OK ]
 * Stopping Avahi mDNS/DNS-SD Daemon avahi-daemon [ OK ]
 * Starting Avahi mDNS/DNS-SD Daemon avahi-daemon [ OK ]
 * Starting Common Unix Printing System: cupsd [ OK ]
$

Though I have an actual printer and not just the PDF fake printer...

Revision history for this message
Till Kamppeter (till-kamppeter) wrote : Re: CUPS fails to start on boot

I tried this command on two Hardy boxes, one 32-bit and another 64-bit, on both machines the CUPS daemon survives for me.

I did not have a high system load. Perhaps one needs to drive up the load artificially somehow.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I should mention my command was on up-to-date Gutsy amd64.

With LogLevel set to 'warning', I have this error:
E [17/Jan/2008:20:13:21 +0000] DNSServiceRegister failed with error -65537

With LogLevel set to 'debug', I have this:
*** WARNING *** The programme 'cupsd' uses the Apple Bonjour compatiblity layer of Avahi.
*** WARNING *** Please fix your application to use the native API of Avahi!
*** WARNING *** For more information see <http://0pointer.de/avahi-compat?s=libdns_sd&e=cupsd>
... <snip> ...
cupsd: entrygroup.c:424: avahi_entry_group_add_service_strlst: Assertion `name' failed.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

I have just succeeded to manually trigger the bug, even on all my machines where it was not appearing on startup (that means the (n-1) machines that had not any problem) by adding some niceness.

sudo nice -n 19 sh -c '/etc/init.d/cupsys stop && /etc/init.d/avahi-daemon stop && sleep 5 ; /etc/init.d/avahi-daemon start ; /etc/init.d/cupsys start ; sleep 3 ; ps auxww|grep [c]upsd'

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

I do not succeed to get CUPS to fail, also your niceness does not help on my boxes.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Here is a backtrace I dumped with some home-built debug packages :

#0 0xb7c84996 in *__GI_raise (sig=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0xb7c86221 in *__GI_abort () at abort.c:88
#2 0xb7c7ddb0 in *__GI___assert_fail (assertion=0xb7b5322b "name",
    file=0xb7b52a50 "entrygroup.c", line=424,
    function=0xb7b52be0 "avahi_entry_group_add_service_strlst") at assert.c:78
#3 0xb7b4caa2 in avahi_entry_group_add_service_strlst (group=0x80dd5c8,
    interface=-1, protocol=-1, flags=0, name=0x0, type=0x80dd398 "_ipp._tcp",
    domain=0x80dbf58 "local", host=0x0, port=631, txt=0x80db7a8)
    at entrygroup.c:424
#4 0xb7e8ece5 in reg_create_service (sdref=0x80dd540) at compat.c:923
#5 0xb7e8f135 in reg_client_callback (s=0x80db878,
    state=AVAHI_CLIENT_S_RUNNING, userdata=0x80dd540) at compat.c:996
#6 0xb7b4935d in client_set_state (client=0x80db878,
    state=AVAHI_SERVER_RUNNING) at client.c:101
#7 0xb7b4ab41 in filter_func (bus=0x80d91d8, message=0x80e12f0,
    userdata=0x80db878) at client.c:188
#8 0xb7ea2c22 in dbus_connection_dispatch () from /usr/lib/libdbus-1.so.3
#9 0xb7b520ec in dispatch_timeout_callback (t=0x80e17d8, userdata=0x0)
    at ../avahi-common/dbus-watch-glue.c:103
#10 0xb7b5c95a in timeout_callback (t=0x6) at simple-watch.c:449
#11 0xb7b5cc11 in avahi_simple_poll_dispatch (s=0x80d91d8)
    at simple-watch.c:562
#12 0xb7e90792 in DNSServiceProcessResult (sdref=0x80dd540) at compat.c:497
#13 0x08064678 in cupsdUpdateDNSSDBrowse (p=0x80cd7b0) at dirsvc.c:1364
#14 0x0808f06e in cupsdDoSelect (timeout=1) at select.c:455
#15 0x0806a660 in main (argc=2, argv=0xbf92eb74) at main.c:746

In fact, it is entirely an avahi problem :
reg_create_service calls avahi_entry_group_add_service_strlst with sdref->service_name_chosen as the 'name' argument.
As sdref->service_name_chosen is null, assert(name) launches the abort of the program.

Before calling reg_create_service, reg_client_callback correctly checks if the sdref->service_name_chosen pointer is null and corrects it, but the check is in a conditional statement executed if the sdref->service_name pointer is null (in this case it corrects it with a pointer to the hostname of the client)

So a basic fix is to copy the check outside the conditional statement, and to make the check correct 'service_name_chosen' with a pointer to a copy of the value 'service_name'

After having testing my own fix (with success), I checked upstream, and in fact a similar fix has already been committed:
http://avahi.org/changeset/1536
http://avahi.org/ticket/154
and is officially available since version 0.6.22 (so in fact hardy should be immune to this bug, is it ? That would explain why Till do not succeed to trigger it/).

In fact as the fix was backported in Debian version 0.6.21-2, released in unstable on September 9, so maybe with a little more luck, gutsy could have avoided this bug by being inspired to backport it too ;)

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Here is a debdiff for an eventual SRU to Gutsy ( as it seems hardy get the fix from upstream )

Hopefully it fixes all the CUPS interaction problems.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

I meant the CUPS-avahi interaction problems

description: updated
Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Hugues Fournier, thank you for this patch. I am reopening this bug for your new SRU request.

One question: What happens now if CUPS tries to connect to the avahi daemon too early, does CUPS wait for avahi or does it assume the absence of the avahi daemon and starts without avahi support (which would mean that avahi support will only work randomly depending on actual timing and system load.

Changed in avahi:
status: Fix Released → Triaged
Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

As long as the avahi-daemon is running (ready or not really ready, depending on system load and timing), CUPS is set in avahi mode, so it never start without the avahi support if avahi is launched before.

Here is the precise way of working (based on the dnssdRegisterPrinter function in scheduler/dirsvc.c)
- On start (or when a new printer is added) CUPS use the client avahi library to register the printer.
- If the avahi daemon is running, ready or not, the client library returns a kDNSServiceErr_NoError, that leads to the registration of cupsdUpdateDNSSDBrowse as a callback function triggered by I/O on the mdns socket.
- In the other case, there is no registration of cupsdUpdateDNSSDBrowse, and there won't as it only happens on start or when a new printer is added. So Cups works in this case without avahi.

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Hugues, great, then your patch will fix this problem perfectly in Gutsy.

I will leave the upstream bug report for CUPS open (and the Hardy task for CUPS), as most CUPS users out there will probably still have an avahi version without this fix.

To the maintainer of the avahi package, please update avahi as soon as possible to fix the problem in Hardy.

Changed in avahi:
status: Confirmed → Triaged
Changed in cupsys:
status: Confirmed → Triaged
Changed in avahi:
importance: Undecided → High
Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Till, as I wrote, this patch is already in hardy. It came as a backport from upstream in 0.6.21-2

avahi (0.6.21-2) unstable; urgency=low

  * debian/patches/20_avahi-libdns_sd-crash.patch
    + Added. Fix a segfault when registering services with libdns_sd before the
      avahi daemon finished starting up (from upstream SVN)

 -- Sjoerd Simons <email address hidden> Sun, 09 Sep 2007 12:15:55 +0200

And then it was released in upstream-0.6.22 :

avahi (0.6.22-1) unstable; urgency=low
<..snip..>
  [ Sjoerd Simons ]
  * New upstream release
<..snip..>
  * debian/paches/20_avahi-libdns_sd-crash.patch
    - Removed. Fixed upstream

 -- Sjoerd Simons <email address hidden> Thu, 03 Jan 2008 17:00:24 +0100

This is why I was asking if it was really confirmed in hardy (in this case, there would be under hardy a supplementary cause to the segfault/abort)

As this bug is timing/system load dependent, anyone who have triggered the bug on one of one's box will be helpful during testing (that I bet will be far from short, to not stress Martin anymore with regressions (xorg-server was probably the straw that broke his camel’s back ;-) )

Revision history for this message
Till Kamppeter (till-kamppeter) wrote :

Upstream bug report in CUPS got closed, closing CUPS task.

SRU is only to apply the Avahi patch presented here. For Hardy the problem is fixed.

Changed in cupsys:
status: Triaged → Invalid
Changed in avahi:
status: Triaged → Fix Released
Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

The proposed fixes of avahi (allowing avahi to be used by cups without segfault — while the current fix consists in disabling avahi in cups by starting it later i.e. reverting the fix of bug 173470) can be tested from my PPA (packages version 0.6.20-2ubuntu3.3~ppa1) :

deb http://ppa.launchpad.net/hugues-fournier/ubuntu gutsy main
( https://launchpad.net/~hugues-fournier/+archive )

Anyone who had triggered the fixed bug on one of one's box is helpful as this segfault is not easily reproduceable (being timing/load - dependent)

Of course, beware (for your own sake ;) ) not any other packages than avahi ones from my PPA on your gutsy system (as I presume you are not disposed towards getting experimental nspr/nss for your Firefox for example)

Revision history for this message
M. Spencer Reed (spenzer4hire) wrote :

No problems here. Looks good. Thanks guys.

Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

For the record, from my search, this patch has been included in versions lower than 0.6.22 (in which it is inherited from upstream) in :
- Fedora Core 8 :
http://cvs.fedoraproject.org/viewcvs/rpms/avahi/F-8/avahi-0.6.21-libdns_sd-fix.patch (06 Sep 2007)
- Gentoo ( 0.6.21 ebuild, still "unstable")
http://sources.gentoo.org/viewcvs.py/gentoo-x86/net-dns/avahi/files/avahi-0.6.21-service_name_chosen.patch (25 Sep 2007)
- as I said earlier Debian sid ( 0.6.21-2 )
http://packages.debian.org/changelogs/pool/main/a/avahi/current/changelog.html#versionversion0.6.21-2 (09 Sep 2007)
http://packages.qa.debian.org/a/avahi/news/20070909T114703Z.html
- inherited by Debian lenny/testing (0.6.21-2) and by Ubuntu hardy (0.6.21-4)
http://packages.qa.debian.org/a/avahi/news/20070920T223906Z.html (20 Sep 2007)
https://launchpad.net/ubuntu/+source/avahi/0.6.21-4 (29 Nov 2007)

And it seems it did not raise any problem in the most mainstream (Fedora Core 8) or semi-mainstream (lenny) of the previously listed releases.

Revision history for this message
Martin Pitt (pitti) wrote :

Thanks, Hugues! I sponsored your debdiff and accepted it into gutsy-proposed. Anyone who is affected by this bug, please give feedback here.

Changed in avahi:
status: Triaged → Fix Committed
Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Sorry for answering so late.
I came through the test cases and with real-life reboots with obviously no more problems than with my own built packages during the last month.

But, if anyone can give other feedbacks, it is appreciated (as the problem of the original fix had not been detected, initially, during its time in -proposed, despite testing )

(Debdiff sponsored for my birthday, what a timeliness ;) )

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

Sorry for the long response time. Just installed 0.6.20-2ubuntu3.3 from gutsy-proposed, rebooted and all is well. Thanks!

Revision history for this message
Martin Pitt (pitti) wrote :

Copied to gutsy-updates, thanks for testing.

Changed in avahi:
status: Fix Committed → Fix Released
Revision history for this message
Hugues Fournier (hugues-fournier) wrote :

Thank you Martin !

I was just about to give some other feedback with months of testing behind, when I saw it was too late ;)

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.