Laptop won't shut down with rabbitmq running

Reported by Jonathan Lange on 2010-11-03
This bug affects 42 people
Affects: rabbitmq-server (Ubuntu)
Importance: Medium
Assigned to: Kovacs Robert

Bug Description

Binary package hint: rabbitmq-server

My laptop won't shut down when the rabbitmq server is running. I can shut down the rabbitmq-server like this:
jml@truth:/etc/init.d$ sudo /etc/init.d/rabbitmq-server stop
Stopping rabbitmq-server: rabbitmq-server.

After stopping the rabbitmq-server like that I can shutdown without problems. If I re-start the rabbitmq-server, then I can no longer shut down. Instead, the Ubuntu shutdown screen keeps going forever. When I try to SSH into the laptop I get "Operation timed out".

Many other Launchpad developers have pointed out this problem, and I suspect bug 660460 was filed partly with the intent of requesting a workaround.

AFAICT, this seems to only affect Maverick.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: rabbitmq-server 1.8.0-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.35-22.35-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
Architecture: amd64
Date: Wed Nov 3 09:19:59 2010
EcryptfsInUse: Yes
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, user)
 LANG=en_AU.UTF-8
 SHELL=/bin/bash
SourcePackage: rabbitmq-server

Jonathan Lange (jml) on 2010-11-03
description: updated
Dave Walker (davewalker) on 2010-11-03
Changed in rabbitmq-server (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Dan (dan-agnit) wrote :

With BOOTLOGD_ENABLED=YES set in /etc/default/bootlogd, this is what gets logged:
--------------
init: Disconnected from system bus
init: dbus main process (745) killed by TERM signal
Stopping rabbitmq-server: nw-dispatcher.action: Caught signal 15, shutting down... nw-dispatcher.action: Could not get the system bus. Make sure the message bus deamon is running! Message: Did not receive a reply. Possible causes include: the remote application did not send a reply. The message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Thomas Herve (therve) wrote :

I also noticed the problem at boot time: sometimes rabbitmq is just not started when I start my machine. I'm not an expert, but I suspect it's a problem with upstart init info, which doesn't seem to mention $local_fs as a dependency.

Alejandro J. Cura (alecu) wrote :

My solution to this is to shut down the system rabbitmq instance altogether, since I only run rabbitmq instances as my user when running integration tests.

For this I added "chmod -x /usr/sbin/rabbitmq-multi" to /etc/rc.local, so it gets reset even after new rabbit packages are released.
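A minimal sketch of what such an /etc/rc.local fragment could look like. The helper name disable_exec is made up for illustration; the path is the one given in the comment above. The existence check is an addition, so that rc.local does not fail if the package is removed later.

```shell
#!/bin/sh
# disable_exec: hypothetical helper that clears the executable bit on a file.
disable_exec() {
    target="$1"
    # Only act if the file exists, so a missing package does not break rc.local.
    if [ -f "$target" ]; then
        chmod -x "$target"
    fi
}

# Path taken from the comment above; re-applied on every boot,
# so it also covers files reinstalled by package upgrades.
disable_exec /usr/sbin/rabbitmq-multi
```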

Clint Byrum (clint-fewbar) wrote :

Thomas, AFAICT, there is no upstart job for rabbitmq-server, and the dependencies in /etc/init.d/* don't really apply to the system anymore w.r.t. filesystems, because mountall does all mounts before rc-sysinit 2 is called. Further, the dependency on remote_fs is something that would always come *after* local_fs, so that isn't an issue either.

You should have something in /var/log/boot.log, /var/log/rabbitmq/startup_log, or /var/log/rabbitmq/startup_err that indicates why it was not started. If its symlink is there as /etc/rc2.d/S20rabbitmq-server as it should be, then there's no real reason that I can think of for it not to start.

As for stopping, it should be one of the first things shut down:

lrwxrwxrwx 1 root root 25 Dec 15 11:20 K20rabbitmq-server -> ../init.d/rabbitmq-server
-rw-r--r-- 1 root root 353 Sep 7 2009 README
lrwxrwxrwx 1 root root 18 Nov 17 10:46 S20sendsigs -> ../init.d/sendsigs
lrwxrwxrwx 1 root root 17 Nov 17 10:46 S30urandom -> ../init.d/urandom
lrwxrwxrwx 1 root root 22 Nov 17 10:46 S31umountnfs.sh -> ../init.d/umountnfs.sh
lrwxrwxrwx 1 root root 20 Nov 17 10:46 S35networking -> ../init.d/networking
lrwxrwxrwx 1 root root 18 Nov 17 10:46 S40umountfs -> ../init.d/umountfs
lrwxrwxrwx 1 root root 20 Nov 17 10:46 S60umountroot -> ../init.d/umountroot
lrwxrwxrwx 1 root root 14 Nov 17 10:46 S90halt -> ../init.d/halt

It's possible that the stop method isn't effective, but the next thing, sendsigs, will be sending it kill -9, so it doesn't make sense that the machine wouldn't power off because of this.
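To check the ordering on an affected machine, something like the following sketch could be used. It prints the links of an rc directory in the order described above: K* links first, then S* links, each group in lexical order. The function name rc_order is made up for illustration.

```shell
#!/bin/sh
# rc_order: hypothetical helper that lists an rc directory's links
# in run order (K* before S*, each group sorted by the shell glob).
rc_order() {
    dir="$1"
    for link in "$dir"/K* "$dir"/S*; do
        # Skip patterns that matched nothing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        basename "$link"
    done
}
```

For example, "rc_order /etc/rc0.d" should list K20rabbitmq-server ahead of S20sendsigs on a system laid out like the listing above. Files such as README are ignored.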

If somebody who is experiencing this can do an alt-f7 or ctrl-alt-f7 to see the console, that might provide some insight into what is going on.

Björn Tillenius (bjornt) wrote :

Clint, when shutting down, the console doesn't show much. It shows killing apache, a few other things, and then it stalls when trying to shut down rabbitmq. I.e., it doesn't get any further, so sendsigs doesn't get executed.

It's the 'rabbitmq-multi stop_all' that hangs. Seems it's waiting forever for the node to shut down. I'm attaching shutdown_log. Also, I modified the init.d script to log what 'rabbitmq-multi status' says, and it says:

  Status of all running nodes...
  Node 'rabbit@ixia' with Pid 3743: not_running
  done.
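Since the hang is in a single command, one way to keep it from blocking the rest of the shutdown sequence would be to bound it with a time limit. This is only a sketch, not what the package does: stop_with_timeout is a made-up helper, and it assumes the coreutils "timeout" command is available.

```shell
#!/bin/sh
# stop_with_timeout: hypothetical helper that runs a command but gives up
# after the given number of seconds.
stop_with_timeout() {
    secs="$1"
    shift
    status=0
    # coreutils timeout exits with status 124 when the time limit is hit.
    timeout "$secs" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "command timed out after ${secs}s: $*" >&2
    fi
    return "$status"
}
```

In the init script this might be called as "stop_with_timeout 30 rabbitmq-multi stop_all" (the 30 seconds is an arbitrary choice), which would let the sequence reach sendsigs even if the node never answers.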

Thomas Herve (therve) wrote :

Clint, thanks for looking at it. As Björn mentioned, there is not much to say. I think I've identified one linked problem though, where the service doesn't start sometimes: rabbitmq is tied to the existence of a particular hostname (generally the one detected when first installed).

Since Maverick, NetworkManager seems to do some weird trick with /etc/hosts, making the local hostname unresolvable at some points. At least when it doesn't start, I get a "can't resolve hostname" error in startup_err, which could map with what Björn is saying, where the hostname can't be found at shutdown (presumably after NM did his job on a laptop).

Björn: do you have anything in shutdown_err by any chance?

On Thu, Dec 16, 2010 at 08:12:49AM -0000, Thomas Herve wrote:
> Since Maverick, NetworkManager seems to do some weird trick with
> /etc/hosts, making the local hostname unresolvable at some points. At
> least when it doesn't start, I get a "can't resolve hostname" error in
> startup_err, which could map with what Björn is saying, where the
> hostname can't be found at shutdown (presumably after NM did his job on
> a laptop).

Right, this seems to be the issue somehow. I can reproduce the issue
without shutting down the laptop, if I disconnect any network that
NetworkManager is connected to. If NM isn't connected to anything,
rabbitmq-multi status says that rabbit@ixia is "not_running" and
rabbitmq-multi stop_all hangs while trying to stop rabbit@ixia.

> Björn: do you have anything in shutdown_err by any chance?

No, nothing.

--
Björn Tillenius | https://launchpad.net/~bjornt

Björn Tillenius (bjornt) wrote :

To add further to Thomas' observations, after disconnecting NM, the beginning of my /etc/hosts file looks like this:

    192.168.60.104  ixia    # Added by NetworkManager
    127.0.0.1       localhost.localdomain   localhost
    ::1     ixia    localhost6.localdomain6 localhost6
    127.0.1.1       ixia

If I comment out the first line (192.168.60.104), rabbitmq-multi status and stop_all work as expected. Could it be that NetworkManager should remove the added line in /etc/hosts when it disconnects from a network?
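A quick way to spot this situation is to look at which line of the hosts file mentions the hostname first. The sketch below does only that; it is not a claim about how the resolver itself picks an address, and first_hosts_entry is a made-up name.

```shell
#!/bin/sh
# first_hosts_entry: hypothetical helper that prints the address on the first
# line of a hosts-format file that names the given host.
first_hosts_entry() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }   # drop comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; exit }
            }
        }
    ' "$file"
}
```

With the file shown above, "first_hosts_entry ixia" would print 192.168.60.104, i.e. the address NetworkManager added for a network that is no longer connected.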

Björn Tillenius (bjornt) wrote :

FWIW, an ugly workaround to this bug is to place a file in /etc/rc0.d and /etc/rc6.d that removes the line NetworkManager added before rabbitmq is shut down. I'm attaching the file I'm using to do that.
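The attached file is not reproduced here. As a rough sketch of the idea only, a script along these lines could strip the tagged line. The function name strip_nm_hosts is made up, and the marker text is the one visible in the /etc/hosts excerpt above.

```shell
#!/bin/sh
# strip_nm_hosts: hypothetical helper that removes lines tagged
# "# Added by NetworkManager" from a hosts-format file.
strip_nm_hosts() {
    file="${1:-/etc/hosts}"
    # Nothing to do if the file is missing.
    [ -f "$file" ] || return 0
    tmp=$(mktemp) || return 1
    # Keep every line that does not end with the NetworkManager marker.
    grep -v '# Added by NetworkManager[[:space:]]*$' "$file" > "$tmp"
    # Write back in place so the file keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

To run before rabbitmq is stopped, the script calling this would need a K link in /etc/rc0.d and /etc/rc6.d that sorts ahead of K20rabbitmq-server; the exact link name is left open here.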

Chang Phui-Hock (phuihock) wrote :

I have the same exact problem and it's an NM bug as pointed out by Thomas. This is closely related to https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/632896.

A new proposed network-manager package has just been uploaded to fix the problem. It works for me, so far.

Clint Byrum (clint-fewbar) wrote :

I think this bug may have been related to bug #653405 , which is fixed in Ubuntu 11.10. Would appreciate if somebody could confirm that they can shutdown an 11.10 machine with rabbitmq installed.

Gavin Panella (allenap) wrote :

This bug no longer affects me in 11.04.

Huw Wilkins (huwshimi) wrote :

I am still having this problem on 11.10. Even after a fresh install.

I can provide further details if required.

Gavin Panella (allenap) wrote :

After disappearing in 11.04, this bug has come back in 11.10, both on
a fresh install and an upgrade.

Jonathan Lange (jml) wrote :

On Tue, Oct 18, 2011 at 9:10 AM, Gavin Panella
<email address hidden> wrote:
> After disappearing in 11.04, this bug has come back in 11.10, both on
> a fresh install and an upgrade.

Yeah, it's been a known issue in rabbitmq for the whole of oneiric.

jml

I've been having this problem for a long time on 11.04. I upgraded to 11.10 and am still having the same issue. I tested on a fresh installation of 11.10 and saw the same behaviour, i.e. shutdown brings me back to the login screen.

Sam Stoelinga (sammiestoel) wrote :

Having the same problem here on 11.10; on 11.04 I did not have this issue. After doing a fresh install of 11.10 and installing rabbitmq, I cannot shut down the PC normally. It always returns to the lightdm login screen, and from there there is no way to shut down: even running "shutdown now" from the command line makes my PC freeze, and I have to hold the power button for a long time to shut it down by hand.

Sampo Savolainen (v2) wrote :

I have the same exact problem. Poweroff just goes back to the login screen with, I assume, some of the OS services shut down. The only way to "cleanly" power off is to log in and do "sudo /sbin/poweroff now".

What is the root cause here? Wrong service shutdown order?

So it sounds like NetworkManager is making changes to /etc/hosts that are confusing Rabbit. To help figure out if this is the problem, can anyone experiencing this on 11.10 tell me:

* What does "ping ${hostname}" look like when networking is up?
* What does "ping ${hostname}" look like when networking is down?

Sorry, I mean "ping $(hostname)", with round brackets. Duh.

Alejandro J. Cura (alecu) wrote :

@simon-macmullen:
When online pings happen normally.
When offline, ping says "unknown host $(hostname)"
Here's the full output:

*****ONLINE*****************************
alecu@bollo:~$ ping $(hostname)
PING bollo (192.168.1.10) 56(84) bytes of data.
64 bytes from bollo (192.168.1.10): icmp_req=1 ttl=64 time=0.054 ms
64 bytes from bollo (192.168.1.10): icmp_req=2 ttl=64 time=0.046 ms
64 bytes from bollo (192.168.1.10): icmp_req=3 ttl=64 time=0.045 ms
64 bytes from bollo (192.168.1.10): icmp_req=4 ttl=64 time=0.041 ms
64 bytes from bollo (192.168.1.10): icmp_req=5 ttl=64 time=0.041 ms
^C
--- bollo ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4007ms
rtt min/avg/max/mdev = 0.041/0.045/0.054/0.007 ms
*****OFFLINE*****************************
alecu@bollo:~$ ping $(hostname)
ping: unknown host bollo
alecu@bollo:~$

PS: Thanks for working on this bug.

Alberto Donato (ack) wrote :

I have the same shutdown/reboot problem on oneiric with rabbitmq.

ping works fine for me both when online and offline (hostname resolves to 127.0.1.1 in both cases).

total (talakosk) wrote :

The problem might be with the rabbitmq service user and lightdm. Lightdm can't log out because
there is another "user" still logged in.

If you change from lightdm to use gdm does it help?
- sudo dpkg-reconfigure gdm (select gdm)
- (reboot)

This bug might be related https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/855556
although rabbitmq is a "system user" and they are talking about "real users".

A very similar bug has been discussed before. There was also a fix for it:
https://bugs.launchpad.net/ubuntu/+source/gdm/+bug/696038

I've been running RabbitMQ 2.6.1 and 2.7.0 for a while on 11.10 (and 11.04) without any of these problems.

I'm not sure what is different in my setup compared to yours, but maybe we can work it out?

I am running with lightdm, not gdm.

NetworkManager is enabled, but the ethernet is managed via /etc/network/interfaces instead. (eth0 is bound to a bridge, br0, which then uses DHCP to obtain an IP. Several virtual machines also bind to br0)

Sampo Savolainen (v2) wrote :

This seems to be connected to bug https://bugs.launchpad.net/ubuntu/+source/gnome-session/+bug/880771

ConsoleKit thinks rabbitmq-server is a user session and therefore lightdm refuses to shut down the computer. The question is: why does rabbitmq look like a user session?

Stuart Bishop (stub) wrote :

Bug #395281 has some relevant discussions

Loïc Minier (lool) wrote :

See bug #913464 for an earlier fix which got lost in new upstream version (see bug #922600 for the new upstream version bug)

We believe this to be fixed in the upstream Debian package:

http://lists.rabbitmq.com/pipermail/rabbitmq-announce/attachments/20120319/e92146ee/attachment.txt

(I don't think there was ever a Debian bug for this but it was addressed in RabbitMQ 2.8.0.)

Dave Walker (davewalker) wrote :

su was keeping a session opened, blocking the shutdown. I believe this was fixed in bug 966269, which is now included in Debian as well.

I think this was finally fixed with:
rabbitmq-server (2.7.1-0ubuntu4) precise; urgency=low

  [ Dave Walker ]
  * debian/rabbitmq-script-wrapper: Use start-stop-daemon instead of su
    to run the commands. This also allows rabbitmq to start on the
    installer if invoke-rc.d is used with --force. (LP: #966269)

  [ Andres Rodriguez ]
  * debian/rabbitmq-server.init: Use --no-wait in initctl emit command to
    not stall apt-get installations. (LP: #968124)
 -- Andres Rodriguez <email address hidden> Thu, 29 Mar 2012 10:46:26 -0400

Changed in rabbitmq-server (Ubuntu):
status: Confirmed → Fix Released
Changed in rabbitmq-server (Ubuntu):
assignee: nobody → Kovacs Robert (robi-kovacs11)