rabbitmq not ready after restart

Bug #1449056 reported by Mathieu Rohon on 2015-04-27
This bug affects 4 people
Affects                    Importance   Assigned to
devstack                   Undecided    Tony Breeds
rabbitmq-server (Ubuntu)   Medium       Unassigned

Bug Description

I'm running devstack on Debian. The change Ie45446d3817b2f15631f03b2af84749fe936c67b introduced a restart of the rabbitmq-server service before using it.

But on my platform, the service is not ready when devstack asks for the list of users. Devstack fails with the following error:

2015-04-27 14:12:16.811 | Error: rabbit application is not running on node rabbit@deb-ds-12.
2015-04-27 14:12:16.811 | * Suggestion: start it with "rabbitmqctl start_app" and try again
2015-04-27 14:12:16.815 | failed to list users

Fix proposed to branch: master
Review: https://review.openstack.org/177785

Changed in devstack:
assignee: nobody → Mathieu Rohon (mathieu-rohon)
status: New → In Progress

I want to look at more logs.

Dr. Jens Harbott (j-harbott) wrote :

The same happens on Ubuntu 15.04: "systemctl start rabbitmq" seems to only notify systemd and return to the caller before rabbitmq is really functional.

Changed in devstack:
assignee: Mathieu Rohon (mathieu-rohon) → Tony Breeds (o-tony)

Change abandoned by Tony Breeds (<email address hidden>) on branch: master
Review: https://review.openstack.org/186641
Reason: Actually it does look like an upstream systemd issue.

I'll poke at that rather than adding a work around here.

Tony Breeds (o-tony) wrote :

Looks like Fedora hit similar (but not identical) issues.

https://bugzilla.redhat.com/show_bug.cgi?id=1103524

The answer there was to patch systemd_notify support into rabbit.

Changed in devstack:
assignee: Tony Breeds (o-tony) → Dr. Jens Rosenboom (j-rosenboom-j)
Tony Breeds (o-tony) wrote :

Trying to restate the problem a little more clearly, since after looking into this the 'Bug Description' is really more of a symptom.
----

The current systemd service definition for rabbitmq-server "completes" before the rabbit process is actually available. This means that scripted environments that start rabbitmq-server and then immediately try to talk to it are subject to a race (see Bug Description).

The best solution is to package erlang-sd_notify[1] and patch rabbit <3.5 with something like[2] and then change the systemd service type from 'simple' to 'notify'. That fix is probably better for debian unstable / wily.

For testing/vivid I propose adding a new script that waits for the daemon to be available, and calling it from the service file.

[1] https://github.com/lemenkov/erlang-sd_notify
[2] http://pkgs.fedoraproject.org/cgit/rabbitmq-server.git/tree/rabbitmq-server-0001-Add-systemd-notify-support.patch?h=f21

Tony Breeds (o-tony) wrote :

Update package to
1) include rabbitmq-server-wait: A helper script that uses the same environment file as rabbitmq-server and waits until the server is up.
2) Call the new file from the systemd service.

This ensures that the server is available when systemctl start rabbitmq-server exits.
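
For readers who want to see the idea behind such a wait step, here is a minimal sketch. It is not the packaged rabbitmq-server-wait script, whose contents are not shown in this report; `wait_until` and `POLL_INTERVAL` are hypothetical names, and using `rabbitmqctl list_users` as the readiness probe is an assumption based on the command that failed in the Bug Description.

```shell
#!/usr/bin/env bash
# Sketch only: run a probe command up to N times until it succeeds.
# wait_until and POLL_INTERVAL are hypothetical names for illustration.

wait_until() {
    local tries=$1
    shift
    local i=0
    while [ "$i" -lt "$tries" ]; do
        # "$@" is the probe command; success means the service is usable.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${POLL_INTERVAL:-1}"
    done
    # The probe never succeeded within the allowed number of tries.
    return 1
}

# Example (assumes rabbitmqctl is on PATH and the caller is allowed to run it):
#   wait_until 30 rabbitmqctl list_users || echo "rabbit did not come up" >&2
```

A script that starts the service and then runs a probe like this before its first real request would avoid the race on systems without the fixed package.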

Changed in devstack:
assignee: Dr. Jens Rosenboom (j-rosenboom-j) → Tony Breeds (o-tony)

The attachment "rabbitmq-sync-systemd.patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
James Page (james-page) wrote :

I've uploaded this to Debian unstable; it will autosync as soon as launchpad notices.

Changed in rabbitmq-server (Ubuntu):
status: New → Fix Committed
importance: Undecided → Medium
milestone: none → ubuntu-15.06
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rabbitmq-server - 3.5.1-2

---------------
rabbitmq-server (3.5.1-2) unstable; urgency=medium

  [ Tony Breeds ]
  * systemd: Ensure that rabbitmq has started before marking service as
    running (LP: #1449056).

  [ James Page ]
  * systemd: Drop use of /etc/default/rabbitmq-server.

 -- James Page <email address hidden> Tue, 02 Jun 2015 11:40:59 +0100

Changed in rabbitmq-server (Ubuntu):
status: Fix Committed → Fix Released
Tony Breeds (o-tony) wrote :

@james-page: Thanks! Forgive me for not understanding the process but is it possible to get the fix in vivid as well?

I assume there will need to be some QA process around that. I'm happy to verify any proposed builds.

Dr. Jens Harbott (j-harbott) wrote :

I tested http://launchpadlibrarian.net/208123057/rabbitmq-server_3.5.1-2_all.deb and it works fine for me.

It would be great to see this backported to vivid.

Reviewed: https://review.openstack.org/186641
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=6bc905c3488a93fa87776bcd0af7e362a90b082f
Submitter: Jenkins
Branch: master

commit 6bc905c3488a93fa87776bcd0af7e362a90b082f
Author: Tony Breeds <email address hidden>
Date: Fri May 15 12:51:43 2015 +1000

    Change the restart_rpc_backend loop to accomodate async rabbitmq

    Some distros have converted to systemd for starting RabbitMQ. This has
    resulted in:
    ---
    [Call Trace]
    ./stack.sh:904:restart_rpc_backend
    /home/stack/projects/openstack/openstack-dev/devstack/lib/rpc_backend:201:die
    [ERROR] /home/stack/projects/openstack/openstack-dev/devstack/lib/rpc_backend:201 Failed to set rabbitmq password
    Error on exit
    World dumping... see /opt/stack/logs/worlddump-2015-05-29-031618.txt for details
    ---

    Because 'restart_service rabbitmq-server' returns before the server is ready to
    accept connections.

    Alter the retry loop to only restart the rabbitmq-server every second time
    through the loop. Allowing time for the slow rabbit to start.

    Closes-Bug: 1449056
    Change-Id: Ibb291c1ecfd109f9ed10b5f194933364985cc1ce
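
The retry pattern described in the commit message above can be sketched roughly as follows. This is not the devstack code itself: `retry_with_sparse_restart`, `RETRY_DELAY` and the choice to restart on even-numbered attempts are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: retry a check, restarting the service on every second attempt.
# retry_with_sparse_restart and RETRY_DELAY are hypothetical names.

retry_with_sparse_restart() {
    local max_attempts=$1
    local restart_cmd=$2
    local check_cmd=$3
    local i
    for i in $(seq 1 "$max_attempts"); do
        # Restart only on even-numbered attempts, so that a slow broker gets
        # one more check before it is restarted again.
        if [ $((i % 2)) -eq 0 ]; then
            # Left unquoted on purpose so a command with arguments is split.
            $restart_cmd
        fi
        if $check_cmd; then
            return 0
        fi
        sleep "${RETRY_DELAY:-1}"
    done
    # Every attempt failed.
    return 1
}

# Example (assumes systemd and rabbitmqctl are available):
#   retry_with_sparse_restart 20 \
#       "systemctl restart rabbitmq-server" "rabbitmqctl list_users"
```

Restarting on every pass can interrupt a broker that is still booting; skipping the restart on alternate passes gives it time to finish, at the cost of a slower recovery when the broker really is stuck.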

Changed in devstack:
status: In Progress → Fix Released

Change abandoned by Mathieu Rohon (<email address hidden>) on branch: master
Review: https://review.openstack.org/177785
