[custom BVT] Rabbitmq v3.3.5 in docker container being restarted in a loop

Bug #1407405 reported by Bogdan Dobrelya
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Aleksandr Didenko
Milestone: 6.1

Bug Description

http://jenkins-product.srt.mirantis.net:8080/view/custom_iso/job/custom.centos.bvt_1/402/

This BVT was run against the new puppet-rabbitmq v5.0.0 module synced from puppetlabs ( https://review.openstack.org/127166 ).
The docker and docker-rabbitmq logs show that the rabbitmq-server service is being killed and restarted in a loop.
The root cause could be related to rabbit_check and to the timeouts between the kill and start steps versus the checks being performed.

Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
status: New → Confirmed
Bogdan Dobrelya (bogdando) wrote :

It looks like the loop is not related to the rabbit_check.
The following pattern repeats in the logs on every iteration of the loop:
Notice: Finished catalog run in 5.26 seconds
+ exitcode=2
+ [[ 2 != 0 ]]
+ [[ 2 != 2 ]]
+ service rabbitmq-server stop
Stopping rabbitmq-server: rabbitmq-server.
+ pkill -u rabbitmq
[debug] utils.go:267 [hijack] End of stdout

The next step after pkill should be starting the rabbit process, but that does not happen, apparently because of some strange race condition.
Once it finally does, the loop stops and rabbit keeps running.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-main (master)

Fix proposed to branch: master
Review: https://review.openstack.org/144877

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
status: Confirmed → In Progress
Bogdan Dobrelya (bogdando) wrote :

I tested the fix with the custom Ubuntu BVT (#346), and the race issue was gone.

Bogdan Dobrelya (bogdando) wrote :

The fix is actually a workaround (w/a); the root cause of this race is still unknown.

Aleksandr Didenko (adidenko) wrote :

Here's the root cause of this bug (in /usr/local/bin/start.sh taken from fuel-gerrit-6.1-58-2015-01-02_15-09-14.iso):

#!/bin/bash -xe

We run the rabbitmq container script start.sh with the '-e' option, so it stops the rabbitmq service just fine, but then it tries to execute 'pkill -u rabbitmq', which fails (no such processes). Because of '-e', start.sh exits on that failure, and the docker container exits as well. Then supervisorctl starts it over again.
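The failure mode above can be reproduced in miniature. This is a sketch, not the real start.sh: pkill_no_match is a hypothetical stand-in that exits with status 1, which is what pkill returns when no process matches, so the demo needs neither a rabbitmq user nor procps. The shebang in start.sh also sets '-x', which is why every executed command appears in the log prefixed with '+'; tracing is left out here to keep the output short.

```shell
#!/bin/bash
# Miniature reproduction of the start.sh failure (NOT the real script).
# pkill_no_match is a hypothetical stand-in: it exits 1, as pkill does
# when no process matches.

# Runs a start.sh-like sequence in a fresh bash process with errexit on,
# as '#!/bin/bash -xe' does (without -x tracing).
simulate_start_sh() {
  bash -e -c '
    pkill_no_match() { return 1; }
    echo "Stopping rabbitmq-server"   # the service stop succeeds
    pkill_no_match                    # non-zero status: -e aborts here
    echo "Starting rabbitmq-server"   # never reached
  '
}

if simulate_start_sh; then
  echo "start.sh kept running"
else
  echo "start.sh exited with status $?"
fi
```

Only the stop line is printed before the script aborts, which matches the log above: the trace ends right after 'pkill -u rabbitmq', and the start step never runs.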

So we should either remove the '-e' option or make sure that all commands in that script return 0.
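A minimal sketch of the two options, using the same hypothetical pkill_no_match stand-in (exit status 1, like pkill with no matching processes). It is not the merged change, whose diff is not shown in this report. The first option keeps '-e' and marks the expected failure as non-fatal with '|| true'; the second drops '-e' altogether.

```shell
#!/bin/bash
# Sketch of the two options (NOT the merged diff). pkill_no_match is a
# hypothetical stand-in that exits 1, like pkill when nothing matches.

# Option 1: keep -e, but make the expected failure non-fatal.
start_with_or_true() {
  bash -e -c '
    pkill_no_match() { return 1; }
    echo "Stopping rabbitmq-server"
    pkill_no_match || true   # failure is OK: nothing is left to kill
    echo "Starting rabbitmq-server"
  '
}

# Option 2: run the script without -e; the failed command is ignored.
start_without_errexit() {
  bash -c '
    pkill_no_match() { return 1; }
    echo "Stopping rabbitmq-server"
    pkill_no_match
    echo "Starting rabbitmq-server"
  '
}

start_with_or_true
start_without_errexit
```

Both variants reach the start step and exit with status 0. The merged commit keeps '-e' and, per its message, makes sure the two stop commands do not produce a non-zero exit code, which corresponds to the first option; dropping '-e' would also hide unexpected failures elsewhere in the script.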

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/145838

Changed in fuel:
assignee: Bogdan Dobrelya (bogdando) → Aleksandr Didenko (adidenko)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-main (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/144877
Reason: superseded by https://review.openstack.org/#/c/145838/

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-main (master)

Reviewed: https://review.openstack.org/145838
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=a6386b056d101d0ec2f3cf6d969c1d75c73b9924
Submitter: Jenkins
Branch: master

commit a6386b056d101d0ec2f3cf6d969c1d75c73b9924
Author: Aleksandr Didenko <email address hidden>
Date: Thu Jan 8 18:51:24 2015 +0200

    Don't break rabbit start.sh on pkill/stop failure

    We have two commands that attempt to stop rabbitmq service and
    it's possible that one of them may fail, which is OK. So we need
    to make sure they don't produce non-zero exit code in order to not
    break start.sh script which is executed with '-e' bash option.

    Closes-bug: #1407405
    Change-Id: I4e7956ae1800f56c6df3617c384777ac3b6dc3f9

Changed in fuel:
status: In Progress → Fix Committed
Bogdan Dobrelya (bogdando) wrote :

Please note that there is no need to backport this fix to releases earlier than 6.1, as their start.sh scripts are not affected.

Changed in fuel:
status: Fix Committed → Fix Released