[10.0 swarm] task "rabbitmq" fails

Bug #1668311 reported by Sergey Novikov
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Confirmed
Importance: High
Assigned to: MOS Oslo

Bug Description

Detailed bug description:

The issue was found by https://product-ci.infra.mirantis.net/job/10.0.system_test.ubuntu.custom_hostname/195/testReport/(root)/default_hostname/default_hostname/

Steps to reproduce:
1. Create a cluster
2. Add 3 nodes with controller role
3. Add 1 node with compute role
4. Deploy the cluster

2017-02-27 01:32:22 INFO [17257] Cluster[]: All nodes are finished. Failed tasks: Task[rabbitmq/5] Stopping the deployment process!

Additional info: the rabbitmq cluster appears to be alive, see http://paste.openstack.org/show/600641/

This bug does not look like a duplicate of https://bugs.launchpad.net/mos/+bug/1626933, since rabbitmq-server is at version 3.6.6-1~u16.04+mos1.

Sergey Novikov (snovikov) wrote :
Changed in mos:
status: New → Confirmed
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 10.0
tags: added: area-oslo
Alexey Lebedeff (alebedev-a) wrote :

The puppet manifest was adding rabbitmq users at the very moment pacemaker decided to restart some of the rabbitmq nodes. The funny thing is that adding users from puppet is a useless operation when pacemaker is present: the created users are lost during the resets/joins performed by the OCF script.

We need to disable this user-creation activity completely. The ONLY thing puppet should do is install the package and drop the 2 config files into their proper locations (i.e. no user management, no (re)starting/stopping/enabling of the systemd unit, etc.).

tags: added: swarm-blocker
removed: swarm-fail
tags: added: swarm-fail
removed: swarm-blocker
Changed in mos:
milestone: 10.0 → 9.x-updates
Alexey Lebedeff (alebedev-a) wrote :
Alexey Lebedeff (alebedev-a) wrote :

I've deployed an environment manually with this patch, and there were no traces of spurious rabbitmq restarts.

Alexey Lebedeff (alebedev-a) wrote :

To check whether the patch has taken effect, run the following command on any controller:

crm resource param p_rabbitmq-server show host_ip

It should return "127.0.0.1".
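
If the check needs to be scripted (for example, run on every controller after deployment), it could look roughly like the sketch below. This is only an illustration: the function names are made up, and it assumes that crm prints the bare parameter value, possibly followed by a newline. Only the crm command itself and the expected "127.0.0.1" value come from this comment.

```python
import subprocess

# The command from this comment; everything else here is illustrative.
CRM_CMD = ["crm", "resource", "param", "p_rabbitmq-server", "show", "host_ip"]
EXPECTED_HOST_IP = "127.0.0.1"


def is_patched(crm_output):
    """Return True if the crm output is exactly the expected address.

    Assumption: crm prints only the parameter value, optionally
    surrounded by whitespace or a trailing newline.
    """
    return crm_output.strip() == EXPECTED_HOST_IP


def check_controller():
    """Run the crm command on the local controller and check its output."""
    result = subprocess.run(
        CRM_CMD,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
        check=True,               # raise if crm exits with an error
    )
    return is_patched(result.stdout)
```

On a controller, calling check_controller() should return True once the patch is in effect; any other host_ip value, or empty output, counts as not patched.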
