Deployment of 9.0.system_test.ubuntu.ha_neutron_tun failed due to missing rabbitmq binary on a node

Bug #1553077 reported by Artem Hrechanychenko
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Bogdan Dobrelya

Bug Description

Reproduced on ci: https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_tun/33/console

Fuel version:
Output of shotgun2 report: http://paste.openstack.org/show/489285/

Steps to reproduce:

Deploy cluster in HA mode with Neutron VXLAN

Scenario:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 nodes with compute role
4. Deploy the cluster <===== failed here
5. Run network verification
6. Run OSTF

Expected result:
Deployment succeeds and OSTF tests pass.

Actual result:
"Error
Deployment has failed. Critical nodes failed: Node[2], Node[5], Node[4]. Stopping the deployment process!"

According to logs:
"2016-03-03 18:00:33 DEBUG [26060] Task time summary: rabbitmq with status failed on node 4 took 00:10:12"

root@node-4:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@messaging-node-4' ...
Error: unable to connect to node 'rabbit@messaging-node-4': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@messaging-node-4']

rabbit@messaging-node-4:
  * connected to epmd (port 4369) on messaging-node-4
  * epmd reports: node 'rabbit' not running at all
                  other nodes on messaging-node-4: ['rabbitmq-cli-76']
  * suggestion: start the node

root@node-4:~# pcs status |grep rabbitmq
 Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]
     p_rabbitmq-server (ocf::fuel:rabbitmq-server): FAILED node-4.test.domain.local (unmanaged)
    p_rabbitmq-server_stop_0 on node-4.test.domain.local 'unknown error' (1): call=148, status=complete, last-rc-change='Thu Mar 3 17:53:36 2016', queued=0ms, exec=9567ms
    p_rabbitmq-server_monitor_30000 on node-4.test.domain.local 'not running' (7): call=432, status=complete, last-rc-change='Fri Mar 4 08:40:43 2016', queued=0ms, exec=716ms

Tags: area-library
Artem Hrechanychenko (agrechanichenko) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Changed in fuel:
status: Confirmed → Won't Fix
status: Won't Fix → Confirmed
summary: - deployment of 9.0.system_test.ubuntu.ha_neutron_tun failed due to down
- rabbitmq on node
+ deployment of 9.0.system_test.ubuntu.ha_neutron_tun failed due to
+ missing rabbitmq binary on a node
Bogdan Dobrelya (bogdando) wrote :

Yes. According to the logs, the rabbitmq-server package was installed only after the OCF RA had already started, and the RA failed because the binary was not found:
2016-03-03T17:49:24.717631+00:00 info: INFO: get_status() failed with code 127. Command output: /usr/bin/timeout: failed to run command '/usr/sbin/rabbitmqctl': No such file or directory
puppet-apply.log:2016-03-03T17:52:01.434864+00:00 notice: (/Stage[main]/Rabbitmq::Install/Package[rabbitmq-server]/ensure) ensure changed 'purged' to 'present'

So this is a race between the rabbitmq OCF RA and the package installation.
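
The ordering can be checked directly from the two timestamps quoted above. The following is only a minimal Python sketch: the constant and function names are illustrative assumptions and are not part of fuel-library or the OCF RA.

```python
from datetime import datetime

# Timestamps copied from the two log lines quoted above.
# RA_PROBE_FAILED: get_status() failed with code 127 (rabbitmqctl not found).
RA_PROBE_FAILED = "2016-03-03T17:49:24.717631+00:00"
# PACKAGE_INSTALLED: Package[rabbitmq-server] ensure changed 'purged' to 'present'.
PACKAGE_INSTALLED = "2016-03-03T17:52:01.434864+00:00"


def parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with microseconds and a UTC offset."""
    return datetime.fromisoformat(ts)


def ra_ran_before_install(probe_ts: str, install_ts: str) -> bool:
    """Return True if the RA probe happened before the package was installed."""
    return parse(probe_ts) < parse(install_ts)


# Time between the failed probe and the package becoming present.
gap = parse(PACKAGE_INSTALLED) - parse(RA_PROBE_FAILED)

print(ra_ran_before_install(RA_PROBE_FAILED, PACKAGE_INSTALLED))
print(gap.total_seconds())
```

With these two log lines the check reports that the RA probe came first, roughly 157 seconds before the package was installed.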

Bogdan Dobrelya (bogdando) wrote :

Here is a snippet of the deployment task flow: http://pastebin.com/JK57MYS8

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/289860

Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Bogdan Dobrelya (bogdando)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/289860
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=9be170dafa906b6934c3091f5994a65705e40a03
Submitter: Jenkins
Branch: master

commit 9be170dafa906b6934c3091f5994a65705e40a03
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Mar 7 15:39:58 2016 +0100

    Fix race of the rabbitmq OCF RA vs package install

    Closes-bug: #1553077
    Related-bug: #1540915

    Change-Id: I89e27e136062a0c3508a337abf81381bcdd56790
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Ksenia Svechnikova (kdemina) wrote :

Verified with the test "Deploy cluster in HA mode with Neutron VXLAN" on ISO fuel-9.0-250:

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.ha_neutron_tun/90/

Expected result:
Deployment succeeds and OSTF tests pass.

Changed in fuel:
status: Fix Committed → Fix Released