[upgrade] RabbitMQ down on converge step M to N

Bug #1635231 reported by mathieu bultel
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| tripleo | Fix Released | High | Sofer Athlan-Guyot | |

Bug Description

RabbitMQ is down during the converge step, and the deployment can hang until the Heat timeout is reached. Cluster status (`pcs status`):

Full list of resources:

 ip-172.21.35.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 ip-172.21.33.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-10.12.149.6 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-172.21.33.11 (ocf::heartbeat:IPaddr2): Started overcloud-controller-0
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 ip-10.12.149.91 (ocf::heartbeat:IPaddr2): Started overcloud-controller-1
 ip-172.21.36.10 (ocf::heartbeat:IPaddr2): Started overcloud-controller-2
 openstack-cinder-volume (systemd:openstack-cinder-volume): Started overcloud-controller-0

Failed Actions:
* rabbitmq_monitor_10000 on overcloud-controller-2 'not running' (7): call=86, status=complete, exitreason='none',
    last-rc-change='Thu Oct 20 12:07:55 2016', queued=1363ms, exec=1601ms

PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

$ heat resource-list overcloud -n 5 | grep -i in_pro
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
| AllNodesDeploySteps | 4818c1d2-1022-4cd3-b63f-a9737ef6aec3 | OS::TripleO::PostDeploySteps | CREATE_IN_PROGRESS | 2016-10-20T12:04:36Z | overcloud |
| ComputeDeployment_Step3 | 5a82a140-80db-44a7-9c03-84269ab071b3 | OS::Heat::StructuredDeploymentGroup | CREATE_IN_PROGRESS | 2016-10-20T12:04:37Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi |
| ControllerDeployment_Step3 | 04a3f286-1f2c-4025-93fd-ce01762eedc3 | OS::Heat::StructuredDeploymentGroup | CREATE_IN_PROGRESS | 2016-10-20T12:04:38Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi |
| 0 | a5c19dd4-5372-47a5-a36e-2f82f0205c1f | OS::Heat::StructuredDeployment | CREATE_IN_PROGRESS | 2016-10-20T12:12:08Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi-ComputeDeployment_Step3-jxawuhg6hkme |
| 0 | fa127e05-e2e6-48fd-bf66-2e1949ccf1cd | OS::Heat::StructuredDeployment | CREATE_IN_PROGRESS | 2016-10-20T12:12:08Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi-ControllerDeployment_Step3-k4zox3i4xrmm |
| 1 | 4bf218da-3f8a-4554-91e8-d62b57f8e4ae | OS::Heat::StructuredDeployment | CREATE_IN_PROGRESS | 2016-10-20T12:12:08Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi-ControllerDeployment_Step3-k4zox3i4xrmm |
| 2 | 32021dc1-f4ce-4440-9482-17e3cb226013 | OS::Heat::StructuredDeployment | CREATE_IN_PROGRESS | 2016-10-20T12:12:08Z | overcloud-AllNodesDeploySteps-eedu2uqm2msi-ControllerDeployment_Step3-k4zox3i4xrmm
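To see at a glance which nested deployments are stuck, the IN_PROGRESS rows can be reduced to resource names. This is a minimal sketch, assuming the column order shown above (name, physical ID, type, status, ...); `in_progress_resources` is a hypothetical helper, and `openstack stack resource list -n 5 overcloud` is the non-deprecated command suggested by the warning.

```shell
#!/usr/bin/env bash
# Sketch: print the names of Heat resources whose status ends in _IN_PROGRESS.
# Assumption: stdin is the table printed by
#   openstack stack resource list -n 5 overcloud
# (or the deprecated `heat resource-list overcloud -n 5`), with '|' separators,
# so that $2 is the resource name and $5 is the resource status.
in_progress_resources() {
  awk -F'|' '$5 ~ /_IN_PROGRESS/ {
    name = $2
    gsub(/^ +| +$/, "", name)   # trim the padding around the cell
    print name
  }'
}

# Example (needs undercloud credentials loaded):
#   openstack stack resource list -n 5 overcloud | in_progress_resources
```

Header and separator rows are skipped naturally, because their status column never contains `_IN_PROGRESS`.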

Tags: upgrade
mathieu bultel (mat-bultel) wrote :

Running `pcs resource cleanup` solves the issue.
I think this only needs a documentation patch.
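The workaround can be scripted as a check-then-clean step. This is a minimal sketch, not the documented procedure: `has_failed_actions` and `cleanup_if_failed` are hypothetical helpers, and it assumes the failure section header is `Failed Actions:` (as in the output above) or `Failed Resource Actions:` (newer Pacemaker releases).

```shell
#!/usr/bin/env bash
# Returns 0 when the `pcs status` text on stdin contains a failed-actions section.
# Assumption: header is "Failed Actions:" or "Failed Resource Actions:".
has_failed_actions() {
  grep -Eq '^Failed (Resource )?Actions:'
}

# Hypothetical wrapper; run as root on a controller node after the converge step.
# Not invoked by this snippet.
cleanup_if_failed() {
  if pcs status | has_failed_actions; then
    pcs resource cleanup   # or limit it, e.g. pcs resource cleanup rabbitmq-clone
    pcs status             # re-check that the failed action is gone
  fi
}
```

`pcs resource cleanup` clears the recorded failures so Pacemaker re-checks the current state of the resources; without a resource ID it applies to all of them.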

description: updated
Changed in tripleo:
status: New → In Progress
assignee: nobody → mbu (mat-bultel)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-docs (master)

Fix proposed to branch: master
Review: https://review.openstack.org/389609

mathieu bultel (mat-bultel) wrote :
Changed in tripleo:
importance: Undecided → Medium
importance: Medium → High
milestone: none → ocata-1
Steven Hardy (shardy)
Changed in tripleo:
milestone: ocata-1 → ocata-2
Changed in tripleo:
milestone: ocata-2 → ocata-3
Changed in tripleo:
milestone: ocata-3 → ocata-rc1
Changed in tripleo:
milestone: ocata-rc1 → ocata-rc2
Changed in tripleo:
milestone: ocata-rc2 → pike-1
Changed in tripleo:
milestone: pike-1 → pike-2
Changed in tripleo:
milestone: pike-2 → pike-3
Changed in tripleo:
milestone: pike-3 → pike-rc1
Ben Nemec (bnemec)
Changed in tripleo:
milestone: pike-rc1 → queens-1
tags: added: upgrade
Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
assignee: mathieu bultel (mat-bultel) → Sofer Athlan-Guyot (sofer-athlan-guyot)
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-docs (master)

Reviewed: https://review.openstack.org/389609
Committed: https://git.openstack.org/cgit/openstack/tripleo-docs/commit/?id=cb0dcc7190ec9d22e92559155b33ab7b6c119c50
Submitter: Zuul
Branch: master

commit cb0dcc7190ec9d22e92559155b33ab7b6c119c50
Author: Sofer Athlan-Guyot <email address hidden>
Date: Mon Nov 6 15:06:04 2017 +0100

    Add cleanup checks in the upgrade doc

    After the converge step, it's sometimes necessary to run
    a pcs resource cleanup to get a clean and working cluster.
    Adding that as a note at the end of the converge doc.

    Co-Authored-By: Athlan-Guyot Sofer <email address hidden>
    Change-Id: Ic60204bf66b179830e7f179ac67f5ed2558aef8a
    Closes-bug: #1635231

Changed in tripleo:
status: In Progress → Fix Released