Disable VIPs before calling "pcs cluster stop --all"

Bug #1577570 reported by Ian Pilcher
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | High | Ian Pilcher |
Liberty | Fix Released | Undecided | Unassigned |
Mitaka | Fix Released | Undecided | Unassigned |

Bug Description

During a major version upgrade (Kilo to Liberty, for example), the upgrade scripts use pcs to stop the controller cluster.

If the controller on which the pcs command is executed happens to have a VIP on the internal network, pcs may use the VIP as the source address for communication with another cluster node. When pacemaker is stopped, this VIP goes away, and pcs never receives a response from the other node. This causes the pcs command to hang indefinitely; eventually the upgrade times out and fails.
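
The workaround described in this report (disable the VIPs first, then stop the cluster) can be sketched roughly as follows. This is an illustrative sketch, not the merged patch: the function names `started_vips` and `disable_vips_then_stop` are hypothetical, the 60-second wait is an arbitrary choice, and the parsing assumes the one-line-per-resource output that older pcs releases print for `pcs resource show`, with VIPs managed by the `ocf::heartbeat:IPaddr2` agent.

```shell
#!/bin/bash
# Sketch only: the helper names and the assumed output format are
# illustrative and are not taken from the actual fix.

# Print the IDs of IPaddr2 (VIP) resources that are currently started.
# Reads `pcs resource show`-style output on stdin, e.g.:
#   ip-192.0.2.10  (ocf::heartbeat:IPaddr2):  Started overcloud-controller-0
started_vips() {
    awk '/ocf::heartbeat:IPaddr2/ && /Started/ { print $1 }'
}

# Disable every started VIP, then stop the whole cluster.
disable_vips_then_stop() {
    for vip in $(pcs resource show | started_vips); do
        # --wait makes pcs wait up to 60 s for the resource to stop
        pcs resource disable "$vip" --wait=60 || return 1
    done
    # With no VIP left on this node, pcs has to use a non-VIP source address
    pcs cluster stop --all
}
```

Calling `disable_vips_then_stop` on the controller that drives the upgrade would take the place of a bare `pcs cluster stop --all`. Because the VIPs are already gone when the stop request is sent, the reply from the other nodes is no longer addressed to an IP that disappears mid-request.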

Ian Pilcher (arequipeno)
Changed in tripleo:
status: New → Confirmed
assignee: nobody → Ian Pilcher (arequipeno)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/311860

Changed in tripleo:
status: Confirmed → In Progress
Marios Andreou (marios-b) wrote :

For more context: this was first discussed at https://bugzilla.redhat.com/show_bug.cgi?id=1330688 (Bug 1330688 - "pcs cluster stop --all" hangs).

tags: added: liberty-backport-potential mitaka-backport-potential
Steven Hardy (shardy)
Changed in tripleo:
importance: Undecided → High
milestone: none → newton-1
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/311860
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6e65c8fc0a3c2b4025b08d1a2c8b03696e8fb7f6
Submitter: Jenkins
Branch: master

commit 6e65c8fc0a3c2b4025b08d1a2c8b03696e8fb7f6
Author: Ian Pilcher <email address hidden>
Date: Mon May 2 16:21:55 2016 -0500

    Disable VIPs before stopping cluster during version upgrade

    If "pcs cluster stop --all" is executed on a controller that
    happens to have a VIP on the internal network, pcs may use the
    VIP as the source address for communication with another cluster
    node. When pacemaker is stopped this VIP goes away, and pcs never
    receives a response from the other node. This causes pcs to hang
    indefinitely; eventually the upgrade times out and fails.

    Disabling the VIPs before stopping the cluster avoids this
    situation.

    Change-Id: I6bc59120211af28456018640033ce3763c373bbb
    Closes-Bug: 1577570

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/312454

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/312455

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/mitaka)

Reviewed: https://review.openstack.org/312454
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=296782c1d60908650be45fa16c4d6cac1048ec90
Submitter: Jenkins
Branch: stable/mitaka

commit 296782c1d60908650be45fa16c4d6cac1048ec90
Author: Ian Pilcher <email address hidden>
Date: Mon May 2 16:21:55 2016 -0500

    Disable VIPs before stopping cluster during version upgrade

    If "pcs cluster stop --all" is executed on a controller that
    happens to have a VIP on the internal network, pcs may use the
    VIP as the source address for communication with another cluster
    node. When pacemaker is stopped this VIP goes away, and pcs never
    receives a response from the other node. This causes pcs to hang
    indefinitely; eventually the upgrade times out and fails.

    Disabling the VIPs before stopping the cluster avoids this
    situation.

    Change-Id: I6bc59120211af28456018640033ce3763c373bbb
    Closes-Bug: 1577570
    (cherry picked from commit 6e65c8fc0a3c2b4025b08d1a2c8b03696e8fb7f6)

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/liberty)

Reviewed: https://review.openstack.org/312455
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=33b5bb61fe102fc55cb82fc73b852db03419c45b
Submitter: Jenkins
Branch: stable/liberty

commit 33b5bb61fe102fc55cb82fc73b852db03419c45b
Author: Ian Pilcher <email address hidden>
Date: Mon May 2 16:21:55 2016 -0500

    Disable VIPs before stopping cluster during version upgrade

    If "pcs cluster stop --all" is executed on a controller that
    happens to have a VIP on the internal network, pcs may use the
    VIP as the source address for communication with another cluster
    node. When pacemaker is stopped this VIP goes away, and pcs never
    receives a response from the other node. This causes pcs to hang
    indefinitely; eventually the upgrade times out and fails.

    Disabling the VIPs before stopping the cluster avoids this
    situation.

    Change-Id: I6bc59120211af28456018640033ce3763c373bbb
    Closes-Bug: 1577570
    (cherry picked from commit 6e65c8fc0a3c2b4025b08d1a2c8b03696e8fb7f6)

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/tripleo-heat-templates 5.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0b1 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/tripleo-heat-templates 2.1.0

This issue was fixed in the openstack/tripleo-heat-templates 2.1.0 release.

OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/tripleo-heat-templates 2.1.0 release.
