Major upgrade M->N can fail due to low cluster sync timeout

Bug #1597506 reported by Damien Ciabrini
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   High         Damien Ciabrini
Mitaka    Fix Released   Undecided    Unassigned

Bug Description

Since the Liberty release, the number of services managed by pacemaker
on an HA Overcloud has increased.

This has an impact on major_upgrade_controller_pacemaker_1.sh, where
the cluster sync timeout value, tuned for older releases, is now too
low.

If the cluster on the overcloud fails to stop within the specified
timeout threshold, the major upgrade ends up in FAILED state.
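
The failure mode described above can be sketched as a generic poll-with-timeout loop. This is only an illustration, not the code in major_upgrade_controller_pacemaker_1.sh: the function name wait_until, its arguments, and the check command passed to it are all hypothetical. It shows why a timeout tuned for a smaller set of services starts failing once the cluster needs more time to stop.

```shell
#!/bin/bash
# Hypothetical helper (not taken from tripleo-heat-templates):
# run a check command repeatedly until it succeeds or the timeout expires.
#   $1 = timeout in seconds, $2 = polling interval in seconds,
#   remaining arguments = the check command to run.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            # The check never succeeded within the allowed time:
            # report it and let the caller mark the step as failed.
            echo "timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}
```

With a check that needs three polls to succeed, `wait_until 1 1 check` returns non-zero while `wait_until 5 1 check` returns zero. Nothing about the cluster differs between the two calls, only the time it is allowed.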

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/335666

Changed in tripleo:
assignee: nobody → Damien Ciabrini (dciabrin)
status: New → In Progress
Changed in tripleo:
status: In Progress → Confirmed
importance: Undecided → High
milestone: none → newton-2
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/335666
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=017334bbb5978ed9a9c06f4595506612287c1b14
Submitter: Jenkins
Branch: master

commit 017334bbb5978ed9a9c06f4595506612287c1b14
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 29 22:36:34 2016 +0200

    Increase cluster sync timeout for M->N major upgrades

    Since the Liberty release, the number of services managed by pacemaker
    on HA Overcloud has increased. This has an impact on
    major_upgrade_controller_pacemaker_1.sh, where cluster sync timeout
    value tuned for older releases is now becoming too low.

    Raise the cluster sync timeout value to a sensible limit to
    give pacemaker enough time to stop the cluster during major upgrade.

    Change-Id: I821d354ba30ce39134982ba12a82c429faa3ce62
    Closes-Bug: #1597506

Changed in tripleo:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/335876

tags: added: mitaka-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/mitaka)

Reviewed: https://review.openstack.org/335876
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f26a2ed0778ecce7e404c4df9eeb3566719292ab
Submitter: Jenkins
Branch: stable/mitaka

commit f26a2ed0778ecce7e404c4df9eeb3566719292ab
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 29 22:36:34 2016 +0200

    Increase cluster sync timeout for M->N major upgrades

    Since the Liberty release, the number of services managed by pacemaker
    on HA Overcloud has increased. This has an impact on
    major_upgrade_controller_pacemaker_1.sh, where cluster sync timeout
    value tuned for older releases is now becoming too low.

    Raise the cluster sync timeout value to a sensible limit to
    give pacemaker enough time to stop the cluster during major upgrade.

    Change-Id: I821d354ba30ce39134982ba12a82c429faa3ce62
    Closes-Bug: #1597506
    (cherry picked from commit 017334bbb5978ed9a9c06f4595506612287c1b14)

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/tripleo-heat-templates 5.0.0.0b2

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0b2 development milestone.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/tripleo-heat-templates 2.1.0

This issue was fixed in the openstack/tripleo-heat-templates 2.1.0 release.

