When doing an upgrade, recoverable checks leave the cluster in an unknown state

Bug #1614907 reported by Sofer Athlan-Guyot
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Sofer Athlan-Guyot
Milestone: newton-rc1

Bug Description

Hi,

When doing the upgrade, numerous static checks run during the major pacemaker upgrade step. They occur at various points in the script, for example the check on the rpm-python package, the check on the disk space left on the bootstrap node, and so on.

If any of those checks fails, the cluster is left in a more or less unknown state. One has to go to the controller, check what happened, put the cluster back into shape, fix the detected error, and only then maybe be able to upgrade again.

This is a less than optimal situation.

A better way would be for all those tests to happen at the beginning of the upgrade. Then the operator would only have to fix the detected issue and re-run the upgrade.
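
The idea of running every recoverable check up front can be sketched as follows. This is only an illustration in Python, not the code from tripleo-heat-templates: the names `check_free_disk`, `run_preflight` and `upgrade`, and the disk-space threshold, are hypothetical and chosen for the example.

```python
import shutil
from typing import Callable, List, Optional

# A check returns None on success, or a human-readable error on failure.
# (Hypothetical convention for this sketch.)
Check = Callable[[], Optional[str]]


def check_free_disk(path: str, min_free_bytes: int) -> Optional[str]:
    """Hypothetical check: fail if `path` has less than `min_free_bytes` free."""
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        return f"only {free} bytes free on {path}, need {min_free_bytes}"
    return None


def run_preflight(checks: List[Check]) -> List[str]:
    """Run every check and collect all failures; nothing is modified here."""
    errors = []
    for check in checks:
        message = check()
        if message is not None:
            errors.append(message)
    return errors


def upgrade(checks: List[Check], do_upgrade: Callable[[], None]) -> List[str]:
    """Start the upgrade only when all recoverable checks pass."""
    errors = run_preflight(checks)
    if not errors:
        do_upgrade()
    return errors
```

Because the checks run before `do_upgrade` is ever called, a failure leaves the cluster untouched, and the operator sees every detected problem in one pass instead of one per attempt.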

Tags: upgrade-bugs
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

After re-reading the code, they all happen before any serious change. But it would be nice to refactor them to make that obvious.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/357750

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: New → In Progress
Changed in tripleo:
milestone: none → newton-3
importance: Undecided → High
Steven Hardy (shardy)
Changed in tripleo:
milestone: newton-3 → newton-rc1
tags: removed: update-bugs
Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/357750
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=575e42b0287e37d3ef261c040fb3d331d3419801
Submitter: Jenkins
Branch: master

commit 575e42b0287e37d3ef261c040fb3d331d3419801
Author: Sofer Athlan-Guyot <email address hidden>
Date: Thu Aug 25 11:58:56 2016 +0200

    Refactor upgrade checks.

    We make it clear that recoverable checks happen before starting the
    upgrade to be able to run the upgrade after the offending error has been
    manually corrected.

    Add new check for the pcsd cluster status.

    Add new check for galera password file: BZ 1357112

    Closes-Bug: 1614907
    Change-Id: If736c79121e1ffe0eaeb814bdb73ccbc0b64edcd
