M/N upgrades - A few major-upgrade issues

Bug #1627490 reported by Michele Baldessari
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Michele Baldessari

Bug Description

We have a bunch of smaller problems in the major-upgrade logic currently:

1. We now explicitly disable/stop and then remove the resources that are moving to systemd. We do this to make sure they are all stopped before the yum upgrade, which would otherwise take ages because rabbitmq and galera are down. It is best to do this via pcs during the HA Full -> HA NG migration, because at that stage it is simpler to make sure all the services are stopped. For extra safety we can still do a check by hand. Doing it via pacemaker guarantees that all the migrated services are already down when we stop the cluster (which happens to be a synchronization point between all controller nodes). That way we can be certain that they are all down on all nodes before the yum upgrade starts.

2. We actually need to start the systemd services in major_upgrade_controller_pacemaker_2.sh and not stop them.

3. We need to use the proper bash variable name.

4. Use is_bootstrap_node everywhere to make the code more consistent.
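
The ordering described in point 1 (stop everything first, delete afterwards) can be sketched roughly as follows. This is a minimal illustration and not the code from the actual upgrade scripts: the resource names, the 600-second wait, and the RUN dry-run prefix are all assumptions made for the example. Only `pcs resource disable` and `pcs resource delete` are real pcs subcommands. Because both commands change the cluster-wide configuration, a sketch like this would normally run on a single node.

```shell
# Sketch only: resource names, timeout and the RUN prefix are illustrative
# assumptions, not the contents of the real upgrade scripts.

# Prefix for every cluster command; set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

# Hypothetical list of pacemaker resources being handed over to systemd.
MIGRATED_RESOURCES=${MIGRATED_RESOURCES:-"httpd openstack-nova-api"}

stop_and_remove_migrated() {
    local res
    # Phase 1: disable (stop) every resource and wait for pacemaker to
    # report that the stop has completed.
    for res in $MIGRATED_RESOURCES; do
        $RUN pcs resource disable "$res" --wait=600 || return 1
    done
    # Phase 2: only after everything is stopped, drop the resources from
    # the cluster configuration so systemd can own the services afterwards.
    for res in $MIGRATED_RESOURCES; do
        $RUN pcs resource delete "$res" || return 1
    done
}
```

Calling `stop_and_remove_migrated` with `RUN=echo` prints the commands in the order they would run, which makes it easy to confirm that no resource is deleted before all of them have been disabled.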

Tags: upgrade
Michele Baldessari (michele) wrote :

1. Another reason it is best to stop them via pcs: if they are stopped via systemd on non-bootstrap nodes before the corresponding pcs resource is deleted, check_resource_systemd will fail with something like:

Fri Sep 23 16:54:45 UTC 2016 1a9879f7-3e6c-457f-8e1f-3a9a16d52193 tripleo-upgrade overcloud-controller-1 Going to systemctl stop httpd
Fri Sep 23 16:54:46 UTC 2016 1a9879f7-3e6c-457f-8e1f-3a9a16d52193 tripleo-upgrade overcloud-controller-1 Going to check_resource_systemd for httpd to be stopped\nERROR - httpd not found to be systemd managed.
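
For the "check by hand" mentioned in the description, a systemd-only probe could look roughly like the sketch below. It is not the real check_resource_systemd, whose implementation is not shown in this report: `wait_until_stopped` and the IS_ACTIVE override are hypothetical names introduced for illustration. The only real command used is `systemctl is-active --quiet`, which exits 0 while a unit is active.

```shell
# Sketch only: wait_until_stopped is a hypothetical helper, not the real
# check_resource_systemd from the upgrade scripts.

# Probe used to ask whether a unit is running; overridable for dry runs.
IS_ACTIVE=${IS_ACTIVE:-"systemctl is-active --quiet"}

wait_until_stopped() {
    local unit=$1
    local timeout=${2:-60}
    local waited=0
    # Poll once per second until the unit is no longer active.
    while $IS_ACTIVE "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "ERROR - $unit still active after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$unit is stopped"
}
```

A call such as `wait_until_stopped httpd 120` on each controller only asks systemd about the unit state, so it does not depend on whether the matching pcs resource has already been deleted.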

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/376009

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/376009
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f9e6a26f32aea4d3c40178f87b61efb924f81f97
Submitter: Jenkins
Branch: master

commit f9e6a26f32aea4d3c40178f87b61efb924f81f97
Author: Michele Baldessari <email address hidden>
Date: Sun Sep 25 14:10:31 2016 +0200

    A few major-upgrade issues

    This commit does the following:
    1. We now explicitly disable/stop and then remove the resources that are
       moving to systemd. We do this because we want to make sure they are all
       stopped before doing a yum upgrade, which otherwise would take ages due
       to rabbitmq and galera being down. It is best if we do this via pcs
       while we do the HA Full -> HA NG migration because it is simpler to make
       sure all the services are stopped at that stage. For extra safety we can
       still do a check by hand. By doing it via pacemaker we have the
       guarantee that all the migrated services are down already when we stop
       the cluster (which happens to be a synchronization point between all
       controller nodes). That way we can be certain that they are all down on
       all nodes before starting the yum upgrade process.

    2. We actually need to start the systemd services in
       major_upgrade_controller_pacemaker_2.sh and not stop them.

    3. We need to use the proper bash variable name

    4. Use is_bootstrap_node everywhere to make the code more consistent

    Change-Id: Ic565c781b80357bed9483df45a4a94ec0423487c
    Closes-Bug: #1627490

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 5.0.0.0rc2

This issue was fixed in the openstack/tripleo-heat-templates 5.0.0.0rc2 release candidate.
