[series-upgrade] "Series upgrade OpenStack" is wrong with respect to which unit to upgrade first

Bug #1934764 reported by Alex Kavanagh
This bug affects 1 person
Affects: OpenStack Charms Deployment Guide
Status: Fix Released
Importance: High
Assigned to: Peter Matulis
Milestone: (none listed)

Bug Description

In HA, the guide indicates in the "Generalised OpenStack series upgrade" section:

The steps are as follows:

    Set the default series for the principal application and ensure the same has been done to the model.

    If hacluster is used, pause the hacluster units not associated with the principal leader machine.

    Pause the principal non-leader units.

    Perform a series upgrade on the principal leader machine.

If the operator does this then the service will be taken off-line.

In reality, the remaining machine (the one that is not paused) has the VIP and continues to provide the service. The operator should upgrade the two paused machines first; when they are both back online, one of them will claim the VIP. The third machine's principal unit can then be paused and upgraded.

In this way, service can be maintained during an upgrade.
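The proposed order can be sketched as a small Python helper that only builds the command list and runs nothing. This is an illustrative sketch, not text from the guide: the unit names, machine IDs and the `upgrade_plan` function are hypothetical, and the `juju run-action` / `juju upgrade-series` syntax assumes a Juju 2.9-era client (it may differ in other Juju versions).

```python
def upgrade_plan(units, vip_holder, series):
    """Build an ordered list of juju commands for the proposed upgrade order.

    units: dict mapping principal unit -> (hacluster unit, machine id)
    vip_holder: the principal unit left running until the end
    series: target series, e.g. "focal"
    """
    first = [u for u in units if u != vip_holder]
    plan = []

    # Pause hacluster and the principal on the two machines upgraded first.
    for unit in first:
        ha_unit, _machine = units[unit]
        plan.append(f"juju run-action --wait {ha_unit} pause")
        plan.append(f"juju run-action --wait {unit} pause")

    # Upgrade those machines while the remaining unit keeps serving.
    # The manual OS upgrade and reboot happen between prepare and complete.
    for unit in first:
        _ha_unit, machine = units[unit]
        plan.append(f"juju upgrade-series {machine} prepare {series}")
        plan.append(f"juju upgrade-series {machine} complete")

    # Finally pause and upgrade the principal on the last machine.
    _ha_unit, machine = units[vip_holder]
    plan.append(f"juju run-action --wait {vip_holder} pause")
    plan.append(f"juju upgrade-series {machine} prepare {series}")
    plan.append(f"juju upgrade-series {machine} complete")
    return plan


if __name__ == "__main__":
    # Hypothetical three-unit keystone deployment.
    example = {
        "keystone/0": ("keystone-hacluster/0", "0"),
        "keystone/1": ("keystone-hacluster/1", "1"),
        "keystone/2": ("keystone-hacluster/2", "2"),
    }
    for cmd in upgrade_plan(example, "keystone/2", "focal"):
        print(cmd)
```

Keeping the plan as data makes it easy to review the ordering before anything is executed against a live model.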

Revision history for this message
Peter Matulis (petermatulis) wrote :

Alex, why do you say that after bringing the paused (and now upgraded) units back online one of them will claim the VIP? Wouldn't that happen only once:

1. the hacluster units associated with the now-upgraded principal units are resumed

and

2. the hacluster unit associated with the remaining (non-upgraded) unit is paused

Then the latter unit can be paused and upgraded?

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Peter, thanks for picking this up. So referring to your two questions:

> the hacluster units associated with the now-upgraded principal units are resumed

With series-upgrade, the post-series-upgrade hook of hacluster (and of the principal) automatically resumes the unit. When 2 units have been upgraded, one of them automatically claims the VIP.

> the hacluster unit associated with the remaining (non-upgraded) unit is paused

Nope, because hacluster automatically takes the cluster offline when any of the hacluster units runs a pre-series-upgrade hook. At that point no unit is claiming the VIP; it just 'stays' where it is.

i.e. when the first unit is paused, the entire cluster is disabled and the VIP stays where it is.

So the order of operations (with hacluster) is:

a) Pause any unit + hacluster - if it had the VIP, it's handed off to one of the other units.
b) Pause another unit + hacluster - if it got the VIP, it is passed to the remaining unit.
c) pre-series-upgrade on a paused unit - hacluster is disabled for the application on ALL units. No VIP transfers will take place until two new upgraded units are available. Hope the 3rd unit stays up.

Note the remaining unit doesn't need to be the 'leader'. hacluster ensures that an unpaused unit gets the VIP.

The key information here is that the pre-series-upgrade hook on hacluster DISABLES hacluster for all units (regardless of whether they are paused or not). Therefore, for the duration of the series upgrade, the VIP can't move, and it requires two units to be upgraded before the VIP can move again.

Also, there is no way to series upgrade a single unit and leave 2 units providing the service, as there's no (charm) method of moving the VIP between remaining units.
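To make the sequence above easier to follow, here is a toy Python model of the behaviour as described in this comment. It is a sketch under stated assumptions, not the charm's code: the `HaClusterModel` class and the unit names are hypothetical, and picking the first unit in sorted order is an arbitrary stand-in for "one of them claims the VIP".

```python
class HaClusterModel:
    """Toy model of VIP movement during a series upgrade (illustrative only)."""

    def __init__(self, units, vip):
        self.units = set(units)
        self.paused = set()
        self.upgraded = set()
        self.disabled = False  # set once any unit runs pre-series-upgrade
        self.vip = vip

    def pause(self, unit):
        """Pause a unit; hand the VIP off if the cluster is still enabled."""
        self.paused.add(unit)
        if not self.disabled and self.vip == unit:
            candidates = sorted(self.units - self.paused)
            if candidates:
                self.vip = candidates[0]  # arbitrary choice among unpaused units

    def pre_series_upgrade(self, unit):
        """Disable hacluster for ALL units; the VIP stays where it is."""
        self.disabled = True

    def post_series_upgrade(self, unit):
        """Resume the unit; the VIP can move again once two units are upgraded."""
        self.paused.discard(unit)
        self.upgraded.add(unit)
        if len(self.upgraded) >= 2:
            self.disabled = False
            self.vip = sorted(self.upgraded)[0]  # arbitrary upgraded unit


if __name__ == "__main__":
    m = HaClusterModel(["keystone/0", "keystone/1", "keystone/2"], vip="keystone/0")
    m.pause("keystone/0")               # step a)
    m.pause("keystone/1")               # step b)
    m.pre_series_upgrade("keystone/0")  # step c)
    print(m.vip, m.disabled)
```

The model only captures the ordering rules stated here; it says nothing about whether the remaining unit actually keeps serving the VIP in practice.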

Hope that helps.

Changed in charm-deployment-guide:
assignee: nobody → Peter Matulis (petermatulis)
importance: Undecided → High
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-deployment-guide (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-deployment-guide (master)

Reviewed: https://review.opendev.org/c/openstack/charm-deployment-guide/+/816395
Committed: https://opendev.org/openstack/charm-deployment-guide/commit/3f969eff1f4ef049ed8cd862000a12475b71cc6e
Submitter: "Zuul (22348)"
Branch: master

commit 3f969eff1f4ef049ed8cd862000a12475b71cc6e
Author: Peter Matulis <email address hidden>
Date: Tue Nov 2 15:14:45 2021 -0400

    Fix lp1934764 - series upgrade order

    Closes-Bug: #1934764
    Change-Id: I06f72fc03c5a65f89a4b01f783beed4450e502d3

Changed in charm-deployment-guide:
status: In Progress → Fix Released
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Somewhere along the line, this broke rather badly. It turns out to be somewhat random whether the VIP keeps working on the remaining machine, so the API service can go down during a series upgrade. It would appear there is no way of guaranteeing API availability during a series upgrade, and planned downtime is required.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

@ajkavanagh, is this largely due to the fact that the underlying pacemaker services undergo a major version upgrade, which does not support rolling cluster upgrades?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to charm-deployment-guide (master)

Reviewed: https://review.opendev.org/c/openstack/charm-deployment-guide/+/818848
Committed: https://opendev.org/openstack/charm-deployment-guide/commit/a655378cca96f8f2e518eca6fa5a32cf88a4626b
Submitter: "Zuul (22348)"
Branch: master

commit a655378cca96f8f2e518eca6fa5a32cf88a4626b
Author: Peter Matulis <email address hidden>
Date: Tue Nov 16 16:26:43 2021 -0500

    Review series upgrades

    Review and correct the upgrade-series page.

    Make miscellaneous improvements to various places
    for the sake of consistency.

    Also closes a doc bug and reverts changes made due to
    a second doc bug.

    Closes-Bug: #1838041
    Related-Bug: #1934764
    Change-Id: I88692573e8ae50cd77d3872b40361b464b8e0f19
