InstanceHA FFU Fixes

Bug #1888398 reported by Michele Baldessari
This bug affects 1 person
Affects   Status         Importance   Assigned to          Milestone
tripleo   Fix Released   Medium       Michele Baldessari

Bug Description

InstanceHA FFU needs a number of patches to work correctly. The main problem is that puppet-pacemaker does not correctly handle the case where the core cluster nodes are on CentOS/RHEL 8 (so they use the new pcs method to authorize remotes) while the compute nodes are still on RHEL 7.

This LP is to track the fixes needed for this to work.

Changed in tripleo:
assignee: nobody → Michele Baldessari (michele)
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: victoria-1 → victoria-3
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/741231
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=c1e9447998c635a825529cff99b99ed13a4e89d7
Submitter: Zuul
Branch: master

commit c1e9447998c635a825529cff99b99ed13a4e89d7
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 16:45:35 2020 +0200

    Make sure python3-novaclient is installed before creating fence_compute

    This has been observed during an IHA FFU process. Namely
    after the OS upgrade by LEAPP the fence_compute resource will
    fail starting because python3-novaclient is not installed.
    Normally this is taken care of by rpm dependencies, but
    fence_compute is buggy and does not explicitly have that
    dep (https://bugzilla.redhat.com/show_bug.cgi?id=1857247)

    So we need to make sure the package is installed before
    creating the resource.

    Tested this on four consecutive successful IHA FFU runs
    and it worked okay.

    Related-Bug: #1888398

    Change-Id: I6816d414409da3748b2e341ec05ebcad86ad8fd1
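
The ordering this commit describes can be illustrated with a small Python sketch. This is not the puppet-tripleo change itself (that change is Puppet code). The function names `rpm_installed` and `fence_compute_steps` and the step labels are hypothetical, and the only assumption about the system is that `rpm -q <package>` exits with status 0 when the package is installed.

```python
import subprocess
from typing import Callable, List


def rpm_installed(package: str) -> bool:
    """Return True if `rpm -q` reports the package as installed."""
    result = subprocess.run(
        ["rpm", "-q", package],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


def fence_compute_steps(
    is_installed: Callable[[str], bool] = rpm_installed,
    package: str = "python3-novaclient",
) -> List[str]:
    """Order the steps so the package is present before the resource is created."""
    steps: List[str] = []
    if not is_installed(package):
        # The fence agent does not pull this package in via rpm dependencies,
        # so it has to be handled explicitly.
        steps.append(f"install {package}")
    # Hypothetical step label; the real resource is created by Puppet.
    steps.append("create fence_compute resource")
    return steps
```

Passing `is_installed` as a parameter keeps the ordering logic testable on a machine without rpm; on a real node the default `rpm_installed` check would be used.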

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/741176
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5cb4565c016d4de48958f7cb9cffce5b685290eb
Submitter: Zuul
Branch: master

commit 5cb4565c016d4de48958f7cb9cffce5b685290eb
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 14:59:48 2020 +0200

    Use pcs 0.9 style authkey/remotes when doing an upgrade

    We leverage the _override keys for pacemaker and pacemaker_remote
    services in order to decide if we should use the old way of managing
    remotes (managed authkey and pcs 0.9 way of doing remotes).

    The pacemaker_remote override keys are introduced via
    https://review.opendev.org/741610

    Related-Bug: #1888398

    Change-Id: Iad23663d6c98e4fd3a507980638870e0ad0cee45

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/741610
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=2f460accb9800b2232c7cf783907b78bb94d98ef
Submitter: Zuul
Branch: master

commit 2f460accb9800b2232c7cf783907b78bb94d98ef
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 17 12:13:03 2020 +0200

    pcmk_remote FFU support for Instance HA

    The idea here is to set an override key like the following
    during the upgrade:
    hiera -c /etc/puppet/hiera.yaml pacemaker_remote_short_node_names_override
    ["compute-0"]

    So that puppet-tripleo can detect the key and act accordingly knowing
    that it is being upgraded (main reason is that the authkey management
    for remotes needs to be special-cased in puppet).
    Once the upgrade is completed we remove the key in post_upgrade_tasks.
    Tested this on a number of runs and confirmed that:
    - The IHA FFU completed correctly on controlplane + computes
    - Tempest still works after the FFU
    - The override keys are correctly removed at the end of the upgrade
      of each compute node

    Closes-Bug: #1888398

    Change-Id: I8bc42fb758a333adc9cd65602b44fabee6fc4041
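
The override-key mechanism described in this commit message can be sketched in Python as follows. This is an illustration only, not the actual puppet-tripleo or tripleo-heat-templates code. The key `pacemaker_remote_short_node_names_override` comes from the commit message; the key `pacemaker_short_node_names_override`, the function names, and the returned labels are assumptions made for the example, and the hiera data is modelled as a plain dictionary.

```python
from typing import Any, Dict, Mapping

# Keys whose presence signals that an upgrade is in progress.
OVERRIDE_KEYS = (
    "pacemaker_short_node_names_override",  # assumed name, not in the source
    "pacemaker_remote_short_node_names_override",  # from the commit message
)


def upgrade_in_progress(hiera_data: Mapping[str, Any]) -> bool:
    """Return True if any override key is set to a non-empty value."""
    return any(hiera_data.get(key) for key in OVERRIDE_KEYS)


def remote_management_style(hiera_data: Mapping[str, Any]) -> str:
    """Pick the old pcs 0.9 style handling of remotes only during an upgrade."""
    if upgrade_in_progress(hiera_data):
        return "pcs-0.9-style"  # hypothetical label: managed authkey, old remotes
    return "default"  # hypothetical label: the newer pcs method


def drop_override_keys(hiera_data: Mapping[str, Any]) -> Dict[str, Any]:
    """Mimic the post-upgrade cleanup by returning the data without override keys."""
    return {k: v for k, v in hiera_data.items() if k not in OVERRIDE_KEYS}
```

For example, with `{"pacemaker_remote_short_node_names_override": ["compute-0"]}` the sketch selects the pcs 0.9 style path, and after `drop_override_keys` it falls back to the default path, which mirrors the removal of the key in post_upgrade_tasks.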

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/746147

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/746149

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/746150

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/746152

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/ussuri)

Reviewed: https://review.opendev.org/746149
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=16bb6f5d7f24e4686c78e54bb5072efc2603086a
Submitter: Zuul
Branch: stable/ussuri

commit 16bb6f5d7f24e4686c78e54bb5072efc2603086a
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 14:59:48 2020 +0200

    Use pcs 0.9 style authkey/remotes when doing an upgrade

    We leverage the _override keys for pacemaker and pacemaker_remote
    services in order to decide if we should use the old way of managing
    remotes (managed authkey and pcs 0.9 way of doing remotes).

    The pacemaker_remote override keys are introduced via
    https://review.opendev.org/741610

    Related-Bug: #1888398

    Change-Id: Iad23663d6c98e4fd3a507980638870e0ad0cee45
    (cherry picked from commit 5cb4565c016d4de48958f7cb9cffce5b685290eb)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/746147
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=83874163376e0cc7fe88071316c04fd3a0d62609
Submitter: Zuul
Branch: stable/ussuri

commit 83874163376e0cc7fe88071316c04fd3a0d62609
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 17 12:13:03 2020 +0200

    pcmk_remote FFU support for Instance HA

    The idea here is to set an override key like the following
    during the upgrade:
    hiera -c /etc/puppet/hiera.yaml pacemaker_remote_short_node_names_override
    ["compute-0"]

    So that puppet-tripleo can detect the key and act accordingly knowing
    that it is being upgraded (main reason is that the authkey management
    for remotes needs to be special-cased in puppet).
    Once the upgrade is completed we remove the key in post_upgrade_tasks.
    Tested this on a number of runs and confirmed that:
    - The IHA FFU completed correctly on controlplane + computes
    - Tempest still works after the FFU
    - The override keys are correctly removed at the end of the upgrade
      of each compute node

    Closes-Bug: #1888398

    Change-Id: I8bc42fb758a333adc9cd65602b44fabee6fc4041
    (cherry picked from commit 2f460accb9800b2232c7cf783907b78bb94d98ef)

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/746150
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4b283ae9ea4e34d78c597b5b4c0e11656f828c93
Submitter: Zuul
Branch: stable/train

commit 4b283ae9ea4e34d78c597b5b4c0e11656f828c93
Author: Michele Baldessari <email address hidden>
Date: Fri Jul 17 12:13:03 2020 +0200

    pcmk_remote FFU support for Instance HA

    The idea here is to set an override key like the following
    during the upgrade:
    hiera -c /etc/puppet/hiera.yaml pacemaker_remote_short_node_names_override
    ["compute-0"]

    So that puppet-tripleo can detect the key and act accordingly knowing
    that it is being upgraded (main reason is that the authkey management
    for remotes needs to be special-cased in puppet).
    Once the upgrade is completed we remove the key in post_upgrade_tasks.
    Tested this on a number of runs and confirmed that:
    - The IHA FFU completed correctly on controlplane + computes
    - Tempest still works after the FFU
    - The override keys are correctly removed at the end of the upgrade
      of each compute node

    NB: Backport to train had a small conflict due to puppet include having
        :: as a prefix
    Closes-Bug: #1888398

    Change-Id: I8bc42fb758a333adc9cd65602b44fabee6fc4041
    (cherry picked from commit 2f460accb9800b2232c7cf783907b78bb94d98ef)
    (cherry picked from commit 83874163376e0cc7fe88071316c04fd3a0d62609)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/746152
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=7d21ae36fab8edf250eaa971e61e1a4dde123380
Submitter: Zuul
Branch: stable/train

commit 7d21ae36fab8edf250eaa971e61e1a4dde123380
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 14:59:48 2020 +0200

    Use pcs 0.9 style authkey/remotes when doing an upgrade

    We leverage the _override keys for pacemaker and pacemaker_remote
    services in order to decide if we should use the old way of managing
    remotes (managed authkey and pcs 0.9 way of doing remotes).

    The pacemaker_remote override keys are introduced via
    https://review.opendev.org/741610

    Related-Bug: #1888398

    Change-Id: Iad23663d6c98e4fd3a507980638870e0ad0cee45
    (cherry picked from commit 5cb4565c016d4de48958f7cb9cffce5b685290eb)
    (cherry picked from commit 16bb6f5d7f24e4686c78e54bb5072efc2603086a)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/746653

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/746654

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/ussuri)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/ussuri
Review: https://review.opendev.org/746653

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/746654
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=f28b6907c799731549ece3011e5e2920201b457d
Submitter: Zuul
Branch: stable/train

commit f28b6907c799731549ece3011e5e2920201b457d
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 16:45:35 2020 +0200

    Make sure python3-novaclient is installed before creating fence_compute

    This has been observed during an IHA FFU process. Namely
    after the OS upgrade by LEAPP the fence_compute resource will
    fail starting because python3-novaclient is not installed.
    Normally this is taken care of by rpm dependencies, but
    fence_compute is buggy and does not explicitly have that
    dep (https://bugzilla.redhat.com/show_bug.cgi?id=1857247)

    So we need to make sure the package is installed before
    creating the resource.

    Tested this on four consecutive successful IHA FFU runs
    and it worked okay.

    NB: Cherry-pick not 100% clean due to relative class names

    Related-Bug: #1888398

    Change-Id: I6816d414409da3748b2e341ec05ebcad86ad8fd1
    (cherry picked from commit c1e9447998c635a825529cff99b99ed13a4e89d7)
    (cherry picked from commit c3e86ea6dc0b2f57af12a088c2086e6fc70a95b0)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/ussuri)

Reviewed: https://review.opendev.org/746653
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=c3e86ea6dc0b2f57af12a088c2086e6fc70a95b0
Submitter: Zuul
Branch: stable/ussuri

commit c3e86ea6dc0b2f57af12a088c2086e6fc70a95b0
Author: Michele Baldessari <email address hidden>
Date: Wed Jul 15 16:45:35 2020 +0200

    Make sure python3-novaclient is installed before creating fence_compute

    This has been observed during an IHA FFU process. Namely
    after the OS upgrade by LEAPP the fence_compute resource will
    fail starting because python3-novaclient is not installed.
    Normally this is taken care of by rpm dependencies, but
    fence_compute is buggy and does not explicitly have that
    dep (https://bugzilla.redhat.com/show_bug.cgi?id=1857247)

    So we need to make sure the package is installed before
    creating the resource.

    Tested this on four consecutive successful IHA FFU runs
    and it worked okay.

    Related-Bug: #1888398

    Change-Id: I6816d414409da3748b2e341ec05ebcad86ad8fd1
    (cherry picked from commit c1e9447998c635a825529cff99b99ed13a4e89d7)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
