invalid call to pcs to restart HA resource during minor update

Bug #1931500 reported by Damien Ciabrini
This bug affects 1 person
Affects: tripleo
Importance: Undecided
Assigned to: Damien Ciabrini

Bug Description

Observed on a downstream train deployment.

When doing a minor update of an HA overcloud while some HA resources are in a failed state in pacemaker (i.e. some replicas of the resource are not running on some controller nodes because of errors), the script in charge of updating the HA resource tries to clean up the failed resource on the local node with an invalid pcs call:

       "b\"Wed Jun 9 16:45:29 UTC 2021: openstack-cinder-volume is currently not running on 'controller-0', cleaning up its state to restart it if necessary\\nWed Jun 9 16:45:30 UTC 2021: Wait until openstack-cinder-volume is restarted anywhere in the cluster in state Started\\nWed Jun 9 16:45:30 UTC 2021: Will probe resource state with the following XPath pattern: //bundle[@id='openstack-cinder-volume']//resource\\nWed Jun 9 16:45:31 UTC 2021: openstack-cinder-volume successfully restarted\\n\"",
        "b\"Error: Specified option '--node' is not supported in this command\\n\"",
        "Completed $ podman run --name cinder_volume_restart_bundle --label config_id=tripleo_step5 --label container_name=cinder_volume_restart_bundle --label managed_by=tripleo-ControllerOpenstack --label config_data={\"command\": \"/pacemaker_restart_bundle.sh cinder_volume openstack-cinder-volume openstack-cinder-volume _ Started\", \"config_volume\": \"cinder\", \"detach\": false, \"environment\": {\"TRIPLEO_MINOR_UPDATE\": \"\", \"TRIPLEO_CONFIG_HASH\":
\"773f0006d8da11eb69451d0e2d851517\"}, \"image\": \"undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-volume:16.1_20210602.1\", \"ipc\": \"host\", \"net\": \"host\", \"start_order\": 2, \"user\": \"root\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro\", \"/var/lib/container-config-scripts/pacemaker_restart_bundle.sh:/pacemaker_restart_bundle.sh:ro\", \"/var/lib/container-config-scripts/pacemaker_wait_bundle.sh:/pacemaker_wait_bundle.sh:ro\", \"/dev/shm:/dev/shm:rw\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/config-data/puppet-generated/cinder:/var/lib/kolla/config_files/src:ro\"]} --conmon-pidfile=/var/run/cinder_volume_restart_bundle.pid --log-driver k8s-file --log-opt path=/var/log/containers/stdouts/cinder_volume_restart_bundle.log --env=TRIPLEO_CONFIG_HASH=773f0006d8da11eb69451d0e2d851517 --env=TRIPLEO_MINOR_UPDATE --net=host --ipc=host --user=root --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ipa/ca.crt:/etc/ipa/ca.crt:ro --volume=/var/lib/container-config-scripts/pacemaker_restart_bundle.sh:/pacemaker_restart_bundle.sh:ro 
--volume=/var/lib/container-config-scripts/pacemaker_wait_bundle.sh:/pacemaker_wait_bundle.sh:ro --volume=/dev/shm:/dev/shm:rw --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/config-data/puppet-generated/cinder:/var/lib/kolla/config_files/src:ro undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhosp16-openstack-cinder-volume:16.1_20210602.1 /pacemaker_restart_bundle.sh cinder_volume openstack-cinder-volume openstack-cinder-volume _ Started",
        "stdout: Wed Jun 9 16:45:29 UTC 2021: openstack-cinder-volume is currently not running on 'controller-0', cleaning up its state to restart it if necessary",
        "",
        "stderr: Error: Specified option '--node' is not supported in this command"
    ]
}

In that case, the minor update continues in sequence without failing, but the resource is not actually restarted, so the minor update never gets a chance to recover the failed resource as expected.
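
The silent failure described above could be made visible by checking the exit status of the cleanup call. The following is a minimal sketch, not the code of the merged fix: it assumes a pcs version from the 0.10 series, where `pcs resource cleanup` takes the node as a `node=<name>` argument instead of the `--node` option rejected in the log. The `PCS` variable and the `cleanup_local_resource` function are hypothetical names introduced for illustration only.

```shell
#!/bin/bash
# Sketch only: PCS and cleanup_local_resource are hypothetical names,
# not taken from pacemaker_restart_bundle.sh.

# Command used to talk to the cluster; overridable so it can be stubbed.
PCS=${PCS:-pcs}

# Clean up the failed state of a resource on a single node.
# Assumption: pcs 0.10 syntax, i.e. "node=<name>" rather than "--node <name>".
# Returns non-zero and reports pcs's stderr if the cleanup call fails,
# so the caller can stop instead of continuing silently.
cleanup_local_resource() {
    local resource=$1
    local node=$2
    local err
    # Capture stderr only; stdout of pcs is discarded.
    if ! err=$("$PCS" resource cleanup "$resource" "node=$node" 2>&1 >/dev/null); then
        echo "cleanup of $resource on $node failed: $err" >&2
        return 1
    fi
    return 0
}
```

A caller in the update script could then run something like `cleanup_local_resource openstack-cinder-volume "$(hostname -s)" || exit 1`, so that an invalid pcs invocation fails the step rather than being reported as a successful restart.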

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)
Changed in tripleo:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795704
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/1662600e6e6e8fedbf096b279e16eabd7b3c6eea
Submitter: "Zuul (22348)"
Branch: master

commit 1662600e6e6e8fedbf096b279e16eabd7b3c6eea
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 9 23:31:52 2021 +0200

    HA minor update: fix bad pcs invocation

    When a HA resource is in failed stated, the minor update
    should normally try to restart it but the associated
    pcs invocation is currently invalid, so the resource never
    gets a chance to be restarted.

    Use the right pcs call to fix this minor update use case.

    Change-Id: Iaf85807d067898bbab6d76ab40bc070e845a8b38
    Closes-Bug: #1931500

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795802

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/795802
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/d03517b6144373fea1fc0e7f11bf4a39cecb00b6
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit d03517b6144373fea1fc0e7f11bf4a39cecb00b6
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 9 23:31:52 2021 +0200

    HA minor update: fix bad pcs invocation

    When a HA resource is in failed stated, the minor update
    should normally try to restart it but the associated
    pcs invocation is currently invalid, so the resource never
    gets a chance to be restarted.

    Use the right pcs call to fix this minor update use case.

    Change-Id: Iaf85807d067898bbab6d76ab40bc070e845a8b38
    Closes-Bug: #1931500
    (cherry picked from commit 1662600e6e6e8fedbf096b279e16eabd7b3c6eea)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796078

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796078
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/f808dac566d342778b88ce2de1fdcdf0853f71e7
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit f808dac566d342778b88ce2de1fdcdf0853f71e7
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 9 23:31:52 2021 +0200

    HA minor update: fix bad pcs invocation

    When a HA resource is in failed stated, the minor update
    should normally try to restart it but the associated
    pcs invocation is currently invalid, so the resource never
    gets a chance to be restarted.

    Use the right pcs call to fix this minor update use case.

    Change-Id: Iaf85807d067898bbab6d76ab40bc070e845a8b38
    Closes-Bug: #1931500
    (cherry picked from commit 1662600e6e6e8fedbf096b279e16eabd7b3c6eea)
    (cherry picked from commit d03517b6144373fea1fc0e7f11bf4a39cecb00b6)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 15.0.0

This issue was fixed in the openstack/tripleo-heat-templates 15.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796214
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/d146946546b8ce14b30b75d95d7c6695735000c8
Submitter: "Zuul (22348)"
Branch: stable/train

commit d146946546b8ce14b30b75d95d7c6695735000c8
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 9 23:31:52 2021 +0200

    HA minor update: fix bad pcs invocation

    When a HA resource is in failed stated, the minor update
    should normally try to restart it but the associated
    pcs invocation is currently invalid, so the resource never
    gets a chance to be restarted.

    Use the right pcs call to fix this minor update use case.

    Change-Id: Iaf85807d067898bbab6d76ab40bc070e845a8b38
    Closes-Bug: #1931500
    (cherry picked from commit 1662600e6e6e8fedbf096b279e16eabd7b3c6eea)
    (cherry picked from commit d03517b6144373fea1fc0e7f11bf4a39cecb00b6)
    (cherry picked from commit f808dac566d342778b88ce2de1fdcdf0853f71e7)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796207
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/76c2e05dbb3256e962913fe2a6037fd86c5519b2
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 76c2e05dbb3256e962913fe2a6037fd86c5519b2
Author: Damien Ciabrini <email address hidden>
Date: Wed Jun 9 23:31:52 2021 +0200

    HA minor update: fix bad pcs invocation

    When a HA resource is in failed stated, the minor update
    should normally try to restart it but the associated
    pcs invocation is currently invalid, so the resource never
    gets a chance to be restarted.

    Use the right pcs call to fix this minor update use case.

    Change-Id: Iaf85807d067898bbab6d76ab40bc070e845a8b38
    Closes-Bug: #1931500
    (cherry picked from commit 1662600e6e6e8fedbf096b279e16eabd7b3c6eea)
    (cherry picked from commit d03517b6144373fea1fc0e7f11bf4a39cecb00b6)
    (cherry picked from commit f808dac566d342778b88ce2de1fdcdf0853f71e7)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 14.2.0

This issue was fixed in the openstack/tripleo-heat-templates 14.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 13.4.0

This issue was fixed in the openstack/tripleo-heat-templates 13.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.4.5

This issue was fixed in the openstack/tripleo-heat-templates 12.4.5 release.
