undercloud-upgrade-wallaby fails (post-upgrade) deploy tasks - missing swift container

Bug #1935961 reported by Marios Andreou
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

In [1][2][3], the tripleo-ci-centos-8-undercloud-upgrade-wallaby job fails during the upgrade, after the upgrade tasks have completed. Step 5 of the post-upgrade deploy tasks fails with the following trace:

 2021-07-08 13:34:11 | 2021-07-08 13:34:11.341074 | fa163ee1-1b2f-aef2-0d3f-0000000029b9 | FATAL | Run kolla_set_configs to copy ring files | undercloud | item=swift_proxy | error={"ansible_loop_var": "item", "changed": true, "cmd": "podman exec -u root swift_proxy /usr/local/bin/kolla_set_configs ", "delta": "0:00:00.254828", "end": "2021-07-08 13:34:11.306745", "failed_when_result": true, "item": "swift_proxy", "msg": "non-zero return code", "rc": 137, "start": "2021-07-08 13:34:11.051917", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
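
The most specific clue in this trace is "rc": 137 with empty stdout and stderr. A minimal Python sketch, assuming the common shell convention that an exit status above 128 means 128 plus the number of the signal that terminated the process, extracts the JSON payload from the FATAL line above and decodes the return code. The helper names parse_error and describe_rc are hypothetical. By that convention 137 corresponds to signal 9 (SIGKILL); the code only decodes the number and says nothing about what sent the signal.

```python
import json
import signal

# The FATAL line from the trace above, copied verbatim.
LOG_LINE = (
    '2021-07-08 13:34:11 | 2021-07-08 13:34:11.341074 | '
    'fa163ee1-1b2f-aef2-0d3f-0000000029b9 | FATAL | '
    'Run kolla_set_configs to copy ring files | undercloud | item=swift_proxy | '
    'error={"ansible_loop_var": "item", "changed": true, '
    '"cmd": "podman exec -u root swift_proxy /usr/local/bin/kolla_set_configs ", '
    '"delta": "0:00:00.254828", "end": "2021-07-08 13:34:11.306745", '
    '"failed_when_result": true, "item": "swift_proxy", '
    '"msg": "non-zero return code", "rc": 137, '
    '"start": "2021-07-08 13:34:11.051917", "stderr": "", "stderr_lines": [], '
    '"stdout": "", "stdout_lines": []}'
)


def parse_error(line: str) -> dict:
    """Return the JSON object that follows 'error=' in an Ansible log line."""
    _, _, payload = line.partition("error=")
    return json.loads(payload)


def describe_rc(rc: int) -> str:
    """Interpret an exit status using the shell '128 + signal number' convention."""
    if rc > 128:
        try:
            return f"killed by {signal.Signals(rc - 128).name}"
        except ValueError:
            return f"exit status {rc} (unknown signal {rc - 128})"
    return f"exit status {rc}"


error = parse_error(LOG_LINE)
print(error["item"], error["rc"], describe_rc(error["rc"]))
# swift_proxy 137 killed by SIGKILL
```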

This is potentially a gate blocker: the job is being wired into the gate with [4], and that has already merged for puppet-tripleo [5].

[1] https://d393ab92b65d6ff2eea5-fc707f543607a38fac44776c15f601da.ssl.cf5.rackcdn.com/800186/1/check/tripleo-ci-centos-8-undercloud-upgrade-wallaby/f002075/logs/undercloud/home/zuul/undercloud_upgrade.log
[2] https://0229dac170a36c80720a-0e9efd411d5f516ecd1b5a61c03e35b7.ssl.cf2.rackcdn.com/800011/1/check/tripleo-ci-centos-8-undercloud-upgrade-wallaby/3d191f2/logs/undercloud/home/zuul/undercloud_upgrade.log
[3] https://d3ba52c45612db77e201-20f8b612e9341a4bfb8242ee0493df0d.ssl.cf2.rackcdn.com/793135/3/check/tripleo-ci-centos-8-undercloud-upgrade-wallaby/d7f682e/logs/undercloud/home/zuul/undercloud_upgrade.log
[4] https://review.opendev.org/q/topic:wallaby-upgrade-jobs
[5] https://review.opendev.org/c/openstack/puppet-tripleo/+/793124

Marios Andreou (marios-b) wrote :

I suspect this may be a race condition: the job has been seen to pass and then fail on the same review, for example [1] passed and then [2] failed.

The failing task is defined in [3] and was added in [4].

Not clear if this is wallaby only or if it will also affect the master branch.

[1] https://review.opendev.org/c/openstack/puppet-tripleo/+/800186/1#message-74632e92dc45e435ebd78ad090f58224e80e93d9 * tripleo-ci-centos-8-undercloud-upgrade-wallaby https://zuul.opendev.org/t/openstack/build/7a13f3f5c9724d64b1615ebf621b8687 : SUCCESS in 1h 13m 20s

[2] https://review.opendev.org/c/openstack/puppet-tripleo/+/800186/1#message-9a203e19265dc064fbf5cbf66c94bbf42556d2eb * tripleo-ci-centos-8-undercloud-upgrade-wallaby https://zuul.opendev.org/t/openstack/build/f002075c98f84a68ba48c1fb5b573bab : FAILURE in 1h 29m 13s

[3] https://opendev.org/openstack/tripleo-heat-templates/src/commit/a59039d29ca783724e911de0021d67f939e76ad2/deployment/swift/swift-storage-container-puppet.yaml#L695-L696

[4] https://review.opendev.org/q/Ibdd783b484a84c0fdfaac84d892a8ea46be85fde

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/800620

Marios Andreou (marios-b) wrote :

13:50 < opendevreview> Marios Andreou proposed openstack/tripleo-ci master: Temp mark
                       undercloud-upgrade-wallaby non voting for swift bug
                       https://review.opendev.org/c/openstack/tripleo-ci/+/800620

Rabi Mishra (rabi) wrote :

Swift should be disabled on the undercloud in wallaby. It sounds like it is enabled [1] because we are not updating the pre-upgrade undercloud.conf before the upgrade, and the upgrade script is using '-e /usr/share/openstack-tripleo-heat-templates/environments/undercloud-enable-swift.yaml' [2].

[1] https://0229dac170a36c80720a-0e9efd411d5f516ecd1b5a61c03e35b7.ssl.cf2.rackcdn.com/800011/1/check/tripleo-ci-centos-8-undercloud-upgrade-wallaby/3d191f2/logs/undercloud/home/zuul/undercloud.conf

[2] https://0229dac170a36c80720a-0e9efd411d5f516ecd1b5a61c03e35b7.ssl.cf2.rackcdn.com/800011/1/check/tripleo-ci-centos-8-undercloud-upgrade-wallaby/3d191f2/logs/undercloud/home/zuul/undercloud_upgrade.log
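
One way to confirm this diagnosis from a log is to check whether the deploy command passes the swift environment file. The Python sketch below is a minimal illustration, assuming the command line is available as a single string and that environment files are passed with a short "-e PATH" option as in the quoted command; the helper names and the example command are hypothetical, not taken from the job log.

```python
import shlex

# Environment file named in the comment above.
SWIFT_ENV = ("/usr/share/openstack-tripleo-heat-templates/"
             "environments/undercloud-enable-swift.yaml")


def environment_files(command: str) -> list[str]:
    """Return every path passed with a short '-e PATH' option."""
    args = shlex.split(command)
    # args[:-1] guarantees that args[i + 1] exists.
    return [args[i + 1] for i, arg in enumerate(args[:-1]) if arg == "-e"]


def uses_swift_env(command: str) -> bool:
    """True if the swift environment file is among the -e arguments."""
    return SWIFT_ENV in environment_files(command)


# Hypothetical command line, for illustration only.
cmd = f"example-deploy-command -e {SWIFT_ENV} -e /tmp/other-env.yaml"
print(uses_swift_env(cmd))  # True
```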

Marios Andreou (marios-b) wrote :

This is not a promotion blocker, but I'm adding the tag so it can be tracked with CIX.

tags: added: promotion-blocker
Rabi Mishra (rabi) wrote :

It's disabled by default in wallaby: https://github.com/openstack/python-tripleoclient/blob/stable/wallaby/tripleoclient/config/undercloud.py#L82

The upgrade job probably has to take care of updating undercloud.conf as well.
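
A pre-upgrade step along these lines could drop an explicit enable_swift override so that the new release's default applies. The Python sketch below is hypothetical: the function name is illustrative, and it assumes enable_swift appears only in the [DEFAULT] section and never spans continuation lines. It filters lines rather than round-tripping through configparser, because configparser.write() would discard comments in the file.

```python
import re

# Matches an active (uncommented) "enable_swift =" or "enable_swift:" line.
_ENABLE_SWIFT = re.compile(r"^\s*enable_swift\s*[=:]")


def drop_enable_swift(conf_text: str) -> str:
    """Return undercloud.conf text without any explicit enable_swift line."""
    kept = [line for line in conf_text.splitlines(keepends=True)
            if not _ENABLE_SWIFT.match(line)]
    return "".join(kept)


# Illustrative undercloud.conf content.
before = (
    "[DEFAULT]\n"
    "undercloud_hostname = undercloud.localdomain\n"
    "# enable_swift = false\n"
    "enable_swift = true\n"
)
print(drop_enable_swift(before), end="")
```

Commented-out lines are left untouched, so the file keeps its documentation while the active override disappears.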

Marios Andreou (marios-b) wrote :

Thanks for the comments, Rabi, that makes sense. I did wonder why swift was still enabled here (and failing the job). I wasn't sure whether we needed to special-case those deploy steps so the failing task does not run, or simply not use that service template in the deployment at all (i.e. remove swift completely), which is what we need to do here.

That probably explains why this isn't happening for wallaby-master (at least, I haven't found examples yet). Perhaps there is already handling for that in the master upgrade_tasks?

Rabi Mishra (rabi) wrote :

> There is likely handling for that already in the master upgrade_tasks?

I don't think we have to handle it in upgrade_tasks, as someone may want to enable swift on the undercloud intentionally (by enabling it in undercloud.conf).

If we use the defaults (i.e. do not set enable_swift=true explicitly in undercloud.conf), it works fine and there is no issue for the upgrade (i.e. enabled by default in victoria but disabled by default in wallaby). I think we set it explicitly in CI [1], hence the issue.

[1] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/extras-common/defaults/main.yml#L163-L168
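
The interaction described above, where an explicit enable_swift always wins and otherwise each release's default applies, can be sketched as follows. This is a minimal Python illustration, not tripleoclient's actual code: RELEASE_DEFAULTS and effective_enable_swift are hypothetical names, the defaults simply restate "enabled in victoria, disabled in wallaby", and the option is assumed to live in the [DEFAULT] section of undercloud.conf.

```python
import configparser

# Hypothetical per-release defaults, restating the comment above.
RELEASE_DEFAULTS = {"victoria": True, "wallaby": False}


def effective_enable_swift(conf_text: str, release: str) -> bool:
    """Resolve enable_swift: explicit setting first, else the release default."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if parser.has_option("DEFAULT", "enable_swift"):
        # An explicit value (as CI sets it) overrides every release default.
        return parser.getboolean("DEFAULT", "enable_swift")
    return RELEASE_DEFAULTS[release]


explicit = "[DEFAULT]\nenable_swift = true\n"  # what CI sets
implicit = "[DEFAULT]\n"                       # relying on defaults
for conf in (explicit, implicit):
    print([effective_enable_swift(conf, r) for r in ("victoria", "wallaby")])
# [True, True]
# [True, False]
```

With the explicit setting, swift stays enabled across the upgrade; relying on the defaults turns it off in wallaby, which is the behaviour the revert below aims for.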

I don't know how we automated this in CI for mistral/zaqar etc.

Rabi Mishra (rabi) wrote :

I just proposed a revert, https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/800636, which would allow us to use the tripleoclient defaults for every release and hence fix the issue? We can probably test the wallaby upgrade job with it.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/800620
Committed: https://opendev.org/openstack/tripleo-ci/commit/cea112d77436a7bc998bedbfddee010ed0fade27
Submitter: "Zuul (22348)"
Branch: master

commit cea112d77436a7bc998bedbfddee010ed0fade27
Author: Marios Andreou <email address hidden>
Date: Tue Jul 13 13:46:47 2021 +0300

    Temp mark undercloud-upgrade-wallaby non voting for swift bug

    The undercloud-upgrade-wallaby job is hitting suspected race
    condition from related bug. We should mark it non voting to
    unblock the gate until the bug is fixed.

    Change-Id: I73be2188b104568ced2296642032902830e7a175
    Related-Bug: 1935961

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/800860

Marios Andreou (marios-b) wrote :
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/800860
Committed: https://opendev.org/openstack/tripleo-ci/commit/5c17ccca732e2e4c1c1ec070003d6fd09796566f
Submitter: "Zuul (22348)"
Branch: master

commit 5c17ccca732e2e4c1c1ec070003d6fd09796566f
Author: Marios Andreou <email address hidden>
Date: Thu Jul 15 07:49:01 2021 +0000

    Revert "Temp mark undercloud-upgrade-wallaby non voting for swift bug"

    This reverts commit cea112d77436a7bc998bedbfddee010ed0fade27.
    Job is green again [1]
    Reason for revert: related bug fixed see comment #12

    [1] https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-8-undercloud-upgrade-wallaby
    Related-Bug: 1935961

    Change-Id: I2c193ff171ec6576930b006391224ebedf06abb9
