AIO-SX Upgrade: upgrade failed due to nginx failure

Bug #1989018 reported by Reinildes Oliveira
Affects    Status        Importance  Assigned to         Milestone
StarlingX  Fix Released  Medium      Reinildes Oliveira

Bug Description

Brief Description
--------------------------------------------------

The upgrade from the CentOS 22.06 load to the CentOS 22.12 load failed on AIO-SX due to an Ansible failure.

Severity
--------------------------------------------------

Critical: System is not useable

Steps to Reproduce
--------------------------------------------------

1. Install the 22.06 load on an AIO-SX system.
2. Follow the upgrade procedure to upgrade the AIO-SX system to the 22.12 CentOS load.

During the ansible execution, there was an error.

Expected Behavior
--------------------------------------------------

Ansible execution completes with no errors, so the upgrade can continue.

Actual Behavior
--------------------------------------------------
As described above, the upgrade from CentOS 22.06 to CentOS 22.12 failed on AIO-SX due to the Ansible failure below.
2022-08-18 19:33:17,915 p=13415 u=sysadmin | fatal: [localhost]: FAILED! => changed=true
cmd: kubectl wait pod -n kube-system -l $(kubectl get service -n kube-system ic-nginx-ingress-ingress-nginx-controller-admission -o jsonpath="{.spec.selector}" | tr -d "{}\"" | tr ":" "=") --for=condition=Ready --timeout=60s
delta: '0:00:00.107273'
end: '2022-08-18 19:33:17.896974'
msg: non-zero return code
rc: 1
start: '2022-08-18 19:33:17.789701'
stderr: 'error: no matching resources found'
stderr_lines:
 - 'error: no matching resources found'
stdout: ''
stdout_lines: <omitted>
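
The inner part of the failing command builds the `-l` label selector from the admission service's `.spec.selector`. A minimal Python sketch of that transformation follows, assuming kubectl prints the selector as a JSON object (which the `tr -d "{}\""` step suggests). The function name and the label keys and values in the example are hypothetical, not taken from the affected system.

```python
import json


def selector_to_label_arg(selector_json: str) -> str:
    """Convert a Service .spec.selector JSON object into a kubectl -l argument.

    Mirrors the effect of: tr -d '{}"' | tr ':' '='
    """
    selector = json.loads(selector_json)
    # kubectl expects comma-separated key=value pairs.
    return ",".join(f"{key}={value}" for key, value in selector.items())


# Hypothetical selector, for illustration only.
example = '{"app.kubernetes.io/name":"ingress-nginx","app.kubernetes.io/component":"controller"}'
print(selector_to_label_arg(example))
```

If no pod carrying those labels exists yet, `kubectl wait` has nothing to wait on and exits immediately with `error: no matching resources found`, which matches the 0.1 second `delta` in the log rather than the 60 second timeout.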

Reproducibility
--------------------------------------------------

3 of 3, 100%

System Configuration
--------------------------------------------------

AIO-SX system

Last Pass
--------------------------------------------------

8/11/2022

Timestamp/Logs
--------------------------------------------------

2022-08-18 19:33:13,817 p=13415 u=sysadmin | Thursday 18 August 2022 19:33:13 +0000 (0:00:00.034) 0:14:21.496 *******
2022-08-18 19:33:14,001 p=13415 u=sysadmin | changed: [localhost]
2022-08-18 19:33:14,007 p=13415 u=sysadmin | TASK [common/armada-helm : If on system restore mode, kill ingress validating webhook pod so it can be recreated] ****************************************************************** **********************************************************************
2022-08-18 19:33:14,007 p=13415 u=sysadmin | Thursday 18 August 2022 19:33:14 +0000 (0:00:00.189) 0:14:21.686 *******
2022-08-18 19:33:17,655 p=13415 u=sysadmin | changed: [localhost]
2022-08-18 19:33:17,661 p=13415 u=sysadmin | TASK [common/armada-helm : Check ingress validating webhook service and pod status] ************************************************************************************************ **********************************************************************
2022-08-18 19:33:17,662 p=13415 u=sysadmin | Thursday 18 August 2022 19:33:17 +0000 (0:00:03.654) 0:14:25.341 *******
2022-08-18 19:33:17,915 p=13415 u=sysadmin | fatal: [localhost]: FAILED! => changed=true
cmd: kubectl wait pod -n kube-system -l $(kubectl get service -n kube-system ic-nginx-ingress-ingress-nginx-controller-admission -o jsonpath="{.spec.selector}" | tr -d "{}\"" | tr ":" "=") --for=condition=Ready --timeout=60s
delta: '0:00:00.107273'
end: '2022-08-18 19:33:17.896974'
msg: non-zero return code
rc: 1
start: '2022-08-18 19:33:17.789701'
stderr: 'error: no matching resources found'
stderr_lines:
 - 'error: no matching resources found'
stdout: ''
stdout_lines: <omitted>
2022-08-18 19:33:17,916 p=13415 u=sysadmin | PLAY RECAP ************************************************************************************************************************************************************************* **********************************************************************
2022-08-18 19:33:17,916 p=13415 u=sysadmin | localhost : ok=442 changed=227 unreachable=0 failed=1

Alarms
--------------------------------------------------
N/A

Test Activity
--------------------------------------------------
Regression

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/856319
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/dfd15dd35a719a4b3b428044775e275163eba6b6
Submitter: "Zuul (22348)"
Branch: master

commit dfd15dd35a719a4b3b428044775e275163eba6b6
Author: Rei Oliveira <email address hidden>
Date: Wed Sep 7 16:11:07 2022 -0300

    Fix AIO-SX upgrade error on nginx wait for pod

    This issue seems to be happening because the execution is too fast
    and the pod does not exist when the wait command is executed, resulting
    in a 'resource not found' error.

    The 'kubectl wait' is an experimental feature and there is a very
    similar bug reported by the community in [1], where 'wait' fails to
    wait for non-existent resources.

    This commit addresses the issue with ansible 'retries and delay'.

    [1] https://github.com/kubernetes/kubernetes/issues/83242

    Test Plan:

    PASS: Run aio-sx upgrade and execute upgrade_platform.yml playbook
          with success.
    PASS: Fresh install and successful run of the bootstrap playbook.

    Closes-Bug: 1989018
    Signed-off-by: Rei Oliveira <email address hidden>
    Change-Id: I1b077ac85fa912356ef4e4c9f05b417469296ade
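
The retry-and-delay idea described in the commit message can be sketched in Python as follows. This is an illustration of the pattern only, not the merged Ansible change: `wait_with_retries`, its parameters, and the fake readiness check are hypothetical names introduced here, and the attempt counts and delays are arbitrary.

```python
import time
from typing import Callable


def wait_with_retries(
    check: Callable[[], bool],
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `retries` times, pausing `delay` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < retries:
            sleep(delay)
    return False


# Fake check standing in for "kubectl wait": the pod does not exist on the
# first two attempts (the failure in this bug), then becomes Ready.
calls = {"n": 0}


def pod_ready_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_with_retries(pod_ready_check, retries=5, delay=0.01))  # prints True
```

Retrying the whole command, rather than lengthening `--timeout`, matters here because the timeout only applies once a matching resource exists; a missing pod makes the command fail immediately.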

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Reinildes Oliveira (rjosemat)
tags: added: stx.8.0 stx.update
summary: - AIO-SX Upgrade: Centos 22.06 to Centos 22.12 upgrade failed due to nginx failure
         + AIO-SX Upgrade: upgrade failed due to nginx failure