StarlingX

Simplex: ansible restore failure on timeout on kube-sriov-cni-ds-amd64-bcl75

Bug #1974051 reported by Luis Eduardo Angelini Marquitti on 2022-05-18

This bug affects 1 person

Affects    Status        Importance  Assigned to               Milestone
StarlingX  Fix Released  Medium      Virginia Martins Perozim

Bug Description

Brief Description
-----------------
A Simplex (AIO-SX) standalone restore failed during the Ansible run. The Ansible log reports a timeout while waiting for pods/kube-sriov-cni-ds-amd64-bcl75.

Severity
--------
Major

Steps to Reproduce
------------------
Back up an AIO-SX system, then attempt an Ansible restore from that backup.

Expected Behavior
-----------------
The restore completes successfully.

Actual Behavior
---------------
During the restore, the Ansible log reports a timeout while waiting for pods/kube-sriov-cni-ds-amd64-bcl75.

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
StarlingX master

Last Pass
---------
-

Timestamp/Logs
--------------
-

Test Activity
-------------
-

Workaround
----------
-

Tags:

OpenStack Infra (hudson-openstack) wrote on 2022-05-19: Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/842451

Changed in starlingx:
status:	New → In Progress

Ghada Khalil (gkhalil) on 2022-05-20

tags:	added: stx.7.0 stx.update
Changed in starlingx:
importance:	Undecided → Medium
assignee:	nobody → Virginia Martins Perozim (vmperozim)

OpenStack Infra (hudson-openstack) wrote on 2022-06-09: Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/842451
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/4fd0212f96e4c86d508e974f29c7ac80ca839b08
Submitter: "Zuul (22348)"
Branch: master

commit 4fd0212f96e4c86d508e974f29c7ac80ca839b08
Author: Virginia Martins Perozim <email address hidden>
Date: Wed May 18 21:40:41 2022 -0400

Delay wait for kubernetes pods be in ready state

    During the execution of k8s-upgrade-networking tasks as part of
    AIO-SX upgrade, the sriov pod changes its name causing the
    subsequent ansible task that verifies each pod in the
    kube_component_list to fail.

       Example:
       $ kubectl --kubeconfig=/etc/kubernetes/admin.conf
                 rollout restart ds -n kube-system
                 kube-sriov-cni-ds-amd64
       $ kubectl get pods -A
       NAMESPACE    NAME                           READY   STATUS
       ...
       kube-system  kube-sriov-cni-ds-amd64-k5rk7  0/1     Terminating
       $ date
       Wed May 18 12:25:31 UTC 2022
       $ kubectl --kubeconfig=/etc/kubernetes/admin.conf wait
               --namespace=kube-system
               --for=condition=Ready pods
               --selector app=sriov-cni
               --field-selector spec.nodeName=controller-0
               --timeout=120s
       error: timed out waiting for the condition on
              pods/kube-sriov-cni-ds-amd64-k5rk7
       $ date
       Wed May 18 12:27:35 UTC 2022

       $ kubectl get pods -A
       NAMESPACE    NAME                           READY   STATUS
       ...
       kube-system  kube-sriov-cni-ds-amd64-w2qlp  1/1     Running
       $ date
       Wed May 18 12:25:40 UTC 2022 <---- running before timeout

    The issue is resolved by moving a wait task further down which
    ensures the k8s pods have adequate time to be ready for the
    verification task in all 3 cases - fresh install, upgrade and B&R.

Test Plan:

    PASS: AIO-SX upgrade
    PASS: Subcloud upgrade
    PASS: AIO-SX backup and restore
    PASS: AIO-SX system bring up (fresh install)

    Closes-Bug: 1974051
    Signed-off-by: Virginia Martins Perozim <email address hidden>
    Change-Id: I3b80e2ad67221900b1103b7e742d9a5a0586ae2f
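
The trace in the commit message suggests that `kubectl wait` kept watching the pod it found when the command started (kube-sriov-cni-ds-amd64-k5rk7, already Terminating) even though its replacement was Running well before the 120s timeout expired. The merged fix moves the wait task later in the playbook. As an illustration only, and not the change that was merged, the Python sketch below shows another way to avoid the same trap: re-list the matching pods on every poll so that a replacement pod is picked up. `wait_for_ready` and its `list_pods` callback are hypothetical names introduced here; in a real deployment `list_pods` would presumably wrap a `kubectl get pods` call using the same selectors.

```python
import time
from typing import Callable, Dict


def wait_for_ready(
    list_pods: Callable[[], Dict[str, bool]],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every currently matching pod is Ready.

    list_pods (hypothetical callback) returns {pod_name: is_ready} for the
    pods that match the selector *right now*. Because it is called again on
    every iteration, a pod that was replaced by a rollout restart is dropped
    from the set and its successor is checked instead.
    """
    deadline = clock() + timeout_s
    while True:
        pods = list_pods()
        # An empty result means the replacement pod has not been created yet,
        # so it is treated as "not ready" rather than as success.
        if pods and all(pods.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated snapshots mirroring the trace in the commit message:
    # the old pod is terminating, then the new pod is running.
    snapshots = iter([
        {"kube-sriov-cni-ds-amd64-k5rk7": False},
        {"kube-sriov-cni-ds-amd64-w2qlp": True},
    ])
    ok = wait_for_ready(lambda: next(snapshots), interval_s=0,
                        sleep=lambda _seconds: None)
    print("all pods ready:", ok)
```

The clock and sleep functions are injectable so the loop can be exercised without real delays. Whether a polling loop like this or the reordered wait task is preferable depends on the playbook; the commit above chose the latter.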

Changed in starlingx:
status:	In Progress → Fix Released
