Comment 2 for bug 1999074

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/866878
Committed: https://opendev.org/starlingx/integ/commit/e3705e6046fed3f32041f3751f81fe27e3680bec
Submitter: "Zuul (22348)"
Branch: master

commit e3705e6046fed3f32041f3751f81fe27e3680bec
Author: Andre Fernando Zanella Kantek <email address hidden>
Date: Wed Dec 7 06:55:44 2022 -0500

    Execute one extra attempt to restore SRIOV device plugin

    The k8s-pod-recovery service failed to restore the SRIOV device
    plugin, which pods using SRIOV interfaces need in order to create
    the resource. Those pods must carry the label
    'restart-on-reboot=true' to be restarted during boot. The failure
    was observed during an upgrade and, although rare, it left the
    operator to restart the pods manually afterwards.

    This change adds a wait for pod stabilization (the pod is
    considered stable once its state stops changing) and, if the pod is
    still in failure, executes 2 attempts to restore the plugin. Logs
    were added to better record the pod state in case of an error.

    Test Plan:
    [PASS] execute 7 upgrades in an AIO-SX lab

    Closes-Bug: 1999074

    Signed-off-by: Andre Fernando Zanella Kantek <email address hidden>
    Change-Id: I838c35d3e0a3557c71344945a8e00f22ccb50eb4
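
The recovery logic described in the commit message (wait until the pod state stops changing, then make up to 2 restore attempts while logging the state) can be illustrated with a minimal Python sketch. This is not the code from the commit. The function names `wait_until_stable` and `restore_plugin`, the state strings, and all default values are assumptions made for illustration, and the `get_state` and `restart` callables stand in for whatever actually queries and restarts the SRIOV device plugin pod.

```python
import time
from typing import Callable


def wait_until_stable(get_state: Callable[[], str],
                      stable_polls: int = 3,
                      max_polls: int = 30,
                      interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the same state is seen `stable_polls` times in a row.

    Gives up after `max_polls` polls and returns the last state seen.
    All thresholds are hypothetical, not taken from the commit.
    """
    last = get_state()
    streak = 1
    polls = 1
    while streak < stable_polls and polls < max_polls:
        sleep(interval)
        current = get_state()
        polls += 1
        if current == last:
            streak += 1
        else:
            # Log each transition so a later failure can be diagnosed.
            print(f"pod state changed: {last} -> {current}")
            last, streak = current, 1
    return last


def restore_plugin(get_state: Callable[[], str],
                   restart: Callable[[], None],
                   attempts: int = 2,
                   healthy: str = "Running",
                   **wait_kwargs) -> bool:
    """Wait for a stable state, then restart up to `attempts` times."""
    state = wait_until_stable(get_state, **wait_kwargs)
    for attempt in range(1, attempts + 1):
        if state == healthy:
            return True
        print(f"attempt {attempt}/{attempts}: pod state is {state}, restarting")
        restart()
        state = wait_until_stable(get_state, **wait_kwargs)
    return state == healthy
```

Passing the query and restart actions in as callables keeps the sketch independent of any particular Kubernetes client, so the retry behaviour can be exercised with a fake pod object.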