STX-Openstack | fail to apply - stx-ovs in imagepullbackoff

Bug #2030749 reported by Gabriel Calixto de Paula
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Thales Elero Cervi
Milestone: (none listed)

Bug Description

Brief Description
-----------------
stx-openstack fails to apply because the OVS pod (image stx-ovs) stays in ImagePullBackOff status.

Severity
--------
Major

Steps to Reproduce
------------------
1. Apply stx-openstack

Expected Behavior
------------------
STX-O should apply normally

Actual Behavior
----------------
STX-O fails to apply

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two-node system, DX

Branch/Pull Time/Commit
-----------------------
STX-O 2023-08-04

Last Pass
---------
STX-O 20230731T060000Z, Jul-31

Timestamp/Logs
--------------

[sysadmin@controller-0 ~(keystone_admin)]$ crictl image | grep docker.io/starlingx/stx-
registry.local:9001/docker.io/starlingx/stx-cinder master-debian-stable-latest 5053f4fb3cf95 395MB
registry.local:9001/docker.io/starlingx/stx-fm-rest-api master-debian-stable-latest 1ac9af8b3fe37 393MB
registry.local:9001/docker.io/starlingx/stx-glance master-debian-stable-latest f78dd5db683ad 371MB
registry.local:9001/docker.io/starlingx/stx-heat master-debian-stable-latest d7a226bc38866 346MB
registry.local:9001/docker.io/starlingx/stx-horizon master-debian-stable-latest 9065963260fd5 349MB
registry.local:9001/docker.io/starlingx/stx-keystone master-debian-stable-latest ee3222e77e418 321MB
registry.local:9001/docker.io/starlingx/stx-libvirt master-debian-stable-latest 19a9c82f10e6d 424MB
registry.local:9001/docker.io/starlingx/stx-neutron master-debian-stable-latest f2402793de9aa 350MB
registry.local:9001/docker.io/starlingx/stx-nova-api-proxy master-debian-stable-latest 2bb1195a9f8e7 306MB
registry.local:9001/docker.io/starlingx/stx-nova master-debian-stable-latest 6442589be942c 455MB
registry.local:9001/docker.io/starlingx/stx-openstackclients master-debian-stable-latest 5118e2b53a469 331MB
registry.local:9001/docker.io/starlingx/stx-ovs master-debian-stable-latest edb67b66d4151 293MB
registry.local:9001/docker.io/starlingx/stx-pci-irq-affinity-agent master-debian-stable-latest 008260c0fee77 334MB
registry.local:9001/docker.io/starlingx/stx-placement

Test Activity
-------------
Sanity

Workaround
----------
N/A

tags: added: stx.9.0 stx.distro.openstack
Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)
Changed in starlingx:
status: New → In Progress
Thales Elero Cervi (tcervi) wrote :

I was able to verify that the application-apply failed because the Helm releases for nova, neutron and ovs exhausted their retry options and were aborted.

First problematic pod was:
neutron-ovs-agent-controller-1-cab72f56-qv2gb 0/1 Init:CrashLoopBackOff 67 (4m46s ago) 5h24m

It failed to find an OVS socket to connect to:
$ kubectl -n openstack logs -f pod/neutron-ovs-agent-controller-1-cab72f56-qv2gb -c neutron-ovs-agent-init
+ OVS_SOCKET=/run/openvswitch/db.sock
+ chown neutron: /run/openvswitch/db.sock
chown: cannot access '/run/openvswitch/db.sock': No such file or directory

This made me realize that the OVS pod was in a failed state:
openvswitch-xbxrl 0/2 Init:ImagePullBackOff 0 5h30m

This was due to an issue when pulling its image from registry.local:
Warning Failed 5h29m (x4 over 5h30m) kubelet Failed to pull image "registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest": rpc error: code = Unknown desc = failed to pull and unpack image "registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest": failed to resolve reference "registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
  Warning Failed 5h29m (x4 over 5h30m) kubelet Error: ErrImagePull
  Warning Failed 5h29m (x5 over 5h30m) kubelet Error: ImagePullBackOff
  Normal Pulling 115m (x47 over 5h30m) kubelet Pulling image "registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest"
  Normal BackOff 51s (x1455 over 5h30m) kubelet Back-off pulling image "registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest"
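
As a small triage aid for events like the ones above, the following Python sketch extracts the image reference from a kubelet message and flags authorization failures. It is only an illustration: the function names and category labels are hypothetical, and the matched substrings are taken from the message quoted in this comment, not from any documented kubelet or containerd API.

```python
import re


def image_from_message(message):
    """Return the first quoted image reference in a kubelet event message."""
    match = re.search(r'image "([^"]+)"', message)
    return match.group(1) if match else None


def classify_pull_failure(message):
    """Label a pull failure using substrings seen in the event above.

    The labels ("authorization", "other") are illustrative only.
    """
    text = message.lower()
    if "insufficient_scope" in text or "authorization failed" in text:
        return "authorization"
    return "other"


# Shortened copy of the kubelet event quoted in this comment.
event = (
    'Failed to pull image "registry.local:9001/docker.io/starlingx/'
    'stx-ovs:master-debian-stable-latest": rpc error: code = Unknown '
    "desc = pull access denied, repository does not exist or may require "
    "authorization: server message: insufficient_scope: authorization failed"
)

print(image_from_message(event))
print(classify_pull_failure(event))
```

An "authorization" result suggests looking at the credentials the pod uses for registry.local rather than at whether the image was uploaded.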

Thales Elero Cervi (tcervi) wrote :

I was able to see that this image was successfully downloaded from the public registry and uploaded to registry.local during the application-apply process:

sysinv 2023-08-08 05:50:58.897 88785 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/docker.io/starlingx/stx-ovs:master-debian-stable-latest download succeeded in 15 seconds

And, as per the logs in this Launchpad description, the image was indeed verified to be on registry.local:
$ crictl image | grep stx-ovs
registry.local:9001/docker.io/starlingx/stx-ovs master-debian-stable-latest edb67b66d4151 293MB

So the actual issue was the ServiceAccount used for registry.local authorization.
The upversion of openstack-helm-infra required a new patch for the ServiceAccount definition in the OVS chart, but that patch was still missing one line, which this bug fix adds.
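
To illustrate why a single missing line matters, here is a minimal Python sketch, not the actual patch. It assumes the registry.local pull credentials are tied to the chart's custom ServiceAccount, and it relies on the Kubernetes behavior that a pod without `serviceAccountName` runs under the namespace's `default` ServiceAccount. The manifests are heavily trimmed and the name `openvswitch-sa` is hypothetical.

```python
def effective_service_account(workload):
    """Return the ServiceAccount the workload's pods would run under."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    # Kubernetes falls back to the namespace's "default" ServiceAccount
    # when the pod spec does not set serviceAccountName.
    return pod_spec.get("serviceAccountName") or "default"


# Hypothetical, trimmed DaemonSet manifests for illustration only.
unpatched = {
    "kind": "DaemonSet",
    "spec": {"template": {"spec": {"containers": [{"name": "openvswitch"}]}}},
}
patched = {
    "kind": "DaemonSet",
    "spec": {
        "template": {
            "spec": {
                "serviceAccountName": "openvswitch-sa",  # hypothetical name
                "containers": [{"name": "openvswitch"}],
            }
        }
    },
}

print(effective_service_account(unpatched))
print(effective_service_account(patched))
```

Under these assumptions, the unpatched DaemonSet would pull images as `default`, which would explain why the image could be present in registry.local and still be denied to the pod.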

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/890786
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/58ec7994b10595c0521a94d85b8c56134bf29a40
Submitter: "Zuul (22348)"
Branch: master

commit 58ec7994b10595c0521a94d85b8c56134bf29a40
Author: Thales Elero Cervi <email address hidden>
Date: Tue Aug 8 09:27:10 2023 -0300

    Fix openvswitch DaemonSet ServiceAccount patch

    During the openstack-helm-infra upversion [1] it was noticed that the
    updated version of openvswitch chart (1.1.15) was missing the custom
    ServiceAccount definition for its DaemonSet template.
    This fix was proposed upstream [2] and currently implemented to
    stx-openstack via an OSH-I patch [3]. The patch though, was missing the
    serviceAccountName definition in the daemonset template.

    This change fixes the stx-openstack patch, including the
    serviceAccountName definition to openvswitch daemonset template.

    [1] https://review.opendev.org/c/starlingx/openstack-armada-app/+/887637
    [2] https://review.opendev.org/c/openstack/openstack-helm-infra/+/888504
    [3] https://review.opendev.org/c/starlingx/openstack-armada-app/+/887637/16/openstack-helm-infra/debian/deb_folder/patches/0016-Add-ServiceAccount-to-openvswitch-pod.patch

    TEST PLAN:
    PASS - build-pkgs -c -p openstack-helm-infra,openstack-helm
    PASS - build-pkgs -c -p stx-openstack-helm-fluxcd
    PASS - Upload stx-openstack application
    PASS - Apply stx-openstack application

    Closes-Bug: 2030749

    Signed-off-by: Thales Elero Cervi <email address hidden>
    Change-Id: Ia0c42466cada50cb3af9490f5ff1b36e839a5915

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High