neutron-ovs-agent-controller-1 issue: MountVolume.SetUp failed for volume "neutron-etc" : failed to sync secret cache: timed out waiting for the condition

Bug #1958073 reported by Alexandru Dimofte
This bug affects 1 person

Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Thiago Paiva Brito
Milestone: (none)

Bug Description

Brief Description
-----------------
Two Kubernetes pods, neutron-ovs-agent-controller-0 and neutron-ovs-agent-controller-1, are failing with:
MountVolume.SetUp failed for volume "neutron-etc" : failed to sync secret cache: timed out waiting for the condition
I observed this issue on two bare-metal configurations (Duplex and Standard).

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Install the latest image from the master branch (20220116T031725Z), then lock/unlock the controllers and computes.

Expected Behavior
------------------
The neutron-ovs-agent-controller-0-xxx and neutron-ovs-agent-controller-1-xxx pods should be in the "Running" state.

Actual Behavior
----------------
The above pods are failing/crashing.

Events:
  Type     Reason       Age                From                   Message
  ----     ------       ---                ----                   -------
  Normal   Scheduled    3h11m              default-scheduler      Successfully assigned openstack/neutron-ovs-agent-controller-0-937646f6-prprb to controller-0
  Normal   Pulled       3h11m              kubelet, controller-0  Container image "registry.local:9001/quay.io/airshipit/kubernetes-entrypoint:v1.0.0" already present on machine
  Normal   Created      3h11m              kubelet, controller-0  Created container init
  Normal   Started      3h11m              kubelet, controller-0  Started container init
  Normal   Pulled       3h10m              kubelet, controller-0  Container image "registry.local:9001/docker.io/starlingx/stx-neutron:master-centos-stable-20220113T034723Z.0" already present on machine
  Warning  FailedMount  37m (x2 over 37m)  kubelet                MountVolume.SetUp failed for volume "neutron-etc" : failed to sync secret cache: timed out waiting for the condition
  Warning  Failed       36m                kubelet                Error: failed to prepare subPath for volumeMount "neutron-bin" of container "neutron-ovs-agent"
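The "failed to sync secret cache" message points at the Secret backing the neutron-etc volume. As a quick diagnostic, a short Python snippet using the official kubernetes client can confirm that secret exists and is readable. This is only an illustrative sketch, not part of the StarlingX tooling; it assumes (as is typical for openstack-helm charts) that the volume is backed by a Secret named neutron-etc in the openstack namespace:

    # Hedged diagnostic: confirm the Secret assumed to back the "neutron-etc"
    # volume exists in the openstack namespace. The names here are assumptions,
    # not taken from the StarlingX manifests.
    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    config.load_kube_config()                  # local kubeconfig, e.g. on a controller
    v1 = client.CoreV1Api()
    try:
        secret = v1.read_namespaced_secret("neutron-etc", "openstack")
        print("secret found, keys:", sorted((secret.data or {}).keys()))
    except ApiException as exc:
        if exc.status == 404:
            print("secret neutron-etc not found in namespace openstack")
        else:
            raise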

...
        dump_pods_info(con_ssh=con_ssh)
>       raise exceptions.KubeError(msg)
E       utils.exceptions.KubeError: Kubernetes error.
E       Details: Some pods are not Running or Completed: {'pci-irq-affinity-agent-phf44': 'Init:0/1'}

Reproducibility
---------------
Not yet confirmed whether this is 100% reproducible; likely intermittent.

System Configuration
--------------------
Two-node system (Duplex) and multi-node system (Standard)

Branch/Pull Time/Commit
-----------------------
master 20220116T031725Z

Last Pass
---------
20220113T023728Z

Timestamp/Logs
--------------
Will be attached

Test Activity
-------------
Sanity

Workaround
----------
None identified.

Revision history for this message
Alexandru Dimofte (adimofte) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Revision history for this message
Ghada Khalil (gkhalil) wrote (last edit):

Screening: As per Thiago Brito, this issue is related to recent code changes submitted for https://storyboard.openstack.org/#!/story/2009702
The changes appear to be in the stx master branch only, so they should not affect r/stx.6.0.

tags: added: stx.7.0
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Thiago Paiva Brito (outbrito)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/825398

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/825398
Committed: https://opendev.org/starlingx/stx-puppet/commit/e00edd94f32f09f07e45b0ee3752d097d3a8f844
Submitter: "Zuul (22348)"
Branch: master

commit e00edd94f32f09f07e45b0ee3752d097d3a8f844
Author: Thiago Brito <email address hidden>
Date: Wed Jan 19 17:26:16 2022 -0300

    Fix resource lookup for ovs

    On change ae635b5b80fcb61c429a6fc17961a9f3bf614964, the vswitch_class
    was changed to ovs_dpdk, but the resources created by sysinv at [1]
    live under the platform::vswitch::ovs:: lookup path. This mismatch
    makes the lookup fail, so the bridges for the underlying datanetworks
    aren't created when puppet runs. As a result, the neutron-ovs-agent
    pods fail with CrashLoopBackOff. This commit fixes it by reverting the
    resources to the correct lookup path in hiera.

    [1] https://opendev.org/starlingx/config/src/commit/ece13f740847f3bcc7470cc7ec8c1896dd61f014/sysinv/sysinv/sysinv/sysinv/puppet/ovs.py#L108

    TEST PLAN
    PASS ovs-dpdk: Clean install of the Starlingx ISO verified that the
         br-phy* bridges were created for the underlying datanetworks
         using ovs-vsctl on the host
    PASS ovs-dpdk: Installation of stx-openstack is successful
    PASS ovs-dpdk: Created project networks and instances
    PASS ovs: Clean install of the Starlingx ISO
    PASS ovs: Installation of stx-openstack is successful and the
         openvswitchd and ovs-agent pods are running OK
    PASS ovs: Created project networks and instances

    Closes-Bug: #1958073
    Signed-off-by: Thiago Brito <email address hidden>
    Change-Id: I53e16df5403fa7c7f82b8e67e3e5d18a2103d599
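The commit message above describes a key-prefix mismatch: sysinv writes the vswitch resources into hieradata under platform::vswitch::ovs::, while the manifest was looking them up under the ovs_dpdk path after the vswitch_class change. Purely as an illustration of that failure mode (the key names and values below are hypothetical, not the actual hieradata generated by ovs.py), a small Python sketch:

    # Hypothetical illustration of the hiera lookup-path mismatch; the key
    # names and values are made up, the real hieradata comes from
    # sysinv/puppet/ovs.py.
    hieradata = {
        # sysinv writes vswitch resources under the ovs:: prefix
        "platform::vswitch::ovs::bridges": ["br-phy0"],
    }

    def lookup(key, default=None):
        # stand-in for a hiera lookup that falls back to an empty default
        return hieradata.get(key, default)

    # Before the fix: the manifest queried the ovs_dpdk:: path, got nothing,
    # so no br-phy* bridges were created and neutron-ovs-agent crash-looped.
    broken = lookup("platform::vswitch::ovs_dpdk::bridges", default=[])  # -> []

    # After the fix: the lookup path matches what sysinv actually wrote.
    fixed = lookup("platform::vswitch::ovs::bridges", default=[])        # -> ["br-phy0"]

    assert broken == [] and fixed == ["br-phy0"]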

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Ghada Khalil (gkhalil) wrote :

The stx master sanity is now green after this commit went in. Sanity report: http://lists.starlingx.io/pipermail/starlingx-discuss/2022-January/012698.html

Revision history for this message
Alexandru Dimofte (adimofte) wrote :

I think this issue has been fixed, since the last 2 master builds are green. We can close this issue. Thanks!
