stx-openstack: neutron-ovs-agent-controller failing on Debian

Bug #1999426 reported by Thales Elero Cervi
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Thales Elero Cervi

Bug Description

Brief Description
-----------------
Recent installation tests of stx-openstack on Debian failed due to the neutron-ovs-agent-controller pod. Its init container (neutron-ovs-agent-init) fails to initialize, leaving the pod stuck in the Init:CrashLoopBackOff state, which in turn breaks the application apply process.

This behavior was not previously reproduced on virtual deployments.

Severity
--------
Critical: stx-openstack cannot be applied.

Steps to Reproduce
------------------
* Upload stx-openstack (Debian stx)
* Apply stx-openstack

Expected Behavior
------------------
stx-openstack should apply successfully

Actual Behavior
----------------
stx-openstack apply fails

Reproducibility
---------------
Reproducible

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
master:
* starlingx/master/debian/monolithic/20221206T070000Z

Last Pass
---------
N/A

Timestamp/Logs
--------------
$ kubectl -n openstack describe pod neutron-ovs-agent-controller-0-937646f6-sp9h9 | tail -n 5
  Normal Started 37m kubelet Started container neutron-openvswitch-agent-kernel-modules
  Normal Created 37m (x4 over 37m) kubelet Created container neutron-ovs-agent-init
  Normal Started 37m (x4 over 37m) kubelet Started container neutron-ovs-agent-init
  Normal Pulled 34m (x6 over 37m) kubelet Container image "registry.local:9001/docker.io/starlingx/stx-neutron:master-centos-stable-latest" already present on machine
  Warning BackOff 4m50s (x152 over 37m) kubelet Back-off restarting failed container

$ kubectl -n openstack get pods | grep neutron-ovs-agent-controller
neutron-ovs-agent-controller-0-937646f6-sp9h9 0/1 Init:CrashLoopBackOff 12 (92s ago) 40m
neutron-ovs-agent-controller-1-cab72f56-kn822 0/1 Init:CrashLoopBackOff 12 (77s ago) 40m

$ kubectl -n openstack logs -f pod/neutron-ovs-agent-controller-0-937646f6-sp9h9 -c neutron-ovs-agent-init
+ OVS_SOCKET=/run/openvswitch/db.sock
+ chown neutron: /run/openvswitch/db.sock
++ cat /run/openvswitch/ovs-vswitchd.pid
+ OVS_PID=35
+ OVS_CTL=/run/openvswitch/ovs-vswitchd.35.ctl
+ chown neutron: /run/openvswitch/ovs-vswitchd.35.ctl
+ DPDK_CONFIG_FILE=/tmp/dpdk.conf
+ DPDK_CONFIG=
+ DPDK_ENABLED=false
+ '[' -f /tmp/dpdk.conf ']'
++ sed 's/[{}"]//g' /tmp/auto_bridge_add
++ tr , '\n'
+ for bmap in '`sed '\''s/[{}"]//g'\'' /tmp/auto_bridge_add | tr "," "\n"`'
+ bridge=br-phy0
+ iface=ens785f0
+ ovs-vsctl --no-wait --may-exist add-br br-phy0
+ '[' -n ens785f0 ']'
+ '[' ens785f0 '!=' null ']'
+ ovs-vsctl --no-wait --may-exist add-port br-phy0 ens785f0
++ get_dpdk_config_value .enabled
++ values=.enabled
++ filter=
+++ echo .enabled
+++ jq -r
/tmp/neutron-openvswitch-agent-init.sh: line 18: jq: command not found
++ value=
++ [[ '' == \n\u\l\l ]]
++ echo ''
+ [[ '' != \t\r\u\e ]]
+ ip link set dev ens785f0 up
+ tunnel_types=vxlan
+ [[ -n vxlan ]]
+ tunnel_interface=docker0
+ '[' -z docker0 ']'
+ [[ false == \t\r\u\e ]]
+ [[ -n vxlan ]]
++ get_ip_address_from_interface docker0
++ local interface=docker0
+++ ip -4 -o addr s docker0
+++ awk '{ print $4; exit }'
+++ awk -F / '{print $1}'
Device "docker0" does not exist.
++ local ip=
++ '[' -z '' ']'
++ exit 1
+ LOCAL_IP=
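The trace above shows two distinct problems: jq is missing from the image (so the DPDK config lookup silently yields an empty value), and the helper that resolves the tunnel interface IP exits when docker0 does not exist, killing the init container. A minimal sketch of that helper follows; only the function name and the ip/awk pipeline come from the log, everything else is an assumption:

```shell
#!/bin/bash
# Hypothetical reimplementation of the helper seen in the trace above.
get_ip_address_from_interface() {
  local interface=$1
  local ip
  # Take the first IPv4 address on the interface and strip the prefix length.
  ip=$(ip -4 -o addr show "$interface" 2>/dev/null \
         | awk '{ print $4; exit }' \
         | awk -F/ '{ print $1 }')
  # The init container exits here when the interface (docker0) is absent,
  # which is what drives the pod into Init:CrashLoopBackOff.
  if [ -z "$ip" ]; then
    return 1
  fi
  echo "$ip"
}
```

Because the script runs with errexit semantics, the `exit 1` taken when docker0 is missing terminates the whole init script before LOCAL_IP is ever set.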

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)
Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/867575
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/38e6cf641fc2646d4e13af4031a3bfb915ac261e
Submitter: "Zuul (22348)"
Branch: master

commit 38e6cf641fc2646d4e13af4031a3bfb915ac261e
Author: Thales Elero Cervi <email address hidden>
Date: Tue Dec 13 18:35:11 2022 -0300

    Remove explicit set of docker0 as tunnel interface

    Last month [1] was merged, removing the default docker network bridge
    to avoid IP conflicts with addresses already in use. Since StarlingX no
    longer runs containers with docker, this is a reasonable change.

    Later it was noticed that neutron-openvswitch-agent-init.sh accesses a
    network.interface.tunnel value that has been explicitly set to docker0
    since the original Armada manifests [2] of the stx-openstack
    application. This setting was inherited from Armada to FluxCD during
    the application migration.

    Since this default network bridge is no longer available, this static
    override needs to be updated. We now rely on the fallback mechanism to
    search for this interface, as per [3] and [4].
    In case any future network use case requires a different approach, it
    would also be possible to use the application plugins to retrieve the
    cluster-host interface and dynamically override this field.

    [1] https://review.opendev.org/c/starlingx/config-files/+/865329
    [2] https://opendev.org/starlingx/openstack-armada-app/commit/b7d0b3ed0c9e6ceac86d63088ac783dd0adecf7b
    [3] https://github.com/openstack/openstack-helm/blob/master/neutron/values.yaml#L110
    [4] https://github.com/openstack/openstack-helm/blob/master/neutron/templates/bin/_neutron-openvswitch-agent-init.sh.tpl#L423

    TEST PLAN:
    PASS - Build stx-openstack fluxcd helm charts
    PASS - Upload and Apply stx-openstack
    PASS - Check that neutron-ovs-agent-controller pod spawns successfully

    Closes-Bug: 1999426

    Signed-off-by: Thales Elero Cervi <email address hidden>
    Change-Id: I169d8408420483bbf4e6c59c2cf70be5da039481
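The fallback the commit above relies on can be sketched roughly as follows. This is a minimal sketch, not the verbatim openstack-helm logic; the function name and the default-route heuristic are assumptions:

```shell
#!/bin/bash
# Hedged sketch of a tunnel-interface fallback: honor an explicit override
# when one is set, otherwise fall back to the interface that carries the
# default route (a common last-resort heuristic).
pick_tunnel_interface() {
  local override=$1
  if [ -n "$override" ] && [ "$override" != "null" ]; then
    echo "$override"
    return 0
  fi
  # Parse the "dev <name>" field out of the default route, if any.
  ip -4 route list 0/0 2>/dev/null \
    | awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i+1); exit } }'
}
```

With the static docker0 override removed from the chart values, the override argument arrives empty and the fallback branch resolves a real interface instead of a nonexistent bridge.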

Changed in starlingx:
status: In Progress → Fix Released
tags: added: stx.8.0 stx.distro.openstack
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High