Netplan is not setting up SRIOV Virtual Functions on Jammy Charmed OpenStack during boot
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
netplan.io (Ubuntu) | Triaged | Low | Unassigned | | |
Bug Description
I am trying to deploy Charmed OpenStack (Yoga) on the Jammy series with OVN hardware offload.
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_
DISTRIB_
DISTRIB_
# uname -a
Linux node3 5.15.0-35-generic #36-Ubuntu SMP Sat May 21 02:24:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/openstack-
OPENSTACK_
As part of the charm bundle, the following config is used:
ovn-chassis:
charm: ch:ovn-chassis
# Please update the `bridge-
# hardware used in your deployment. See the referenced documentation at the
# top of this file.
options:
ovn-
bridge-
enable-
sriov-numvfs: "ens1f1:8"
channel: 22.03/stable
bindings:
"": *internal-space
data: *overlay-space
This is translated to the following netplan file on the deployed node:
cat /etc/netplan/
#######
# [ WARNING ]
# Configuration file maintained by Juju. Local changes may be overwritten.
# Config managed by ovn-chassis charm
#######
network:
version: 2
ethernets:
ens1f1:
virtual-
embedded-
delay-
However, after rebooting the deployed servers, the SR-IOV VFs are not enabled on the NVIDIA NIC:
# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
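
The same state can also be read from sysfs instead of inferring it from lspci. The following is a minimal sketch, assuming the standard kernel SR-IOV attributes `sriov_numvfs` and `sriov_totalvfs` are exposed under `/sys/class/net/<iface>/device/`. The function name `read_vf_counts` and the `sysfs_root` parameter are hypothetical and only there so the check can be pointed at a test directory.

```python
from pathlib import Path


def read_vf_counts(iface, sysfs_root="/sys"):
    """Return (current_vfs, max_vfs) for a physical function.

    Returns None when the SR-IOV attributes are missing, for example when
    the interface does not exist or the device does not support SR-IOV.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    try:
        current = int((dev / "sriov_numvfs").read_text().strip())
        maximum = int((dev / "sriov_totalvfs").read_text().strip())
    except FileNotFoundError:
        return None
    return current, maximum


if __name__ == "__main__":
    # ens1f1 is the interface named in the charm option sriov-numvfs: "ens1f1:8".
    counts = read_vf_counts("ens1f1")
    if counts is None:
        print("ens1f1: no SR-IOV attributes found")
    else:
        print(f"ens1f1: {counts[0]} of {counts[1]} VFs enabled")
```

On an affected node, a current count of 0 right after boot would match the lspci output above, where only the two physical functions are listed.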
When I run netplan apply manually, the VFs are configured (the switch mode change fails because the NIC is already bound; I believe this is expected):
# netplan --debug apply
.
.
.
ens1f1:
delay-
embedded-
match:
macaddress: 04:3f:72:9e:0b:a1
mtu: 1500
set-name: ens1f1
virtual-
.
.
.
DEBUG:Found VFs of 0000:08:00.1: ['0000:08:02.3', '0000:08:02.4', '0000:08:02.5', '0000:08:02.6', '0000:08:02.7', '0000:08:03.0', '0000:08:03.1', '0000:08:03.2']
Error: mlx5_core: Can't change mode, E-Switch is busy.
kernel answers: Device or resource busy
Traceback (most recent call last):
File "/usr/sbin/
netplan.main()
File "/usr/share/
self.
File "/usr/share/
self.func()
File "/usr/share/
self.
File "/usr/share/
self.func()
File "/usr/share/
NetplanAppl
File "/usr/share/
apply_
File "/usr/share/
pcidev.
File "/usr/share/
subprocess.
File "/usr/lib/
raise CalledProcessEr
subprocess.
root@node3:
# lspci | grep -i nox
08:00.0 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.1 Ethernet controller: Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6 Dx network controller
08:00.2 DMA controller: Mellanox Technologies MT42822 BlueField-2 SoC Management Interface
08:02.3 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.4 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.5 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.6 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:02.7 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.0 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.1 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
08:03.2 Ethernet controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
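
To compare the state before and after `netplan apply` without counting lines by eye, the lspci output can be filtered mechanically. This is a small sketch, assuming VF entries carry the text "Virtual Function" in their description as they do in the output above. The helper name `list_vf_addresses` is hypothetical.

```python
def list_vf_addresses(lspci_output):
    """Return the PCI addresses of all lines that describe a Virtual Function."""
    addresses = []
    for line in lspci_output.splitlines():
        line = line.strip()
        if line and "Virtual Function" in line:
            # The PCI address is the first whitespace-separated field.
            addresses.append(line.split()[0])
    return addresses


if __name__ == "__main__":
    sample = (
        "08:00.1 Ethernet controller: Mellanox Technologies MT42822 "
        "BlueField-2 integrated ConnectX-6 Dx network controller\n"
        "08:02.3 Ethernet controller: Mellanox Technologies ConnectX Family "
        "mlx5Gen Virtual Function\n"
    )
    print(list_vf_addresses(sample))
```

Applied to the two listings in this report, it would return an empty list for the output taken after reboot and eight addresses for the output taken after the manual apply.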
affects: | openvswitch (Ubuntu) → plan (Ubuntu) |
affects: | plan (Ubuntu) → netplan.io (Ubuntu) |
tags: | added: fr-2523 |
Important note: after moving to a bond configuration, I don't see this issue anymore. It seems to happen only when a single interface is used for the high-speed fabric.