Comment 19 for bug 2067889

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With what I know from comment #14 and comment #5 I tried to dig deeper to finally find what actually goes wrong, which still eludes me :-/

Env vars that we need:
OVS_RUNDIR=/var/run/openvswitch/
abs_top_srcdir=../_debian/openvswitch/
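
To reproduce the steps below by hand I export both in the shell first, so the commands can be pasted as-is (the srcdir path of course depends on the local build tree):

  export OVS_RUNDIR=/var/run/openvswitch/
  export abs_top_srcdir=../_debian/openvswitch/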

1. db creation + starting ovsdb with --detach in background + checking if it has started
=> logs in /var/log/openvswitch/ovsdb-server.log

  ../../tests/system-dpdk.at:741: touch .conf.db.~lock~
  ../../tests/system-dpdk.at:741: ovsdb-tool create conf.db $abs_top_srcdir/vswitchd/vswitch.ovsschema
  ../../tests/system-dpdk.at:741: ovsdb-server --detach --no-chdir --pidfile --log-file --remote=punix:$OVS_RUNDIR/db.sock
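
When redoing this by hand, a quick sanity check (my own, not part of the test) that ovsdb-server really came up can be based on the pidfile and socket that the options above create; the 'version' unixctl command should also be answered:

  test -e $OVS_RUNDIR/ovsdb-server.pid && echo "pidfile present"
  test -S $OVS_RUNDIR/db.sock && echo "db.sock present"
  ovs-appctl -t ovsdb-server version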

2. initialize and start vswitch in background
=> logs in /var/log/openvswitch/ovs-vswitchd.log

../../tests/system-dpdk.at:741: ovs-vsctl --no-wait init
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
../../tests/system-dpdk.at:741: ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="--log-level=pmd.*:error --no-pci"
../../tests/system-dpdk.at:741: ovs-vswitchd --detach --no-chdir --pidfile --log-file -vvconn -vofproto_dpif -vunixctl
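
Again just as a manual sanity check (not from the test), before going on I'd confirm that the daemon is up and that DPDK initialization succeeded; to my knowledge dpdk_initialized is the read-only column ovs-vswitchd sets once EAL init completed:

  test -e $OVS_RUNDIR/ovs-vswitchd.pid && echo "vswitchd running"
  ovs-vsctl get Open_vSwitch . dpdk_initialized    # expected: true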

3. add bridge and port
../../tests/system-dpdk.at:744: ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev
../../tests/system-dpdk.at:745: ovs-vsctl add-port br10 dpdkvhostuserclient0 -- \
              set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient options:vhost-server-path=$OVS_RUNDIR/dpdkvhostclient0
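
To see whether the port got created with the intended settings one can dump the interface record; this is only a convenience check of mine, not something the test does:

  ovs-vsctl list Interface dpdkvhostuserclient0
  ovs-vsctl get Interface dpdkvhostuserclient0 error    # expected: [] (no error)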

4. check for log to be as expected
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00064|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) vhost-user client: socket created, fd: 52
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00065|netdev_dpdk|INFO|vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket '/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0'
system-dpdk.at:745: wait succeeded immediately
system-dpdk.at:745: waiting until grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log...
2024-06-05T08:16:32.191Z|00067|dpdk|INFO|VHOST_CONFIG: (/cPJ/_debian/tests/system-dpdk-testsuite.dir/020/dpdkvhostclient0) reconnecting...
system-dpdk.at:745: wait succeeded immediately

=> so far, all good in both cases
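
For reference, the "waiting until grep ..." lines above come from the testsuite's wait helper (OVS_WAIT_UNTIL as far as I can see); when reproducing by hand a simple loop does roughly the same (log path as set up in step 2):

  until grep -q 'vhost-user client: socket created' /var/log/openvswitch/ovs-vswitchd.log; do
      sleep 1
  done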

5. start testpmd in another console
dpdk-testpmd --in-memory --socket-mem=512 --single-file-segments --no-pci --vdev="net_virtio_user,path=$OVS_RUNDIR/dpdkvhostclient0,server=1"

The only diff so far is the known:
  dpdk|ERR|TELEMETRY: Socket write base info to client failed
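
Because step 8 below does kill `cat testpmd.pid`, when running testpmd by hand I record its pid as well (ad-hoc helper step, not from the testsuite):

  pidof dpdk-testpmd > testpmd.pid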

6. set valid MTU

../../tests/system-dpdk.at:746: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9702
../../tests/system-dpdk.at:747: ovs-vsctl show

7. set invalid MTU

../../tests/system-dpdk.at:756: ovs-vsctl get Interface dpdkvhostuserclient0 mtu
../../tests/system-dpdk.at:761: ovs-vsctl set Interface dpdkvhostuserclient0 mtu_request=9711
../../tests/system-dpdk.at:762: grep "dpdkvhostuserclient0: unsupported MTU 9711" ovs-vswitchd.log
stdout:
2024-06-05T08:16:36.857Z|00117|netdev_dpdk|WARN|dpdkvhostuserclient0: unsupported MTU 9711

=> Still both systems behave the same
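
As an extra manual check (not in the test) one can also confirm that the rejected request did not change the effective MTU:

  ovs-vsctl get Interface dpdkvhostuserclient0 mtu_request    # 9711, the rejected value
  ovs-vsctl get Interface dpdkvhostuserclient0 mtu            # expected to still be 9702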

8. delete port and stop testpmd
../../tests/system-dpdk.at:765: ovs-vsctl del-port br10 dpdkvhostuserclient0
../../tests/system-dpdk.at:766: kill `cat testpmd.pid`
The latter in my case means stopping it in the other console.

It seems then the macro/function OVS_DPDK_STOP_VSWITCHD is crucial.
It is called like this in ./tests/system-dpdk.at:

OVS_DPDK_STOP_VSWITCHD(["dnl
/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"])

It seems it lists known messages that are expected in the vswitchd log.

The effective call I see in the log is even a bit longer; it looks like some other known messages are added.

../../tests/system-dpdk.at:767: check_logs "/VHOST_CONFIG: (.*dpdkvhostclient0) failed to connect: No such file or directory/d
/dpdkvhostuserclient0: unsupported MTU 9711/d
/failed to set MTU for network device dpdkvhostuserclient0: Invalid argument/d"";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"

It is defined, as expected, with extra DPDK messages appended, in tests/system-dpdk-macros.at:

m4_define([OVS_DPDK_STOP_VSWITCHD],
  [OVS_VSWITCHD_STOP([dnl
$1";/does not exist. The Open vSwitch kernel module is probably not loaded./d
/does not support MTU configuration,/d
/EAL: No \(available\|free\) .*hugepages reported/d
/Failed to enable flow control/d
/ice_vsi_config_outer_vlan_stripping(): Single VLAN mode (SVM) does not support qinq/d
/Rx checksum offload is not supported on/d
/TELEMETRY: No legacy callbacks, legacy socket not created/d"])
])

And this in turn is in tests/ofproto-macros.at:

# OVS_VSWITCHD_STOP([ALLOWLIST])
#
# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
# for messages with severity WARN or higher and signaling an error if any
# is present. The optional ALLOWLIST may contain shell-quoted "sed"
# commands to delete any warnings that are actually expected, e.g.:
#
# OVS_VSWITCHD_STOP(["/expected error/d"])
m4_define([OVS_VSWITCHD_STOP],
  [AT_CHECK([check_logs $1])
   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
   OVS_APP_EXIT_AND_WAIT([ovsdb-server])])

So it stops the two daemons we started in the background.
The interesting bit is in "check_logs", which gets passed all the additional messages we've seen above.
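
I did not paste check_logs itself, but going by the OVS_VSWITCHD_STOP description above its job boils down to roughly this (simplified sketch, not the actual implementation from tests/ofproto-macros.at):

  # delete every allow-listed line from the daemon logs, then any remaining
  # message of severity WARN or higher makes the test fail
  sed "$ALLOWLIST" ovs-vswitchd.log ovsdb-server.log \
      | grep -E '\|(WARN|ERR|EMER)\|' && echo "unexpected log messages -> failure"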

So, as I assumed above, it might be as simple as this new telemetry error message, which is not covered by the allowlist, breaking the check.
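
A quick way to check that hypothesis (my own ad-hoc grep, not part of the testsuite) is to verify that the new message really is not matched by any of the /.../d entries above, e.g.:

  # the allowlist only deletes the "No legacy callbacks" telemetry line, so this
  # new ERR line would survive the filtering and fail the check
  grep 'TELEMETRY: Socket write base info to client failed' /var/log/openvswitch/ovs-vswitchd.log

If that matches, adding a corresponding /.../d line to the DPDK allowlist in OVS_DPDK_STOP_VSWITCHD would be the obvious thing to try.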