libvirt cannot autostart sriov pool

Bug #2031078 reported by Enoch Leung
Affects: libvirt (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Sergio Durigan Junior

Bug Description

I have a home-lab setup and cannot get an SR-IOV (Intel X540) pool to autostart. After boot I can manually start the pools via virt-manager. The state of the machine right after reboot is shown below (MAC addresses replaced with 00:00:00:00:00:00):

ip link
----------------------------------------------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: rtl8156b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master wan state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
4: x540_p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
5: x540_p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    vf 0 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 1 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 2 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
    vf 3 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off

virsh net-list --all
------------------------------
 Name      State      Autostart   Persistent
----------------------------------------------
 WAN       active     yes         yes
 x540_p0   inactive   yes         yes
 x540_p1   inactive   yes         yes

virsh net-dumpxml WAN
----------------------------------------------
<network>
  <name>WAN</name>
  <forward mode='bridge'/>
  <bridge name='wan'/>
</network>

virsh net-dumpxml x540_p0
----------------------------------------------
<network>
  <name>x540_p0</name>
  <forward mode='hostdev' managed='yes'>
    <pf dev='x540_p0'/>
  </forward>
</network>
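
(For reference, manually starting the inactive networks via virt-manager is equivalent to the following virsh commands - a sketch using the network names shown above:)
----------------------------------------------
# manual equivalent of the virt-manager start after boot
virsh net-start x540_p0
virsh net-start x540_p1
----------------------------------------------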

Tags: server-todo
Enoch Leung (leun0036)
affects: netplan → libvirt
description: updated
Enoch Leung (leun0036)
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi,
most likely something on your device isn't ready to be started yet.
As an initial suggestion I'd recommend enabling debug logging for libvirtd [1] via the config files.
Then do a full reboot.

The log file should then contain some info about libvirt trying to auto-start this but failing.

You can then grab the same log from when you later manually start the pool and compare the two.

That should already give you a good idea what might be wrong.

The two questions you need to answer with those logs as a first step are:
1. Is it even trying to enable it at boot time (is the activation or the trigger to activate broken)?
2. If it is trying to start it at boot, what is different compared to the manual start that works later?

[1]: https://libvirt.org/kbase/debuglogs.html#turning-on-debug-logs
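
As a sketch of what [1] boils down to (the exact filter string is an assumption adapted from that page - check the kbase article for the authoritative values), in /etc/libvirt/libvirtd.conf:
----------------------------------------------
# log debug-level messages, filtering out some noisy modules
# (filter list is an example; tune it per the kbase page)
log_filters="3:remote 4:event 3:util.json 3:rpc 1:*"
# write to a file so the boot-time attempt survives for comparison
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
----------------------------------------------
Then restart libvirtd (or reboot) for the settings to take effect.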

Changed in libvirt (Ubuntu):
status: New → Incomplete
affects: libvirt → qemu (Ubuntu)
no longer affects: qemu (Ubuntu)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Team - Once this has more data on "what might be broken" and solid steps to reproduce, "horsea" has two "10-Gigabit X540-AT2" NICs which could be used to have a look at this - but you have to take them out of the hands of DPDK (probably just start with a machine redeploy).

Usually hostdev and other pools often have issues with apparmor until the admin configures it to allow what they want to allow (see bug 1677398). But since this is reported to "work later", that shouldn't be the problem here.
@Enoch
Still, to be on the safe side - a check for any apparmor-related denials would be helpful as well.
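
A quick sketch of how such denials could be found (AppArmor rejections show up as audit messages containing apparmor="DENIED"):
----------------------------------------------
# search the current boot for AppArmor denials
journalctl -b | grep -i 'apparmor="DENIED"'
# the kernel ring buffer works as well
dmesg | grep -i 'apparmor="DENIED"'
----------------------------------------------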

Revision history for this message
Enoch Leung (leun0036) wrote (last edit ):

This should be related to another bug I filed against netplan:
https://bugs.launchpad.net/netplan/+bug/1999181

Here's the journalctl output, with "default logging" enabled in libvirtd.conf.
I kept the sequence of log entries as output by journalctl, but deleted entries that are not relevant; let me know if I should expect or keep something else.
=========================================================
Aug 16 18:21:28 ****** audit[399]: AVC apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirtd//qemu_bridge_helper" pid=399 comm="apparmor_parser"
Aug 16 18:21:29 ****** systemd-networkd[319]: eth1: Interface name change detected, renamed to i350_p1.
Aug 16 18:21:29 ****** kernel: igb 0000:09:00.0 i350_p0: renamed from eth0
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.0: Intel(R) 10 Gigabit Network Connection
Aug 16 18:21:29 ****** systemd[1]: Started Virtualization daemon.
Aug 16 18:21:29 ****** systemd-networkd[319]: eth0: Interface name change detected, renamed to i350_p0.
Aug 16 18:21:29 ****** systemd[1]: Starting Suspend/Resume Running libvirt Guests...
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.1: enabling device (0000 -> 0002)
Aug 16 18:21:29 ****** systemd[1]: Started Dispatcher daemon for systemd-networkd.
Aug 16 18:21:29 ****** systemd[1]: Finished Suspend/Resume Running libvirt Guests.
Aug 16 18:21:29 ****** systemd[1]: Reached target Multi-User System.
Aug 16 18:21:29 ****** systemd[1]: Startup finished in 26.344s (firmware) + 6.367s (loader) + 2.337s (kernel) + 1.888s (userspace) = 36.938s.
Aug 16 18:21:29 ****** libvirtd[437]: 468: info : libvirt version: 8.0.0, package: 1ubuntu7.6 (Rafael Lopez <email address hidden> Tue, 20 Jun 2023 11:54:15 +1000)
Aug 16 18:21:29 ****** libvirtd[437]: 468: info : hostname: ******
Aug 16 18:21:29 ****** libvirtd[437]: 468: error : networkCreateInterfacePool:2250 : internal error: No usable Vf's present on SRIOV PF x540_p0
Aug 16 18:21:29 ****** libvirtd[437]: 468: error : networkCreateInterfacePool:2250 : internal error: No usable Vf's present on SRIOV PF i350_p0
Aug 16 18:21:29 ****** libvirtd[437]: 468: error : networkCreateInterfacePool:2250 : internal error: No usable Vf's present on SRIOV PF x540_p1
Aug 16 18:21:29 ****** libvirtd[437]: 468: error : networkCreateInterfacePool:2250 : internal error: No usable Vf's present on SRIOV PF i350_p1
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.1: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.1: 16.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x4 link at 0000:00:1c.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x>
Aug 16 18:21:29 ****** networkd-dispatcher[411]: WARNING:Unknown index 6 seen, reloading interface list
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.1: MAC: 3, PHY: 0, PBA No: 000000-000
Aug 16 18:21:29 ****** kernel: igb 0000:09:00.1: 7 VFs allocated
Aug 16 18:21:29 ****** networkd-dispatcher[411]: WARNING:Unknown index 7 seen, reloading interface list
Aug 16 18:21:29 ****** kernel: ixgbe 0000:01:00.1: Intel(R) 10 Gigabit Network Connection
Aug 16 18:21:29 ****** kernel: igb 0000:09:00.0: 7 VFs allocated
Aug 16 18:21:29 ****** ...

Revision history for this message
Enoch Leung (leun0036) wrote :

I found that besides the SR-IOV pool, a storage pool cannot be auto-started either, likely failing for the same reason: libvirt running too fast / too early? It is a dir pool pointing to a directory on a mountpoint (4TB) backed by a 5400rpm HDD; root and boot reside on a SATA SSD.

Extract from journalctl -b (timezone is GMT+8):
-----------------------------------------------
Aug 30 02:15:56 ****** systemd[1]: Found device ST4000LM016-1N2170 NTFS.
Aug 30 02:15:56 ****** systemd[1]: Mounting /****/4TB...
Aug 30 02:15:57 ****** libvirtd[450]: internal error: No usable Vf's present on SRIOV PF x540_p0
Aug 30 02:15:57 ****** libvirtd[450]: internal error: No usable Vf's present on SRIOV PF i350_p1
Aug 30 02:15:57 ****** libvirtd[450]: internal error: No usable Vf's present on SRIOV PF x540_p1
Aug 30 02:15:57 ****** libvirtd[450]: internal error: No usable Vf's present on SRIOV PF i350_p0
Aug 30 02:15:57 ****** systemd[1]: Starting Record Runlevel Change in UTMP...
Aug 30 02:15:57 ****** libvirtd[450]: cannot open directory '/****/4TB/CDIMAGES': No such file or directory
Aug 30 02:15:57 ****** libvirtd[450]: internal error: Failed to autostart storage pool 'CDIMAGES': cannot open directory '/****/4TB/CDIMAGES': No such file or directory
Aug 30 02:15:58 ****** systemd[1]: Mounted /****/4TB.
Aug 30 02:15:58 ****** systemd[1]: Startup finished in 37.044s (firmware) + 6.098s (loader) + 2.776s (kernel) + 2.442s (userspace) = 48.361s.
Aug 30 02:15:58 ****** networkd-dispatcher[418]: WARNING:Unknown index 22 seen, reloading interface list
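
If the root cause is indeed ordering, one local workaround for the storage-pool half would be a drop-in that orders libvirtd after the mount - a minimal sketch, assuming the redacted mount point were /mnt/4TB (the RequiresMountsFor path must match the real one):
----------------------------------------------
# systemctl edit libvirtd.service, then add:
[Unit]
# do not autostart pools before the data disk is mounted
RequiresMountsFor=/mnt/4TB
----------------------------------------------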

tags: added: server-triage-discuss
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Sergio will be back in a bit and will then work out with you, via discussion, whether this needs a local config change for your case or a fix in the package.

tags: removed: server-triage-discuss
Changed in libvirt (Ubuntu):
assignee: nobody → Sergio Durigan Junior (sergiodj)
tags: added: server-todo
Changed in libvirt (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Enoch Leung (leun0036) wrote :

Currently I have worked around this by creating a systemd service, which I think I will need even after this is fixed, because I do not know how to turn off spoof checking within netplan for my DPDK+VPP VM - but I hope it may be helpful. At least once this is fixed, I won't need this systemd service on my other machines, which do not have a software-router VM on them.

my vpp_vm.service
--------------------------------------------------------------
[Unit]
Description=Start script to prepare NIC and libvirt network
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
#ExecCondition=
#ExecStartPre=
ExecStart=/root/vpp_vm.sh
#ExecStartPost=
WorkingDirectory=/root

[Install]
WantedBy=multi-user.target
--------------------------------------------------------------

and inside my vpp_vm.sh I look for inactive networks and pools, then start them:
--------------------------------------------------------------
# start every libvirt network that is still inactive after boot
for _net in $(virsh net-list --inactive --name); do
    virsh net-start "$_net"
done
--------------------------------------------------------------
I didn't check the autostart flag, as I don't need it here.
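
For the spoof-checking part, a sketch of what the script could do with plain ip link (x540_p0 and vf 0 are assumptions - match them to the VF actually passed to the VPP VM):
--------------------------------------------------------------
# disable spoof checking and mark the VF as trusted for the router VM
ip link set dev x540_p0 vf 0 spoofchk off
ip link set dev x540_p0 vf 0 trust on
--------------------------------------------------------------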
