Comment 5 for bug 1945868

Christian Ehrhardt  (paelzer) wrote :

Hi Markus, thanks for the report.
While the good/bad switch between kernels suggests the cause lies there, knowing more about the exact guest configuration would help in any case. The reason is simply that I and millions of others use virtio-net in Focal guests just fine, so some detail of your setup must differ. Knowing that detail will help us understand the problem.

So let me ask for a few clarifications:
- You said the host is 20.04, so I assume you are on qemu 1:4.2-3ubuntu6.17 and libvirt 6.0.0-0ubuntu8.14 (it could just as well be backports, and I want to be sure).
- You mentioned guest versions, but which exact host kernel version are you using when this happens?
- How did you configure your guest, and especially its network adapter (libvirt XML if you use libvirt, or the qemu command line if you create the guest some other way)?
- Does this only happen with an old guest that has been kept running (upgraded to 20.04 inside the guest and rebooted, while the qemu process has been up for a long time), or is it reproducible with a freshly started 20.04 guest on the same system?

Furthermore, let us know if anything more is logged when the issue happens in any of:
a) host kernel (dmesg)
b) host userspace (qemu log in /var/log/libvirt/guestname)
c) guest journal
d) host journal

From the message the guest prints, we can work out which feature it could not set.
The workaround you mentioned was about checksumming, but maybe in your case it is something different.
"wanted 0x0000008000174a29, left 0x000000800017ca29"

So it wanted to disable one feature but could not.
This is already interesting, as plenty of features are fixed (not changeable) with virtio-net.
You can see that with, for example, `$ sudo ethtool --show-features enp1s0`.

The list of features your guest requests is:
tx-scatter-gather
tx-checksum-ip-generic
highdma
rx-vlan-filter
tx-generic-segmentation
rx-gro
tx-tcp-segmentation
tx-gso-robust
tx-tcp-ecn-segmentation
tx-tcp6-segmentation
(The mask is somewhat hard to decode by hand, so I hope this is right.)
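To double-check the list above, here is a small Python sketch that decodes the mask bit by bit. The bit-to-name table is an assumption taken from the `NETIF_F_*_BIT` enum of the 5.4 kernel's `include/linux/netdev_features.h`, and it only covers the bits discussed here; any set bit missing from the table is printed as a raw `bit N` instead of being guessed. The function name `decode_features` is made up for this illustration.

```python
from __future__ import annotations

# Partial map of netdev_features_t bit positions to ethtool feature names.
# ASSUMPTION: positions follow the NETIF_F_*_BIT enum in the 5.4 kernel's
# include/linux/netdev_features.h; other kernel versions may number differently.
FEATURE_BITS = {
    0: "tx-scatter-gather",         # NETIF_F_SG
    3: "tx-checksum-ip-generic",    # NETIF_F_HW_CSUM
    5: "highdma",                   # NETIF_F_HIGHDMA
    9: "rx-vlan-filter",            # NETIF_F_HW_VLAN_CTAG_FILTER
    11: "tx-generic-segmentation",  # NETIF_F_GSO
    14: "rx-gro",                   # NETIF_F_GRO
    15: "rx-lro",                   # NETIF_F_LRO
    16: "tx-tcp-segmentation",      # NETIF_F_TSO
    17: "tx-gso-robust",            # NETIF_F_GSO_ROBUST
    18: "tx-tcp-ecn-segmentation",  # NETIF_F_TSO_ECN
    20: "tx-tcp6-segmentation",     # NETIF_F_TSO6
}


def decode_features(mask: int) -> list[str]:
    """Name every set bit of a 64-bit feature mask, lowest bit first.

    Bits missing from FEATURE_BITS are reported as "bit N" instead of guessed.
    """
    return [FEATURE_BITS.get(bit, f"bit {bit}")
            for bit in range(64) if (mask >> bit) & 1]


if __name__ == "__main__":
    wanted = 0x0000008000174A29  # "wanted" mask from the guest's error message
    for name in decode_features(wanted):
        print(name)
```

Run against the "wanted" mask, this reproduces the ten names above in the same order and additionally reports bit 39, which the hand-decoded list does not include. If I read the 5.4 enum correctly, bit 39 is NETIF_F_RXCSUM (rx-checksum), but that is worth verifying against the header of the exact guest kernel.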

More important, though, is the difference: the feature it wanted to disable but could not.
That is 0x8000 (bit 15, NETIF_F_LRO), which maps via netdev_features_t to "rx-lro", shown by ethtool as large-receive-offload.
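The comparison can be made explicit by splitting the two masks into "still on although off was requested" and "still off although on was requested". A minimal Python sketch follows; the helper names `set_bits` and `mask_diff` are hypothetical, and the only kernel-specific assumption is that bit 15 is NETIF_F_LRO in the 5.4 netdev_features_t layout.

```python
# The two masks printed by the guest ("wanted ..., left ...").
WANTED = 0x0000008000174A29
LEFT = 0x000000800017CA29


def set_bits(mask: int) -> list:
    """Positions of all set bits in a 64-bit mask, lowest first."""
    return [bit for bit in range(64) if (mask >> bit) & 1]


def mask_diff(wanted: int, left: int) -> tuple:
    """Split the mismatch into (could_not_clear, could_not_set) bit lists."""
    could_not_clear = set_bits(left & ~wanted)  # still on, although off was requested
    could_not_set = set_bits(wanted & ~left)    # still off, although on was requested
    return could_not_clear, could_not_set


if __name__ == "__main__":
    print(f"xor: {WANTED ^ LEFT:#x}")           # -> xor: 0x8000
    stuck_on, stuck_off = mask_diff(WANTED, LEFT)
    # Bit 15 is NETIF_F_LRO ("rx-lro") in the 5.4 netdev_features_t layout.
    print("could not clear:", stuck_on)          # -> could not clear: [15]
    print("could not set:", stuck_off)           # -> could not set: []
```

Splitting the XOR by direction matters here: it shows the guest asked to turn LRO off and the device kept it on, rather than the other way round.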

Checking an example guest of mine, I see that LRO is on by default but can be switched.
This is from the newest Ubuntu release:
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on
$ sudo ethtool --features enp1s0 lro off
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off
$ sudo ethtool --features enp1s0 lro on
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on

It might be worth running the above sequence with your working kernel to see whether LRO is
a) off or on by default on start
b) can be turned off/on as requested
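If scripting the check is easier, here is a rough Python sketch of the same sequence. It shells out to exactly the `ethtool --show-features` and `ethtool --features <iface> lro on|off` commands used above, must run as root (like the sudo calls), and only acts when an interface name is passed, e.g. `sudo python3 lro_check.py enp1s0` (the file name and the helpers `parse_lro`, `lro_state`, `set_lro`, `check` are hypothetical). It judges "switchable" by re-reading the state after the toggle, as the manual sequence does, rather than trusting ethtool's exit code.

```python
import re
import subprocess
import sys

# Matches the LRO line of `ethtool --show-features`, e.g.
# "large-receive-offload: off [fixed]"; any trailing annotation lands in group 2.
_LRO_LINE = re.compile(r"^\s*large-receive-offload:\s*(on|off)\b(.*)$", re.MULTILINE)


def parse_lro(show_features_output: str):
    """Return (is_on, is_fixed) for LRO, or None if the line is absent."""
    match = _LRO_LINE.search(show_features_output)
    if match is None:
        return None
    return match.group(1) == "on", "[fixed]" in match.group(2)


def lro_state(iface: str):
    """Run `ethtool --show-features IFACE` and parse the LRO line."""
    out = subprocess.run(["ethtool", "--show-features", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_lro(out)


def set_lro(iface: str, on: bool) -> None:
    """Run `ethtool --features IFACE lro on|off`; success is judged by re-reading."""
    subprocess.run(["ethtool", "--features", iface, "lro", "on" if on else "off"],
                   capture_output=True, text=True)


def check(iface: str) -> None:
    initial = lro_state(iface)
    if initial is None:
        print(f"{iface}: no large-receive-offload line in ethtool output")
        return
    on, fixed = initial
    # a) default state at start
    print(f"{iface}: LRO {'on' if on else 'off'} at start{' [fixed]' if fixed else ''}")
    set_lro(iface, not on)  # b) try to flip the default
    flipped = lro_state(iface)
    print(f"{iface}: switchable = {flipped is not None and flipped[0] != on}")
    set_lro(iface, on)      # put the original setting back


if __name__ == "__main__" and len(sys.argv) > 1:
    check(sys.argv[1])
```

Restoring the original value at the end keeps the check non-destructive on a guest where LRO can be switched.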

Since you lose networking once virtio-net fails, it would be even better if you could run the same sequence via e.g. "virsh console" or any other non-network access you might have to the guest.

I then ran this sequence as a cross-check with a Focal host and a Focal guest:
1. 5.4.0-86-generic
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off
$ sudo ethtool --features enp1s0 lro on
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on
$ sudo ethtool --features enp1s0 lro off
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off

2. 5.4.0-88-generic
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off
$ sudo ethtool --features enp1s0 lro on
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on
$ sudo ethtool --features enp1s0 lro off
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off

3. 5.4.0-89-generic
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off [fixed]
$ sudo ethtool --features enp1s0 lro on
Cannot change large-receive-offload
Could not change any device features
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off [fixed]
$ sudo ethtool --features enp1s0 lro off
Cannot change large-receive-offload
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off [fixed]

I have checked, but LRO is not something that libvirt/qemu can manage directly [1].
Otherwise I would have recommended some of the offloads to tweak there, but as LRO isn't one of them, I can't.

Summary:
- We might need more info to recreate your exact issue.
- Unlike you, I do not see an issue with 5.4.0-88.
- But I see that something in that exact area changed in 5.4.0-89 in focal-proposed.
- I do not get the guest crash you see with either kernel.
- Even if the above finding isn't your exact issue, it might indicate a problem in 5.4.0-89 from -proposed; I'll split that into a new bug 1946185 to make sure it is tracked separately.
- I can't see anything to do for qemu here at the moment (=> Incomplete); waiting for the kernel team to have a look.

[1]: https://libvirt.org/formatdomain.html#setting-nic-driver-specific-options