TL;DR:
- a KVM guest with the kernel change as identified below
- works on Bionic host (kernel 4.15 / qemu 2.11 / libvirt 4.0)
- migrating on a Xenial host (kernel 4.4 / qemu 2.5 / libvirt 1.3.1) fails
VQ 0 size 0x100 Guest index 0x8101 inconsistent with Host index 0x81: delta 0x8080
error while loading state for instance 0x0 of device 'pci@800000020000000:01.0/virtio-net'
- not fixed in latest 4.19 kernel
- only failing on ppc64el (not x86) - maybe high/low word related
- qemu bisecting found a high/low word related virtio issue and fix in the 2.6 stable series that
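For reference, the failing check in the error message can be reproduced by hand. The following is a minimal Python sketch, assuming (from my reading of qemu's virtio load path, not verified against qemu 2.5 specifically) that the guest index and the host's last-seen index are 16-bit counters whose wrap-around difference must not exceed the ring size. The function names are hypothetical.

```python
# Minimal sketch (assumption): models the virtqueue index sanity check that
# qemu applies when loading migrated virtio state. The indices are 16-bit
# counters, so their difference is taken modulo 2**16.

INDEX_MASK = 0xFFFF  # virtqueue indices are 16 bit and wrap around


def index_delta(guest_idx: int, host_idx: int) -> int:
    """Entries the guest claims are outstanding (wrap-around difference)."""
    return (guest_idx - host_idx) & INDEX_MASK


def indices_consistent(vq_size: int, guest_idx: int, host_idx: int) -> bool:
    """A ring of vq_size entries can never have more than vq_size outstanding."""
    return index_delta(guest_idx, host_idx) <= vq_size


if __name__ == "__main__":
    size, guest, host = 0x100, 0x8101, 0x81  # values from the error message
    delta = index_delta(guest, host)
    print(f"delta 0x{delta:x}, consistent: {indices_consistent(size, guest, host)}")
```

With the values from the error message this prints `delta 0x8080, consistent: False`: the guest index implies about 32k outstanding entries on a 256-entry ring, so the load is rejected.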
Note: the generated build names are odd (the hashes are correct); most builds labeled 4.13 here are actually 4.14 development builds.
GOOD v4.13 Mon Sep 10 10:03:38
BAD v4.14 Mon Sep 10 10:51:31
Step-1: 15d8ffc9 #1 Mon Sep 10 12:36:30 bad
Step-2: bafb0762 #2 Mon Sep 10 13:04:52 good
Step-3: b63f6044 #3 Mon Sep 10 13:24:27 bad
Step-4: e08af95d #4 Mon Sep 10 13:44:11 bad
Step-5: 2a493216 #5 Mon Sep 10 14:25:50 bad
Step-6: a248878d #6 Mon Sep 10 14:50:47 bad
Step-7: 160e22aa #7 Mon Sep 10 15:09:03 good
Step-8: 727f8914 #8 Mon Sep 10 18:30:06 good
Step-9: 4a3c67a6 #9 Mon Sep 10 20:37:37 bad
Step-10: 04584957 #10 Tue Sep 11 04:35:41 bad
Step-11: f7ce9103 #11 Tue Sep 11 05:30:50 bad
Step-12: 192f68cf #12 Tue Sep 11 05:49:50 good
Step-13: 3f93522f #13 Tue Sep 11 06:13:01 bad
Step-14: 4941d472 #14 Tue Sep 11 06:40:05 good
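The kernel bisect above is a binary search for the first bad commit. As a rough model (an assumption: real `git bisect` chooses midpoints on the commit graph, including merges, rather than on a flat list), each tested build roughly halves the remaining range, so 14 steps are enough for a linear range of up to about 2^14 (16384) commits. `first_bad` and the synthetic history below are hypothetical.

```python
# Rough model of "git bisect" on a *linear* history (assumption: real git
# bisect picks midpoints on the commit graph and also handles merges).
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Return (first bad commit, number of builds tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1  # one build + migration test per step
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], steps


if __name__ == "__main__":
    history = list(range(10_000))  # hypothetical commit ids, oldest first
    culprit = 6_789                # hypothetical offending commit
    print(first_bad(history, lambda c: c >= culprit))
```

For 10,000 candidates this needs 13 or 14 tested builds, depending on where the culprit sits, which is in line with the 14 steps above.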
Offending change identified as:
commit 3f93522ffab2d46a36b57adf324a54e674fc9536
Author: Jason Wang <email address hidden>
Date: Wed Jul 19 16:54:49 2017 +0800
virtio-net: switch off offloads on demand if possible on XDP set
Current XDP implementation wants guest offloads feature to be disabled
on device. This is inconvenient and means guest can't benefit from
offloads if XDP is not used. This patch tries to address this
limitation by disabling the offloads on demand through control guest
offloads. Guest offloads will be disabled and enabled on demand on XDP
set.
Signed-off-by: Jason Wang <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
To check if any commit in the latest kernel fixed the issue:
4.19-rc3 as of today (11da3a7f): bad
=> Not fixed yet as a guest kernel commit.
=> Also, I don't see how we could fix that on the kernel side, even though the issue was introduced there.
Since we had a report that migration works on a Bionic host, I upgraded the test environment one component at a time, in this order:
Libvirt 1.3.1 -> 4.0: still bad
kernel 4.4 -> 4.15: still bad
qemu 2.5 -> 2.11: working
So it seems we are actually looking for a qemu fix for a kernel-introduced issue.
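The component-by-component upgrade above can be written down as a small search loop. This is only a sketch: the `migration_works` callback stands in for a real migration test and is hypothetical, as are the function names. Upgrades are applied cumulatively, in the same order as above.

```python
# Sketch (assumption): find which host component matters by upgrading one
# component at a time, in a fixed order, and re-running a migration test.
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, str]


def find_fixing_component(
    base: Config,
    upgrades: List[Tuple[str, str]],
    migration_works: Callable[[Config], bool],
) -> Optional[str]:
    """Return the first component whose upgrade makes migration work."""
    config = dict(base)  # copy, so the caller's base config stays untouched
    for component, new_version in upgrades:
        config[component] = new_version  # earlier upgrades stay applied
        if migration_works(config):
            return component
    return None  # no configuration in this sequence made migration work


if __name__ == "__main__":
    xenial = {"libvirt": "1.3.1", "kernel": "4.4", "qemu": "2.5"}
    order = [("libvirt", "4.0"), ("kernel", "4.15"), ("qemu", "2.11")]
    # Hypothetical stand-in for the real test, mirroring the results above:
    # only the qemu version decides whether migration works.
    print(find_fixing_component(xenial, order, lambda c: c["qemu"] != "2.5"))
```

This prints `qemu`. Because the upgrades accumulate, the result only says which upgrade first made migration work on top of the earlier ones, which is why the qemu version range still has to be narrowed down separately.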
Via the Ubuntu Cloud Archive (UCA) we can access several qemu versions rather easily:
qemu 2.5 (X) bad
qemu 2.6.1 (Y) good
qemu 2.8 (Z) good
qemu 2.10 (A) good
qemu 2.11 (B) good
So a qemu bisect for 2.5->2.6 it shall be :-/
Back then the packaging was still based on full Debian versions, so bisecting directly in the packaging repo is not possible for these old versions.
Instead I used checkinstall with the configure line of the Yakkety qemu package (resetting the machine type to the upstream type and linking spapr-rtas.bin [qemu-slof] and other files to the expected places):
ln -s /usr/share/slof/* /usr/share/qemu/slof.bin
ln -s /usr/share/seabios/* /usr/share/qemu/
ln -s /usr/lib/ipxe/qemu/* /usr/share/qemu/
But I realized that 2.6.0 was affected as well.
Maybe the fix was part of 2.6.1?
I checked the last 2.6.0 build we published in Yakkety and it failed as well.
So the bisect might be much smaller after all: between 2.6.0 and its upstream stable branch.
Verified start points with builds from git.
qemu-2.6.0-bisect-start: old behavior (bad)
qemu-2.6.2-bisect-start: new behavior (good)
Eventually came down to:
git bisect start
# new: [529d45e151d82a772cd9b9af64bb25f88fba6567] Update version for 2.6.2 release
git bisect new 529d45e151d82a772cd9b9af64bb25f88fba6567
# old: [bfc766d38e1fae5767d43845c15c79ac8fa6d6af] Update version for v2.6.0 release
git bisect old bfc766d38e1fae5767d43845c15c79ac8fa6d6af
# new: [ec211e742683d4bc187839b01a4b0056617681a1] atapi: fix halted DMA reset
git bisect new ec211e742683d4bc187839b01a4b0056617681a1
# old: [71798fda8b6ef8df47c7640ba0bc24d7060ad307] vmsvga: shadow fifo registers
git bisect old 71798fda8b6ef8df47c7640ba0bc24d7060ad307
# old: [909d87d347a7a5e08c32cbdb67bb2927fcefbf34] virtio: set low features early on load
git bisect old 909d87d347a7a5e08c32cbdb67bb2927fcefbf34
# new: [28eae0af65dcae887d3cd32212c702ee708c84be] Fix some typos found by codespell
git bisect new 28eae0af65dcae887d3cd32212c702ee708c84be
# new: [704ab2fce49fa404a61c6dac85003bcc1e3d0192] blockdev: Fix regression with the default naming of throttling groups
git bisect new 704ab2fce49fa404a61c6dac85003bcc1e3d0192
# new: [025c4e39f479eb498ee63b634d961a4cf357773e] s390x/ipl: fix reboots for migration from different bios
git bisect new 025c4e39f479eb498ee63b634d961a4cf357773e
# new: [82c85167791f0057752c2084f8480bf19401f314] Revert "virtio-net: unbreak self announcement and guest offloads after migration"
git bisect new 82c85167791f0057752c2084f8480bf19401f314
# first new commit: [82c85167791f0057752c2084f8480bf19401f314] Revert "virtio-net: unbreak self announcement and guest offloads after migration"
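The bisect log above uses git's `old`/`new` terms instead of `good`/`bad`, since here we are hunting for the commit that changed the behavior from broken (old) to working (new). Such a log can be fed back to `git bisect replay`; for checking it by hand, a minimal Python sketch that extracts the marks and the result could look like this (the function name and regexes are my own, assuming the line format shown above).

```python
# Sketch: pull the marked commits and the final result out of a
# "git bisect log" that uses the old/new terms (line format as shown above).
import re
from typing import List, Optional, Tuple

MARK = re.compile(r"^git bisect (old|new) ([0-9a-f]{40})$")
RESULT = re.compile(r"^# first new commit: \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(
    log: str,
) -> Tuple[List[Tuple[str, str]], Optional[Tuple[str, str]]]:
    """Return ([(term, sha), ...], (first_new_sha, subject) or None)."""
    marks: List[Tuple[str, str]] = []
    result: Optional[Tuple[str, str]] = None
    for line in log.splitlines():
        line = line.strip()
        mark = MARK.match(line)
        if mark:
            marks.append((mark.group(1), mark.group(2)))
            continue
        found = RESULT.match(line)
        if found:
            result = (found.group(1), found.group(2))
    return marks, result
```

For the log above it would return nine marks (three `old`, six `new`) and 82c85167791f0057752c2084f8480bf19401f314 as the first new commit.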
And the fixing qemu change is:
82c85167791f0057752c2084f8480bf19401f314 is the first new commit
commit 82c85167791f0057752c2084f8480bf19401f314
Author: Michael S. Tsirkin <email address hidden>
Date: Mon Jul 4 14:47:37 2016 +0300
Revert "virtio-net: unbreak self announcement and guest offloads after migration"
This reverts commit 1f8828ef573c83365b4a87a776daf8bcef1caa21.
Cc: <email address hidden>
Reported-by: Robin Geuze <email address hidden>
Tested-by: Robin Geuze <email address hidden>
Signed-off-by: Michael S. Tsirkin <email address hidden>
(cherry picked from commit 6c6668232e71b7cf7ff39fa1a7abf660c40f9cea)
Signed-off-by: Michael Roth <email address hidden>
Its backport needs to be bundled with another fix (the commit before it) to actually work.
I'll try to backport them and prepare a PPA with those fixes for Xenial.