live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Bug #1761798 reported by Matt Riedemann
This bug affects 6 people
Affects                     Status    Importance  Assigned to  Milestone
OpenStack Compute (nova)    Expired   Medium      Unassigned
QEMU                        Expired   Undecided   Unassigned

Bug Description

Seen here:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed

And in the n-cpu logs on the other host:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/screen-n-cpu.txt.gz#_Apr_05_21_48_43_257541

There is a related Red Hat bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1450524

The CI jobs that are failing are at present using the Pike UCA:

ii libvirt-bin 3.6.0-1ubuntu6.2~cloud0

ii qemu-system-x86 1:2.10+dfsg-0ubuntu3.5~cloud0

Revision history for this message
Matt Riedemann (mriedem) wrote :

Maybe when we move to use the Queens UCA we'll have better luck with this:

https://review.openstack.org/#/c/554314/

That has these package versions:

ii libvirt-bin 4.0.0-1ubuntu4~cloud0

ii qemu-system-x86 1:2.11+dfsg-1ubuntu2~cloud0

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

The QEMU log for the offending instance that crashed:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

----------------------------------------------------------------------------
2018-04-05 21:48:38.136+0000: starting up libvirt version: 3.6.0, package: 1ubuntu6.2~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Wed, 07 Feb 2018 20:05:24 +0000), qemu version: 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.5~cloud0), hostname: ubuntu-xenial-ovh-bhs1-0003361707
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-00000002,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-00000002/master-key.aes -machine pc-i440fx-artful,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid e70df34f-395e-4482-9c24-101f12cf635d -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=17.0.0,serial=97e1e00c-8afb-4356-b55e-92e997c5a1a7,uuid=e70df34f-395e-4482-9c24-101f12cf635d,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/e70df34f-395e-4482-9c24-101f12cf635d/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:15:ea:59,bus=pci.0,addr=0x3 -add-fd set=1,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed
----------------------------------------------------------------------------

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :
Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

I'm not sure that's the same as bz 1450524 - in this case it's virtio-blk, where I *think* that bz was tracked down to a virtio-balloon and a virtio-net issue.

Revision history for this message
melanie witt (melwitt) wrote :
Revision history for this message
Dmitry Sutyagin (dsutyagin) wrote :

Ran into a similar issue: when snapshotting an instance, it won't start afterwards.
The root cause is an inconsistency between the state of the virtio-scsi device and the saved VM RAM state in /var/lib/libvirt/qemu/save/
Removing the instance-0000****.save file from this folder allows starting the VM again, but may cause data corruption.
Specifics of my case: virtio-scsi driver with discard enabled, Ceph backend. Two Windows VMs impacted so far.

Revision history for this message
Dmitry Sutyagin (dsutyagin) wrote :

I also think this is a QEMU bug; I will do some tests and try to reproduce.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Source node
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

2020-11-20 14:25:24.887+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872195

LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64,hypervisor=on,lahf-lm=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=35,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=37,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=39 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 127.0.0.1:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-san...

Revision history for this message
sean mooney (sean-k-mooney) wrote :

In the new case we have the same qemu/libvirt version on both nodes:

4.2.1Debian 1:4.2-3ubuntu6.7~cloud0

and in this case it's failing for the virtio-blk device, not the memory balloon.

before the migration starts we see

-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

on the destination
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

then on the source we see

char device redirected to /dev/pts/1 (label charserial0)
2020-11-20T14:25:40.720757Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0xb8 inconsistent with Host index 0xe0: delta 0xffd8
2020-11-20T14:25:40.720785Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-20T14:25:40.720790Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2020-11-20T14:25:40.720824Z qemu-system-x86_64: load of migration failed: Operation not permitted

when it fails
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/libvirt/qemu/instance-0000001a.txt

Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

[Copy/pasting my comment from here: https://bugs.launchpad.net/nova/+bug/1737625/comments/4]

I just talked to Dave Gilbert from upstream QEMU. Overall, as I implied in comment #2, this gnarly issue requires specialized debugging, digging deep into the bowels of QEMU, 'virtio-blk' and 'virtio'.

That said, Dave notes that we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.

Dave makes some guesswork of a potential scenario where this can occur:

  - Guest is running
  - ... live migration starts
  - ... a "block read" request gets submitted
  - ... live migration stops the vCPUs, finishes transmitting RAM
  - ... the "block read" completes, 'virtio-blk' updates pointers
  - ... live migration "serializes" the 'virtio-blk' state

So the "guest index inconsistent" state would only happen if you got unlucky with the timing of that read.

Another possibility, Dave points out, is that the guest has screwed up the device state somehow; the migration code in 'virtio' checks the state a lot. We have ruled this possibility out because the guest is just a garden-variety CirrOS instance idling; nothing special about it.
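
For reference, the "delta" in the error is computed with 16-bit wrap-around arithmetic: 0x12c - 0x134 = -8 = 0xfff8 as an unsigned 16-bit value, i.e. the device's saved index is ahead of the index found in the migrated guest RAM, and the resulting count of outstanding heads is far larger than the queue size (0x80), so the load is rejected. The same arithmetic explains the deltas reported later in this bug (0xb8 - 0xe0 = 0xffd8, 0x62 - 0xa1 = 0xffc1). Below is a minimal, self-contained sketch of that kind of load-time consistency check; it is illustrative only, not QEMU's actual code, and names such as load_check_vq and struct vq_state are made up for the example.

----------------------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>

/* Illustrative model of a virtqueue's migrated indices (not QEMU's types). */
struct vq_state {
    uint16_t vring_num;       /* queue size, e.g. 0x80                     */
    uint16_t guest_avail_idx; /* avail index read from migrated guest RAM  */
    uint16_t host_last_avail; /* device's index from the serialized state  */
};

/* Sketch of the check behind "Guest index ... inconsistent with Host
 * index ...": the number of outstanding heads must fit in the ring. */
static int load_check_vq(int i, const struct vq_state *vq)
{
    /* 16-bit wrap-around arithmetic, as on the real virtio ring. */
    uint16_t nheads = (uint16_t)(vq->guest_avail_idx - vq->host_last_avail);

    if (nheads > vq->vring_num) {
        fprintf(stderr,
                "VQ %d size 0x%x Guest index 0x%x inconsistent with "
                "Host index 0x%x: delta 0x%x\n",
                i, (unsigned)vq->vring_num, (unsigned)vq->guest_avail_idx,
                (unsigned)vq->host_last_avail, (unsigned)nheads);
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Values from the original report: 0x12c - 0x134 wraps to 0xfff8,
     * far larger than the queue size 0x80, so the load fails. */
    struct vq_state vq = { 0x80, 0x12c, 0x134 };
    return load_check_vq(0, &vq) ? 1 : 0;
}
----------------------------------------------------------------------------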

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Hang on, I've just noticed this:

 char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

I've never seen that 'bogus descriptor or out of resources' one before; the fact that it appears before the migration starts suggests that something had already gone wrong before the migration started.
Is that warning present in all these failures?

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I see two recent hits we have still logs for.

The one described in comment #10

And another but there the error message is a bit different:

on the migration source host:
: https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
virtio: zero sized buffers are not allowed
2020-11-23 22:20:54.297+0000: initiating migration

on the migration destination
https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/subnode-2/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
2020-11-23T22:20:55.129189Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x62 inconsistent with Host index 0xa1: delta 0xffc1
2020-11-23T22:20:55.129230Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-23T22:20:55.129241Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-blk'
2020-11-23T22:20:55.129259Z qemu-system-x86_64: load of migration failed: Operation not permitted

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

OK, but that still says in both cases here we've got a virtio error telling us that the queues are broken before migration even starts; so we should try and figure out why that's happening first.

Revision history for this message
Thomas Huth (th-huth) wrote :

Is this still an issue with the latest release of QEMU (v6.0)?

Changed in qemu:
status: New → Incomplete
Revision history for this message
ignazio (cassano) wrote :

I got the same issue on CentOS 7 Stein.

Revision history for this message
ignazio (cassano) wrote :

Hello, some news... I wonder if it can help:
I am testing with some virtual machines again.
If I follow these steps it works (but I lose the network connection):

1) Detach the network interface from the instance
2) Attach the network interface to the instance
3) Migrate the instance
4) Log into the instance using the console and restart networking

whereas if I restart networking before the live migration, it does not work.
So, when someone mentioned

########################
we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.
############################
could the network traffic be the problem?
Ignazio

Revision history for this message
Dr. David Alan Gilbert (dgilbert-h) wrote :

Be careful, it might not be the same bug.

Yes, it *shouldn't* be a problem, but if the virtio code in qemu is broken then it will keep accepting incoming packets even when the guest is stopped in the final part of the migration. You then get the contents of RAM taken before the reception of the packet, but the virtio state in the migration stream taken after the reception of the packet, and the two are inconsistent.

But the case the other reporter mentioned is on a virtio-blk device; the same thing can happen if the storage device stalls or is slow during the migration - i.e. a block read takes ages to complete and happens to complete after the point at which it should have stopped for migration.
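
To make the timing hazard described here concrete, below is a tiny toy model (not QEMU code; all names are invented for illustration) of how work completing after guest RAM has been transmitted, but before the device state is serialized, leaves the migrated RAM and the migrated device state disagreeing about the ring indices.

----------------------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the ordering hazard: guest RAM (which holds the ring) is
 * migrated first, the device's internal counters are serialized later. */
struct guest_ram    { uint16_t ring_used_idx; };  /* lives in guest RAM   */
struct device_state { uint16_t used_idx; };       /* lives in QEMU state  */

int main(void)
{
    struct guest_ram    ram = { .ring_used_idx = 0x2a };
    struct device_state dev = { .used_idx      = 0x2a };

    /* 1. Final pass of RAM is transmitted; the destination's copy of the
     *    ring is now fixed at 0x2a. */
    struct guest_ram migrated_ram;
    memcpy(&migrated_ram, &ram, sizeof(ram));

    /* 2. A packet arrives (or a block read completes) while the source
     *    device is still active: it writes into *live* RAM and bumps its
     *    internal counter, but the destination never sees that RAM write. */
    ram.ring_used_idx++;
    dev.used_idx++;

    /* 3. The device state is serialized after step 2, so the destination
     *    receives old RAM paired with newer device state; virtio's
     *    load-time consistency checks then reject the stream. */
    printf("destination sees ring idx 0x%x but device idx 0x%x\n",
           (unsigned)migrated_ram.ring_used_idx, (unsigned)dev.used_idx);
    return 0;
}
----------------------------------------------------------------------------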

Revision history for this message
Thomas Huth (th-huth) wrote :

Is this still happening with the latest release?

Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired