libvirt live migration fails intermittently in grenade live migration job with "error while loading state for instance 0x0 of device 'kvm-tpr-opt'"

Bug #1843058 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Undecided
Unassigned

Bug Description

This may be related to bug 1838309 but I'm not sure so I'm reporting it separately so we can track it in elastic-recheck. This is the traceback in the nova-compute logs:

Sep 06 01:28:11.837685 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: DEBUG nova.virt.libvirt.driver [None req-e6bcaa2e-aa66-4107-b0c6-9b3976d45c76 None None] [instance: 64689c1f-27b6-4889-8206-3bc458427197] Migration operation thread notification {{(pid=3855) thread_finished /opt/stack/old/nova/nova/virt/libvirt/driver.py:8039}}
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: Traceback (most recent call last):
Sep 06 01:28:11.838031 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 460, in fire_timers
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: timer()
Sep 06 01:28:11.838282 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: cb(*args, **kw)
Sep 06 01:28:11.838561 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 175, in _do_send
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: waiter.switch(result)
Sep 06 01:28:11.838774 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 219, in main
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: result = function(*args, **kwargs)
Sep 06 01:28:11.839008 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/opt/stack/old/nova/nova/utils.py", line 800, in context_wrapper
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: return func(*args, **kwargs)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7711, in _live_migration_operation
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: LOG.error("Live Migration failure: %s", e, instance=instance)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: self.force_reraise()
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: six.reraise(self.type_, self.value, self.tb)
Sep 06 01:28:11.839688 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/opt/stack/old/nova/nova/virt/libvirt/driver.py", line 7704, in _live_migration_operation
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: bandwidth=CONF.libvirt.live_migration_bandwidth)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/opt/stack/old/nova/nova/virt/libvirt/guest.py", line 682, in migrate
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: destination, params=params, flags=flags)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: result = proxy_call(self._autowrap, f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: rv = execute(f, *args, **kwargs)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: six.reraise(c, e, tb)
Sep 06 01:28:11.840435 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: rv = meth(*args, **kwargs)
Sep 06 01:28:11.841508 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1745, in migrateToURI3
Sep 06 01:28:11.843021 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
Sep 06 01:28:11.843239 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: libvirtError: internal error: qemu unexpectedly closed the monitor: 2019-09-06T01:28:10.940146Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
Sep 06 01:28:11.843239 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 2019-09-06T01:28:11.706713Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
Sep 06 01:28:11.843239 ubuntu-bionic-rax-iad-0010857269 nova-compute[3855]: 2019-09-06T01:28:11.707623Z qemu-system-x86_64: load of migration failed: Operation not permitted

This is the guest qemu log:

2019-09-06 01:27:54.606+0000: initiating migration
2019-09-06 01:27:55.378+0000: shutting down, reason=migrated
2019-09-06T01:27:55.379158Z qemu-system-x86_64: terminating on signal 15 from pid 32389 (/usr/sbin/libvirtd)
2019-09-06 01:28:10.699+0000: starting up libvirt version: 4.0.0, package: 1ubuntu8.12 (Marc Deslauriers <email address hidden> Tue, 02 Jul 2019 09:19:33 -0400), qemu version: 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.18), hostname: ubuntu-bionic-rax-iad-0010857256
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-00000021,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-29-instance-00000021/master-key.aes -machine pc-i440fx-2.11,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 64689c1f-27b6-4889-8206-3bc458427197 -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=19.1.0,serial=64689c1f-27b6-4889-8206-3bc458427197,uuid=64689c1f-27b6-4889-8206-3bc458427197,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-29-instance-00000021/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -object secret,id=virtio-disk0-secret0,data=JSeIib66AdJhXaNumn38Lk8ElHKyPOTCOwjG2uQiX6Q=,keyid=masterKey0,iv=OwpOylx/6ARMhASECPtqwg==,format=base64 -drive 'file=rbd:vms/64689c1f-27b6-4889-8206-3bc458427197_disk:id=cinder:auth_supported=cephx\;none:mon_host=10.208.226.92\:6789,file.password-secret=virtio-disk0-secret0,format=raw,if=none,id=drive-virtio-disk0,cache=writeback' -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0 -device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:6b:59:0f,bus=pci.0,addr=0x3 -add-fd set=1,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:1 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2019-09-06T01:28:10.842062Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/1 (label charserial0)
2019-09-06T01:28:10.940146Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2019-09-06T01:28:11.706713Z qemu-system-x86_64: error while loading state for instance 0x0 of device 'kvm-tpr-opt'
2019-09-06T01:28:11.707623Z qemu-system-x86_64: load of migration failed: Operation not permitted
2019-09-06 01:28:11.743+0000: shutting down, reason=failed

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22error%20while%20loading%20state%20for%20instance%5C%22%20AND%20message%3A%5C%22of%20device%20'kvm-tpr-opt'%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

4 hits in 7 days so it seems pretty rare.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.