Lee on IRC mentioned he hit something similar today again; copy/pasting his comment from this review[1]: Weird one, n-cpu (well the libvirt module) on the source (compute1) reports that libvirtd on the dest (controller) has refused the connection but the migration had already failed on the dest. This smells like a python-libvirt bug tbh, surely the migrateToURI3 call on the source should be able to detect the failure on the dest and return a useful error to the caller? ----------------------------------------------------------------------- https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/compute1/logs/screen-n-cpu.txt#11964 Jun 10 06:41:59.477301 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: DEBUG nova.virt.libvirt.driver [-] [instance: 0a929010-7361-4372-a785-f9eb052aca46] Migration operation thread notification {{(pid=67549) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:9921}} Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: Traceback (most recent call last): Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: timer() Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/hubs/timer.py", line 59, in __call__ Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: cb(*args, **kw) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/event.py", line 175, in _do_send Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: waiter.switch(result) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/greenthread.py", line 221, in main Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = function(*args, **kwargs) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/utils.py", line 654, in context_wrapper Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: return func(*args, **kwargs) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9576, in _live_migration_operation Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: LOG.error("Live Migration failure: %s", e, instance=instance) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__ Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self.force_reraise() Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise self.value Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 9564, in _live_migration_operation Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: guest.migrate(self._live_migration_uri(dest), Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 617, in migrate Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: self._domain.migrateToURI3( Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 190, in doit Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: result = proxy_call(self._autowrap, f, *args, **kwargs) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 148, in proxy_call Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = execute(f, *args, **kwargs) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 129, in execute Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: six.reraise(c, e, tb) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/six.py", line 719, in reraise Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise value Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/eventlet/tpool.py", line 83, in tworker Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: rv = meth(*args, **kwargs) Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: File "/usr/local/lib/python3.8/dist-packages/libvirt.py", line 2099, in migrateToURI3 Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: raise libvirtError('virDomainMigrateToURI3() failed') Jun 10 06:41:59.486684 ubuntu-focal-rax-dfw-0025050045 nova-compute[67549]: libvirt.libvirtError: unable to connect to server at 'ubuntu-focal-rax-dfw-0025050041:49152': Connection refused ----------------------------------------------------------------------- Anyway, the reason for the underlying migration failure isn't clear as QEMU and libvirtd on the dest log practically nothing aside from `failed`. Noice. ----------------------------------------------------------------------- https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/controller/logs/libvirt/libvirtd_log.txt 121258 2021-06-10 06:41:58.961+0000: 73381: info : qemuMonitorIOWrite:450 : QEMU_MONITOR_IO_WRITE: mon=0x7f5818059ad0 buf={"execute":"query-status","id":"libvirt-377"}^M [..] 121267 2021-06-10 06:41:58.962+0000: 73381: info : qemuMonitorJSONIOProcessLine:239 : QEMU_MONITOR_RECV_REPLY: mon=0x7f5818059ad0 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-377"} [..] 121270 2021-06-10 06:41:58.962+0000: 73383: debug : qemuProcessStop:7279 : Shutting down vm=0x7f5818022c40 name=instance-0000001d id=22 pid=131702, reason=failed, asyncJob=migration in, flags=0x1 [..] 121273 2021-06-10 06:41:58.963+0000: 73383: debug : qemuDomainLogAppendMessage:10691 : Append log message (vm='instance-0000001d' message='2021-06-10 06:41:58.963+0000: shutting down, reason=failed 121274 ) stdioLogD=1 [..] 121278 2021-06-10 06:41:58.963+0000: 73383: debug : qemuProcessKill:7197 : vm=0x7f5818022c40 name=instance-0000001d pid=131702 flags=0x5 https://zuul.opendev.org/t/openstack/build/f3b829801901417c9310ad5cc5a0e886/log/controller/logs/libvirt/qemu/instance-0000001d_log.txt 2021-06-10 06:41:58.963+0000: shutting down, reason=failed ----------------------------------------------------------------------- [1]https://review.opendev.org/c/openstack/nova/+/795533/5#message-3e7b1dc4d3f8d22b2c9a55637c7366199d00eb56