live migration intermittently fails in CI with "Connection refused" during guest transfer

Bug #1797185 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Confirmed
Low
Unassigned

Bug Description

Seen here:

http://logs.openstack.org/31/606031/4/check/nova-live-migration/9d106bb/logs/screen-n-cpu.txt.gz#_Oct_10_15_27_01_355106

Oct 10 15:27:01.355106 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: ERROR nova.virt.libvirt.driver [None req-e0b8f9fe-a9bf-4bb5-a90b-b0724401d082 None None] [instance: fbdce008-9227-4955-89c4-ba2189d98bed] Live Migration failure: unable to connect to server at 'ubuntu-xenial-ovh-gra1-0002837182:49152': Connection refused: libvirtError: unable to connect to server at 'ubuntu-xenial-ovh-gra1-0002837182:49152': Connection refused
Oct 10 15:27:01.355346 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: DEBUG nova.virt.libvirt.driver [None req-e0b8f9fe-a9bf-4bb5-a90b-b0724401d082 None None] [instance: fbdce008-9227-4955-89c4-ba2189d98bed] Migration operation thread notification {{(pid=15353) thread_finished /opt/stack/new/nova/nova/virt/libvirt/driver.py:7495}}
Oct 10 15:27:01.355532 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: Traceback (most recent call last):
Oct 10 15:27:01.355678 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 460, in fire_timers
Oct 10 15:27:01.356699 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: timer()
Oct 10 15:27:01.356889 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Oct 10 15:27:01.357495 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: cb(*args, **kw)
Oct 10 15:27:01.357784 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 175, in _do_send
Oct 10 15:27:01.358126 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: waiter.switch(result)
Oct 10 15:27:01.358295 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 219, in main
Oct 10 15:27:01.358654 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: result = function(*args, **kwargs)
Oct 10 15:27:01.358804 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/opt/stack/new/nova/nova/utils.py", line 799, in context_wrapper
Oct 10 15:27:01.361236 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: return func(*args, **kwargs)
Oct 10 15:27:01.361446 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7151, in _live_migration_operation
Oct 10 15:27:01.361607 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: LOG.error("Live Migration failure: %s", e, instance=instance)
Oct 10 15:27:01.361756 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Oct 10 15:27:01.361884 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: self.force_reraise()
Oct 10 15:27:01.362032 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Oct 10 15:27:01.362205 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: six.reraise(self.type_, self.value, self.tb)
Oct 10 15:27:01.362344 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 7144, in _live_migration_operation
Oct 10 15:27:01.362490 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: bandwidth=CONF.libvirt.live_migration_bandwidth)
Oct 10 15:27:01.362717 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/opt/stack/new/nova/nova/virt/libvirt/guest.py", line 673, in migrate
Oct 10 15:27:01.362961 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: destination, params=params, flags=flags)
Oct 10 15:27:01.363162 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 190, in doit
Oct 10 15:27:01.363347 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: result = proxy_call(self._autowrap, f, *args, **kwargs)
Oct 10 15:27:01.363513 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 148, in proxy_call
Oct 10 15:27:01.363634 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: rv = execute(f, *args, **kwargs)
Oct 10 15:27:01.363750 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 129, in execute
Oct 10 15:27:01.363866 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: six.reraise(c, e, tb)
Oct 10 15:27:01.364005 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
Oct 10 15:27:01.364127 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: rv = meth(*args, **kwargs)
Oct 10 15:27:01.364275 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1745, in migrateToURI3
Oct 10 15:27:01.364626 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
Oct 10 15:27:01.364798 ubuntu-xenial-ovh-gra1-0002837152 nova-compute[15353]: libvirtError: unable to connect to server at 'ubuntu-xenial-ovh-gra1-0002837182:49152': Connection refused

There isn't really anything in the libvirtd logs or the guest logs about this on either of the compute nodes.

From the guest log on the destination compute:

http://logs.openstack.org/31/606031/4/check/nova-live-migration/9d106bb/logs/subnode-2/libvirt/qemu/instance-0000000c.txt.gz

2018-10-10T15:27:00.892778Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/1 (label charserial0)
2018-10-10T15:27:00.982751Z qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-10-10 15:27:01.102+0000: shutting down, reason=failed
2018-10-10T15:27:01.102890Z qemu-system-x86_64: terminating on signal 15 from pid 18209 (/usr/sbin/libvirtd)

Looks like this is very rare:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Live%20Migration%20failure%3A%20unable%20to%20connect%20to%20server%20at%5C%22%20AND%20message%3A%5C%22Connection%20refused%3A%5C%22%20AND%20tags%3A%5C%22screen-n-cpu.txt%5C%22&from=7d

3 hits in 7 days. Maybe it's just a network hiccup on the nodes.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.