[live migration] The instance directory on the destination host is not cleaned up

Bug #1672597 reported by Dave Chen
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I understand there is code to clean up the instance directory on the target host if the live migration fails, but the directory is not cleaned up if libvirt's connection times out.

I haven't had a chance to root-cause the issue yet, but I feel the code could be made a little more defensive to avoid it.
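To illustrate the kind of hardening I have in mind, here is a minimal sketch (the helper and wrapper names are hypothetical, not Nova's actual rollback code): remove the partially created directory on the destination whenever the migration operation raises, including on a libvirt connection timeout, so a retry does not trip over the leftover directory.

    # Hypothetical sketch, not Nova's actual rollback code: best-effort
    # removal of the destination instance directory on migration failure.
    import os
    import shutil

    def _cleanup_destination_instance_dir(instance_dir):
        """Best-effort removal of a partially created instance directory."""
        if os.path.exists(instance_dir):
            # ignore_errors so cleanup never masks the original failure
            shutil.rmtree(instance_dir, ignore_errors=True)

    def _migrate_with_cleanup(do_migrate, instance_dir):
        try:
            do_migrate()
        except Exception:
            # Clean up even when the failure is a connection timeout, so a
            # later attempt does not hit DestinationDiskExists.
            _cleanup_destination_instance_dir(instance_dir)
            raise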

Here are some trace logs from my environment.

- Libvirt connection timed out
2017-03-07 02:34:37.540 ERROR nova.virt.libvirt.driver [req-35bc9ca8-c77b-481c-ae3e-93bbfed6187f admin admin] [instance: 6714b056-4950-4e63-83d3-fc383e977a53] Live Migration failure: unable to connect to server at 'ceph-dev:49152': Connection timed out
2017-03-07 02:34:37.541 DEBUG nova.virt.libvirt.driver [req-35bc9ca8-c77b-481c-ae3e-93bbfed6187f admin admin] [instance: 6714b056-4950-4e63-83d3-fc383e977a53] Migration operation thread notification from (pid=18073) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:6361
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 168, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/utils.py", line 1066, in context_wrapper
    return func(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5962, in _live_migration_operation
    instance=instance)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5958, in _live_migration_operation
    bandwidth=CONF.libvirt.live_migration_bandwidth)
  File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 605, in migrate
    flags=flags, bandwidth=bandwidth)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 186, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 144, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 125, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 1586, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)

libvirtError: unable to connect to server at 'ceph-dev:49152': Connection timed out

- The instance's directory has not been cleaned up, and the next migration fails:

Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
    res = self.dispatcher.dispatch(message)

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
    result = func(ctxt, **new_args)

  File "/opt/stack/nova/nova/exception_wrapper.py", line 75, in wrapped
    function_name, call_dict, binary)

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/opt/stack/nova/nova/exception_wrapper.py", line 66, in wrapped
    return f(self, context, *args, **kw)

  File "/opt/stack/nova/nova/compute/utils.py", line 613, in decorated_function
    return function(self, context, *args, **kwargs)

  File "/opt/stack/nova/nova/compute/manager.py", line 216, in decorated_function
    kwargs['instance'], e, sys.exc_info())

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()

  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)

  File "/opt/stack/nova/nova/compute/manager.py", line 204, in decorated_function
    return function(self, context, *args, **kwargs)

  File "/opt/stack/nova/nova/compute/manager.py", line 5192, in pre_live_migration
    migrate_data)

  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 6474, in pre_live_migration
    raise exception.DestinationDiskExists(path=instance_dir)

DestinationDiskExists: The supplied disk path (/opt/stack/data/nova/instances/6714b056-4950-4e63-83d3-fc383e977a53) already exists, it is expected not to exist.
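
For context, the destination-side check that fires here looks roughly like this (paraphrased from the pre_live_migration frame in the traceback above; simplified, not the verbatim driver code):

    # Simplified paraphrase of the check in nova/virt/libvirt/driver.py's
    # pre_live_migration, as seen in the traceback above.
    import os
    from nova import exception

    def _ensure_instance_dir_absent(instance_dir):
        # The directory left behind by the earlier failed attempt makes
        # this check fire, so the retried migration fails immediately.
        if os.path.exists(instance_dir):
            raise exception.DestinationDiskExists(path=instance_dir)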

tags: added: live-migration
Revision history for this message
Timofey Durakov (tdurakov) wrote :

Hello, could you please specify the OpenStack version you are using? I also assume there is a Ceph RBD backend; is that right?
Marking this as incomplete for now.

Changed in nova:
status: New → Incomplete
Changed in nova:
assignee: nobody → Dave Chen (wei-d-chen)
Sean Dague (sdague)
Changed in nova:
assignee: Dave Chen (wei-d-chen) → nobody
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired