Post live migration step could fail due to auth errors

Bug #1647451 reported by Timofey Durakov
This bug affects 4 people
Affects                    Status          Importance   Assigned to        Milestone
OpenStack Compute (nova)   Fix Released    Medium       Timofey Durakov
Newton                     Fix Committed   Medium       Lee Yarwood

Bug Description

Description
===========
When a live migration finishes, it is possible that the keystone auth token has
already expired, which causes the post step to fail.

Steps to reproduce
==================
There are two options to reproduce this issue:
1. Run a live migration of a heavily loaded instance, wait for the token to expire, and then try to execute live-migration-force-complete.
2. Set a breakpoint in the _post_live_migration method of the compute manager; once the breakpoint is reached, run "openstack token revoke", then let nova continue normally (the revocation step is sketched below).
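A minimal sketch of option 2's revocation step, equivalent to running "openstack token revoke <token>". It assumes admin credentials in the usual OS_* environment variables, and that the token which authenticated the live-migration request was captured beforehand (MIGRATION_TOKEN is a placeholder for wherever you stored it):

import os

from keystoneauth1 import session
from keystoneauth1.identity import v3
from keystoneclient.v3 import client

auth = v3.Password(
    auth_url=os.environ["OS_AUTH_URL"],
    username=os.environ["OS_USERNAME"],
    password=os.environ["OS_PASSWORD"],
    project_name=os.environ["OS_PROJECT_NAME"],
    user_domain_name=os.environ.get("OS_USER_DOMAIN_NAME", "Default"),
    project_domain_name=os.environ.get("OS_PROJECT_DOMAIN_NAME", "Default"),
)
keystone = client.Client(session=session.Session(auth=auth))

# Revoke while nova is paused at the breakpoint in _post_live_migration;
# the post step then fails with an auth error once execution resumes.
migration_token = os.environ["MIGRATION_TOKEN"]  # placeholder: the token used for the request
keystone.tokens.revoke_token(migration_token)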

Expected result
===============
Live migration finishes successfully.

Actual result
=============
The post step fails, and the overall migration fails as well.

Environment
===========
1. I've tested this case on the Newton release, but the issue should be valid for the master branch too.

2. Libvirt + KVM

3. Ceph

4. Neutron (VXLAN)

Changed in nova:
assignee: nobody → Timofey Durakov (tdurakov)
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/407147

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Timofey Durakov (tdurakov) wrote :

Tracebacks for the bug:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/queue.py", line 118, in switch
    self.greenlet.switch(value)
  File "/usr/lib/python2.7/dist-packages/nova/utils.py", line 1182, in context_wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5354, in dispatch_live_migration
    self._do_live_migration(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5333, in _do_live_migration
    self._set_migration_status(migration, 'error')
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5326, in _do_live_migration
    block_migration, migrate_data)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5856, in live_migration
    migrate_data)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6573, in _live_migration [0/416]
    dom, finish_event, disk_paths)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 6491, in _live_migration_monitor
    migrate_data)
  File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 110, in wrapped
    payload)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 89, in wrapped
    return f(self, context, *args, **kw)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 387, in decorated_function
    kwargs['instance'], e, sys.exc_info())
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 375, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 5493, in _post_live_migration
    network_info = self.network_api.get_instance_nw_info(ctxt, instance)
  File "/usr/lib/python2.7/dist-packages/nova/network/base_api.py", line 253, in get_instance_nw_info
    result = self._get_instance_nw_info(context, instance, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 934, in _get_instance_nw_info
    preexisting_port_ids)
  File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 1737, in _build_network_info_model
    context, instance, networks, port_ids)
  File "/usr/lib/python2.7/dist-packages/nova/network/neutronv2/api.py", line 957,...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/407147
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4a5ecf1e29c3bdbb022f98a5fba41d4e7df56d88
Submitter: Jenkins
Branch: master

commit 4a5ecf1e29c3bdbb022f98a5fba41d4e7df56d88
Author: Timofey Durakov <email address hidden>
Date: Thu Dec 1 19:03:24 2016 +0300

    fix for auth during live-migration

    The post step could fail due to auth token expiration:
    get_instance_nw_info fails with "authentication required"
    because there are several calls to the neutron API, some of
    which use the admin context while others try to use the token
    from the request context. This patch ensures that if the admin
    context is used initially, all subsequent calls will use the
    same initialized client.

    Closes-Bug: #1647451

    Change-Id: I8962a9cd472cbbb5b9b67c5b164ff29fd8f5558a

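To illustrate the shape of that fix, here is a hedged, self-contained sketch (class and function names are made up for illustration, not nova source): build one service-credentialed client up front and reuse it for every neutron call in the post step, instead of letting each helper rebuild a client from the request-context token.

# Hedged sketch of the pattern described in the commit message above;
# names are illustrative, not the actual nova patch.

class NeutronClient(object):
    """Stand-in for a neutron API client. Real code would build this
    with keystoneauth service credentials, which do not expire mid-call
    because the session re-authenticates itself."""
    def __init__(self, label):
        self.label = label

    def list_networks(self, instance_id):
        return [{"id": "net-1", "client": self.label}]

    def list_ports(self, instance_id):
        return [{"id": "port-1", "client": self.label}]


def get_admin_client():
    return NeutronClient(label="admin")


def get_instance_nw_info(instance_id, client=None):
    # If the caller already initialized an admin client, every
    # subsequent neutron call reuses it; nothing falls back to the
    # (possibly expired) request-context token.
    client = client or get_admin_client()
    networks = client.list_networks(instance_id)
    ports = client.list_ports(instance_id)
    return {"networks": networks, "ports": ports}


# Post-live-migration path: one client, threaded through all calls.
admin = get_admin_client()
print(get_instance_nw_info("inst-42", client=admin))
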
Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/410618

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b2

This issue was fixed in the openstack/nova 15.0.0.0b2 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/410618
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8e097039e73663689b4c630589ab2dd5a569b5de
Submitter: Jenkins
Branch: stable/newton

commit 8e097039e73663689b4c630589ab2dd5a569b5de
Author: Timofey Durakov <email address hidden>
Date: Thu Dec 1 19:03:24 2016 +0300

    fix for auth during live-migration

    The post step could fail due to auth token expiration:
    get_instance_nw_info fails with "authentication required"
    because there are several calls to the neutron API, some of
    which use the admin context while others try to use the token
    from the request context. This patch ensures that if the admin
    context is used initially, all subsequent calls will use the
    same initialized client.

    Closes-Bug: #1647451

    Change-Id: I8962a9cd472cbbb5b9b67c5b164ff29fd8f5558a
    (cherry picked from commit 4a5ecf1e29c3bdbb022f98a5fba41d4e7df56d88)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.5

This issue was fixed in the openstack/nova 14.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/669867

Revision history for this message
huanhongda (hongda) wrote :

post_live_migration_at_destination will fail because setup_networks_on_host makes a port-list call to the neutron API using the token from the request context.

https://review.opendev.org/gitweb?p=openstack/nova.git;a=blob;f=nova/network/neutronv2/api.py;h=986a9c6b777aef9a2dbf81d0e13afc620165aa49;hb=refs/heads/master#l346
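
A hedged sketch of how that call site could avoid the request token: perform the port list with a service-credentialed keystoneauth session. The endpoint, credentials, and device UUID below are placeholder assumptions, not nova configuration:

# Hedged sketch: list ports with service credentials instead of the
# user's request-context token. All concrete values are placeholders.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client as neutron_client

loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(
    auth_url="http://keystone:5000/v3",   # assumed endpoint
    username="nova",                       # assumed service user
    password="service-password",
    project_name="service",
    user_domain_name="Default",
    project_domain_name="Default",
)
sess = session.Session(auth=auth)          # re-authenticates as needed
neutron = neutron_client.Client(session=sess)

# The port list no longer depends on a token that may have expired or
# been revoked while the migration was running.
ports = neutron.list_ports(device_id="11111111-2222-3333-4444-555555555555")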

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Balazs Gibizer (<email address hidden>) on branch: master
Review: https://review.opendev.org/669867
Reason: This is an old patch in merge conflict. So I'm abandoning it. If you would like to continue working on this please restore it or ping me (gibi) on IRC to restore it.

Revision history for this message
Laszlo Budai (laszlo-budai) wrote :

Hi all,

We have an old OpenStack in production and ran into this issue. How can we recover the status of the VM as seen in OpenStack?

OpenStack still reports the VM as being on the source node; however, we can see that it is actually running on the destination.

Revision history for this message
Laszlo Budai (laszlo-budai) wrote :

For those who may come across this document: we were able to restore the status by updating the data stored in MySQL. The reference document we used: https://access.redhat.com/solutions/2070503
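
For reference, the recovery amounts to repointing the instance row at the destination hypervisor in nova's database. A hedged sketch follows: the table and column names match the nova schema, but the connection string, UUID, and hostname are placeholders, "node" can differ from "host" in some deployments, and you should back up the database first.

# Hedged recovery sketch, in the spirit of the Red Hat article linked
# above. Verify every value against your own deployment and take a
# database backup before running anything like this.
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://nova:secret@controller/nova")  # assumed DSN

instance_uuid = "11111111-2222-3333-4444-555555555555"  # affected VM
dest_host = "compute-02"  # hypervisor where the VM actually runs

with engine.begin() as conn:
    conn.execute(
        text(
            "UPDATE instances "
            "SET host = :host, node = :host, task_state = NULL "
            "WHERE uuid = :uuid AND deleted = 0"
        ),
        {"host": dest_host, "uuid": instance_uuid},
    )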
