migrate server reporting list index out of bound

Bug #1881455 reported by norman shen on 2020-05-31
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Stephen Finucane

Bug Description

Description
============

When resizing to the same host is enabled, a cold migration sometimes fails with:

1. a failure to migrate to the same host (UnableToMigrateToSelf)
2. followed by an "IndexError: list index out of range" error

Steps to reproduce
===================

Deploy two compute nodes and make the workload imbalanced, for example so that compute01 has more
allocations than compute02. Then cold migrate a server that is on compute02.

Expected result
================

The cold migration succeeds.

Actual result
==============

The cold migration sometimes fails.

log
======

8084-4fa8-a3c4-2874555fb27c held by migration 0a8a29a5-7f9c-4af3-85a1-ea62ee5658c3 for instance
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [req-ee48014a-51c1-4e82-9ef3-e3b68a9a34e4 5f0b0ff35b914c84b24efb363965530d 0606e9bf4e9c4334b6cb9a5012c60fb8 - default default] [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Error: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).: UnableToMigrateToSelf: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Traceback (most recent call last):
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4555, in prep_resize
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] node, migration, clean_shutdown)
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4499, in _prep_resize
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] instance_id=instance.uuid, host=self.host)
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] UnableToMigrateToSelf: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b]
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [req-ee48014a-51c1-4e82-9ef3-e3b68a9a34e4 5f0b0ff35b914c84b24efb363965530d 0606e9bf4e9c4334b6cb9a5012c60fb8 - default default] [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Error: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).: UnableToMigrateToSelf: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Traceback (most recent call last):
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4555, in prep_resize
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] node, migration, clean_shutdown)
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4499, in _prep_resize
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] instance_id=instance.uuid, host=self.host)
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] UnableToMigrateToSelf: Unable to migrate instance (8189fa53-3e8a-42e3-a735-1d91b9ff0c3b) to current host (compute02).
2020-05-31 02:55:51.649 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b]
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [req-ee48014a-51c1-4e82-9ef3-e3b68a9a34e4 5f0b0ff35b914c84b24efb363965530d 0606e9bf4e9c4334b6cb9a5012c60fb8 - default default] [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Setting instance vm_state to ERROR: IndexError: list index out of range
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] Traceback (most recent call last):
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 8333, in _error_out_instance_on_exception
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] yield
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4576, in prep_resize
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] filter_properties, host_list)
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4637, in _reschedule_resize_or_reraise
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] tb=','.join(traceback.format_exception(*exc_info)))
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/rpc.py", line 231, in wrapped
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] return f(*args, **kwargs)
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/utils.py", line 442, in notify_about_instance_action
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] fault, priority = _get_fault_and_priority_from_exc_and_tb(exception, tb)
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/utils.py", line 422, in _get_fault_and_priority_from_exc_and_tb
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] exception, tb)
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] File "/var/lib/openstack/lib/python2.7/site-packages/nova/notifications/objects/exception.py", line 45, in from_exc_and_traceback
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] trace = inspect.trace()[-1]
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b] IndexError: list index out of range
2020-05-31 02:55:51.740 2419133 ERROR nova.compute.manager [instance: 8189fa53-3e8a-42e3-a735-1d91b9ff0c3b]

norman shen (jshen28) on 2020-06-01
summary: - migrate server reporting list index of out bound
+ migrate server reporting list index out of bound

Change abandoned by norman shen (<email address hidden>) on branch: master
Review: https://review.opendev.org/732003

Related fix proposed to branch: master
Review: https://review.opendev.org/733667

Changed in nova:
assignee: nobody → Stephen Finucane (stephenfinucane)
status: New → In Progress

Reviewed: https://review.opendev.org/733667
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=03b00ae02fede4ee7f347001f50baab1d79ffa0a
Submitter: Zuul
Branch: master

commit 03b00ae02fede4ee7f347001f50baab1d79ffa0a
Author: Stephen Finucane <email address hidden>
Date: Thu Jun 4 12:10:09 2020 +0100

    Add reproducer for bug #1881455

    The 'nova.compute.manager._reschedule_resize_or_reraise' function can
    end up calling the 'from_exc_and_traceback' class method of the
    'ExceptionPayload' versioned notification object via the following call
    stack:

      nova.compute.manager._reschedule_resize_or_reraise
        nova.compute.utils.notify_about_instance_action
          nova.compute.utils._get_fault_and_priority_from_exc_and_tb
            nova.notifications.objects.exception.ExceptionPayload.from_exc_and_traceback

    The 'from_exc_and_traceback' class method uses 'inspect.trace()' to get
    more information about the provided exception in order to report
    information such as the module and function name of the function raising
    the exception in the notification. 'inspect.trace()' must be called
    inside the context of an exception handler; otherwise it returns an
    empty list. However, we are using '_reschedule_resize_or_reraise' to
    re-raise a previously raised and captured exception, which means we're
    not executing from such a context. This results in the following error:

      IndexError: list index out of range

    A future change will resolve this but for now, prove the issue.

    Change-Id: I5baaa698c2627a3438eb1d9990eb8091f37253ca
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1881455
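
The behaviour described in this commit message can be seen in isolation with a few lines of plain Python 3. This is only a minimal sketch, not the reproducer added to nova's test suite: the helper names `describe_last_frame` and `fail` are made up for illustration, and the only real API used is `inspect.trace()`.

```python
import inspect


def describe_last_frame():
    """Mimic the failing pattern: index the last record of inspect.trace()."""
    # inspect.trace() is built from the exception currently being handled.
    return inspect.trace()[-1].function


def fail():
    raise ValueError("boom")


# Inside an exception handler, inspect.trace() has records to return.
try:
    fail()
except ValueError as exc:
    captured = exc  # keep the exception around, as the compute manager does
    inside = describe_last_frame()

# Once the handler has exited, no exception is being handled, so
# inspect.trace() returns an empty list and indexing it raises IndexError.
try:
    describe_last_frame()
    outside_error = None
except IndexError as err:
    outside_error = err

print(inside)
print(repr(outside_error))
```

The second call corresponds to the situation in the log above: the exception object still exists, but the code that inspects it no longer runs inside the handler that caught it.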

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/733668
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=125df26bf9d6b4cfbfb68770fa47f2055f29b8dc
Submitter: Zuul
Branch: master

commit 125df26bf9d6b4cfbfb68770fa47f2055f29b8dc
Author: Stephen Finucane <email address hidden>
Date: Thu Jun 4 12:13:56 2020 +0100

    Use 'Exception.__traceback__' for versioned notifications

    The 'inspect.trace()' function is expected to be called within the
    context of an exception handler. The 'from_exc_and_traceback' class
    method of the 'nova.notifications.objects.exception.ExceptionPayload'
    class uses this to get information about a provided exception. However,
    there are cases where this is called from outside of an exception
    handler. In these cases, we see an 'IndexError' since we can't get the
    last frame of a non-existent stacktrace. The solution to this is to
    fall back to using the traceback embedded in the exception. This is a
    bit lossy when decorators are involved but for all other cases this will
    give us the same information. This also allows us to avoid passing a
    traceback argument to the function since we have it to hand already.

    Change-Id: I404ca316b1bf2a963106cd34e927934befbd9b12
    Signed-off-by: Stephen Finucane <email address hidden>
    Closes-Bug: #1881455
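
As an illustration of the approach this commit message describes, the sketch below reads the traceback stored on the exception object instead of relying on `inspect.trace()`. It is not the code merged into nova: `origin_of` and `fail` are hypothetical names, and it assumes Python 3, where every raised exception carries a `__traceback__` attribute (the Python 2.7 deployment in the log above has no such attribute).

```python
import inspect


def origin_of(exc):
    """Return (module name, function name) for the innermost frame of exc.

    Hypothetical helper: it uses exc.__traceback__, so it does not need to
    run inside an exception handler.
    """
    # context=0 skips reading source lines; only frame metadata is needed.
    frames = inspect.getinnerframes(exc.__traceback__, 0)
    if not frames:
        # The exception was never raised, so it has no traceback.
        return None, None
    last = frames[-1]
    module = inspect.getmodule(last.frame)
    return (module.__name__ if module else None), last.function


def fail():
    raise ValueError("boom")


try:
    fail()
except ValueError as exc:
    captured = exc

# Called after the handler has exited, where inspect.trace() would be empty.
module_name, function_name = origin_of(captured)
print(module_name, function_name)
```

Because the traceback travels with the exception, the caller no longer has to pass a separate traceback argument, which matches the simplification mentioned in the commit message.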
