Comment 6 for bug 1777157

Balazs Gibizer (balazs-gibizer) wrote:

I've managed to drive the code down the same code path that the logs show. I used Queens 17.0.2, as the line numbers in my log matched those in the attached log. I reproduced it in the nova functional test environments for both Python 2.7 and 3.5.

* define two compute hosts (host1, host2)
* boot an instance on host1
* make host2 less desirable to the scheduler as a migration target by consuming resources on it, but leave enough resources that it can still be an allocation candidate
* make sure that allow_resize_to_same_host = True so that the scheduler also considers host1 as a migration target
* make sure that the virt driver on host1 does not have the capability to migrate to the same host
* call migrate without forcing a host
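The setup above can be modelled as a toy Python sketch. Everything in it is hypothetical and heavily simplified: the host dicts, the `can_migrate_to_self` key, and `select_candidates`, `prep_resize`, and `migrate` are illustrative stand-ins, not nova's scheduler, conductor, or compute manager. Only the names `UnableToMigrateToSelf` and `allow_resize_to_same_host` come from the text. It shows why host1 is tried first and why the failure there is followed by a reschedule to host2.

```python
class UnableToMigrateToSelf(Exception):
    """Toy stand-in for the nova exception of the same name."""


def select_candidates(hosts, source, allow_resize_to_same_host):
    """Toy scheduler (hypothetical): drop the source host unless same-host
    resize is allowed, then order by free RAM, most free first."""
    candidates = [
        h for h in hosts
        if allow_resize_to_same_host or h["name"] != source
    ]
    return sorted(candidates, key=lambda h: h["free_ram_mb"], reverse=True)


def prep_resize(host, source):
    """Toy compute check (hypothetical): refuse a same-host move when the
    'driver' lacks the capability, like the failure described in [1]."""
    if host["name"] == source and not host["can_migrate_to_self"]:
        raise UnableToMigrateToSelf(host["name"])
    return host["name"]


def migrate(hosts, source, allow_resize_to_same_host):
    """Try candidates in order, moving on to the next one on failure.
    Returns the events seen along the way."""
    events = []
    for host in select_candidates(hosts, source, allow_resize_to_same_host):
        try:
            target = prep_resize(host, source)
        except UnableToMigrateToSelf:
            # Record the failure on this host, then try the next candidate
            # (the toy equivalent of the reschedule).
            events.append(("resize.error", host["name"]))
            continue
        events.append(("migrated", target))
        return events
    raise RuntimeError("no valid host found")


hosts = [
    # host1 (the source) has more free RAM, so the toy weigher prefers it...
    {"name": "host1", "free_ram_mb": 4096, "can_migrate_to_self": False},
    # ...while host2 has been made less desirable but is still a candidate.
    {"name": "host2", "free_ram_mb": 2048, "can_migrate_to_self": True},
]
print(migrate(hosts, source="host1", allow_resize_to_same_host=True))
# [('resize.error', 'host1'), ('migrated', 'host2')]
```

With `allow_resize_to_same_host=False` the source host is filtered out up front, host2 is the only candidate, and no error event is produced in this model.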

Now the following happens:
* the scheduler returns the allocation candidates in the order (host1, host2)
* conductor tries to migrate the instance to host1, but that fails in compute.manager._prep_resize on host1 with UnableToMigrateToSelf, as the virt driver lacks the capability [1]
* UnableToMigrateToSelf is handled in the exception block in prep_resize, which calls _reschedule_resize_or_reraise [2]
* that does the reschedule; conductor now selects host2 and makes the allocation successfully, so _reschedule returns True
* this means that nova ends up sending a resize.error notification (about the failure to migrate to host1) at [3]
* this leads to the 'inspect.trace()[-1]' call in [4], which fails for the bug author but does not fail for me. inspect.trace() should return a non-empty list [5][6] when called from an exception-handling context, and we are in an except block because we are executing [2]. This is also confirmed by the fact that sys.exc_info() returns something other than (None, None, None) at [7]; that result is printed at [8] and is visible in both the bug reporter's logs and mine.
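The `inspect.trace()` behaviour that the last point relies on can be checked in isolation. This is a minimal Python 3 sketch, not nova code: `raise_like_prep_resize`, `notify_error`, and `handle_failure` are hypothetical stand-ins for [1], [4], and [2]. It shows that `inspect.trace()` is non-empty when called (even from a nested function) inside an except block, and that outside any except block it returns an empty list, so `inspect.trace()[-1]` raises IndexError.

```python
import inspect
import sys


def raise_like_prep_resize():
    # Hypothetical stand-in for the UnableToMigrateToSelf raise in [1].
    raise RuntimeError("cannot migrate to self")


def notify_error():
    # Hypothetical stand-in for the notification code in [4]: take the
    # innermost frame of the exception currently being handled.
    return inspect.trace()[-1].function  # FrameInfo.function (Python 3.5+)


def handle_failure():
    try:
        raise_like_prep_resize()
    except RuntimeError:
        # Inside an except block sys.exc_info() is populated, so
        # inspect.trace() has frames to return, even from a nested call.
        assert sys.exc_info() != (None, None, None)
        return notify_error()


print(handle_failure())  # raise_like_prep_resize

# Outside any except block no exception is being handled: inspect.trace()
# returns an empty list, and indexing it with [-1] raises IndexError.
print(inspect.trace())  # []
try:
    notify_error()
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

So in this model the IndexError the reporter sees would only be expected if the notification code ran with no exception being handled, which is why the failure is surprising given [7] and [8].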

So I'm at a loss as to what is happening.

@Vladislav: Could you provide all three compute logs and the conductor log? Could you please also include more context from the logs before the first ERROR line?

@Vladislav: What is your exact environment? Which version of Queens? Do you have any custom nova code modifications on top of upstream Queens?

[1] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4085
[2] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4162
[3] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4221
[4] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/notifications/objects/exception.py#L42
[5] https://docs.python.org/2.7/library/inspect.html#inspect.trace
[6] https://docs.python.org/3.5/library/inspect.html#inspect.trace
[7] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L4159
[8] https://github.com/openstack/nova/blob/307382f58d38778b480d2d030e427759a44c204b/nova/compute/manager.py#L1313