OpenStack Compute (nova)

Bug #1864665
Comment #0

Balazs Gibizer (balazs-gibizer) wrote on 2020-02-25:

Description
===========
Server cold migration fails after re-schedule.

Steps to reproduce
==================
* create a devstack with two compute hosts with libvirt driver
* set allow_resize_to_same_host=True on both computes
* set up cellsv2 without cell conductor and rabbit separation to allow re-schedule logic to call back to the super conductor / scheduler
* enable NUMATopologyFilter and make sure both computes have NUMA resources
* create a flavor with hw:cpu_policy='dedicated' extra spec
* boot a server with the flavor and check which compute the server is placed on (let's call it host1)
* boot enough servers on host2 so that the next scheduling request could still be fulfilled by both computes but host1 will be preferred by the weighers
* cold migrate the pinned server

Expected result
===============
* scheduler selects host1 first but that host fails with UnableToMigrateToSelf exception as libvirt does not have the capability
* re-schedule happens
* scheduler selects host2 where the server spawns successfully
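The expected flow can be sketched as a toy model. This is not nova's implementation: ``prep_resize`` and ``cold_migrate`` here are hypothetical helpers, and ``UnableToMigrateToSelf`` is a local stand-in for the nova exception of the same name. It only illustrates that a failure on the first candidate should lead to a retry on the next one.

```python
class UnableToMigrateToSelf(Exception):
    """Local stand-in for the nova exception of the same name."""


def prep_resize(source_host, dest_host):
    # Hypothetical: models a driver that cannot migrate to its own host.
    if dest_host == source_host:
        raise UnableToMigrateToSelf(dest_host)
    return dest_host


def cold_migrate(source_host, candidates):
    """Try candidates in the order the weighers ranked them."""
    attempts = []
    for host in candidates:
        attempts.append(host)
        try:
            return prep_resize(source_host, host), attempts
        except UnableToMigrateToSelf:
            # Re-schedule: fall through to the next candidate.
            continue
    raise RuntimeError("no valid host found")


# host1 is preferred by the weighers but is also the source host.
selected, attempts = cold_migrate("host1", ["host1", "host2"])
```

In this model ``selected`` ends up as host2 after one failed attempt on host1, which is the behaviour the bug report expects from the real re-schedule.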

Actual result
=============
* during the re-schedule, when the conductor sends the prep_resize RPC to host2, the JSON serialization of the request spec fails with a circular reference error.

Environment
===========
* two node devstack with libvirt driver
* stable/pike nova. Expected to reproduce on newer branches as well, but not since Stein. See the Triage section.

Triage
======
The JSON serialization blows up in the migrate conductor task. [1] After debugging I see that the infinite loop happens when jsonutils.to_primitive tries to serialize a VirtCPUTopology instance.

The problematic piece of code has been removed by I4244f7dd8fe74565180f73684678027067b4506e in Stein.
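For illustration only, the same class of failure can be shown with the standard library ``json`` module instead of oslo's jsonutils. The dictionaries below are made-up stand-ins, not the real request spec or VirtCPUTopology; the assumption is simply that some object in the serialized graph refers back to one of its ancestors.

```python
import json

# Made-up stand-ins for the real objects.
topology = {"sockets": 1, "cores": 2, "threads": 1}
request_spec = {"numa_topology": topology}

# A back-reference turns the graph into a cycle.
topology["owner"] = request_spec

error = None
try:
    json.dumps(request_spec)
except ValueError as exc:
    # The stdlib encoder detects the cycle instead of recursing forever.
    error = str(exc)
```

Here ``error`` holds the encoder's circular reference message. Dropping the back-reference before serializing would let ``json.dumps`` succeed.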

[1] https://github.com/openstack/nova/blob/4224a61b4f3a8b910dcaa498f9663479d61a6060/nova/conductor/tasks/migrate.py#L87