Activity log for bug #1864665

Date Who What changed Old value New value Message
2020-02-25 15:51:48 Balazs Gibizer bug added bug
2020-02-25 15:52:00 Balazs Gibizer tags stable-only
2020-02-25 15:52:10 Balazs Gibizer nova: assignee Balazs Gibizer (balazs-gibizer)
2020-02-25 15:52:16 Balazs Gibizer nova: status New Triaged
2020-02-25 15:52:22 Balazs Gibizer nova: importance Undecided Medium
2020-02-25 15:54:16 Balazs Gibizer nominated for series nova/pike
2020-02-25 15:54:16 Balazs Gibizer bug task added nova/pike
2020-02-25 15:54:16 Balazs Gibizer nominated for series nova/rocky
2020-02-25 15:54:16 Balazs Gibizer bug task added nova/rocky
2020-02-25 15:54:16 Balazs Gibizer nominated for series nova/queens
2020-02-25 15:54:16 Balazs Gibizer bug task added nova/queens
2020-02-25 15:54:28 Balazs Gibizer nominated for series nova/ocata
2020-02-25 15:54:28 Balazs Gibizer bug task added nova/ocata
2020-02-25 15:54:41 Balazs Gibizer nova: status Triaged Invalid
2020-02-25 15:54:49 Balazs Gibizer nova/pike: status New Triaged
2020-02-25 15:54:53 Balazs Gibizer nova/pike: importance Undecided Medium
2020-02-25 15:55:02 Balazs Gibizer nova/pike: assignee Balazs Gibizer (balazs-gibizer)
2020-02-25 15:55:24 Balazs Gibizer description (old and new values differ only in one reproduction step: "boot a server with the flavor and ensure that the server. Check which compute ..." was changed to "boot a server with the flavor. Check which compute ..."; the new value follows)

Description
===========
Server cold migration fails after a re-schedule.

Steps to reproduce
==================
* create a devstack with two compute hosts using the libvirt driver
* set allow_resize_to_same_host=True on both computes
* set up cells v2 without cell conductor and rabbit separation so the re-schedule logic can call back to the super conductor / scheduler
* enable NUMATopologyFilter and make sure both computes have NUMA resources
* create a flavor with the hw:cpu_policy='dedicated' extra spec
* boot a server with the flavor; check which compute the server is placed on (let's call it host1)
* boot enough servers on host2 so that the next scheduling request can still be fulfilled by both computes but host1 is preferred by the weighers
* cold migrate the pinned server

Expected result
===============
* the scheduler selects host1 first, but that host fails with an UnableToMigrateToSelf exception as libvirt does not have the capability
* a re-schedule happens
* the scheduler selects host2, where the server spawns successfully

Actual result
=============
* during the re-schedule, when the conductor sends the prep_resize RPC to host2, the JSON serialization of the request spec fails with a circular reference error

Environment
===========
* two-node devstack with the libvirt driver
* stable/pike nova; expected to be reproducible on newer branches, but not since Stein (see the Triage section)

Triage
======
The JSON serialization blows up in the migrate conductor task. [1] After debugging, the infinite loop turns out to happen when jsonutils.to_primitive tries to serialize a VirtCPUTopology instance.

The problematic piece of code has been removed by I4244f7dd8fe74565180f73684678027067b4506e in Stein.

[1] https://github.com/openstack/nova/blob/4224a61b4f3a8b910dcaa498f9663479d61a6060/nova/conductor/tasks/migrate.py#L87
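For reference, the class of failure named in the triage above can be reproduced with plain Python. This is a minimal standalone sketch, not nova or oslo.serialization code, and the payload below is a hypothetical stand-in for the request spec: the standard json encoder refuses to serialize an object graph that refers back to itself::

    import json

    # Hypothetical payload standing in for the request spec: a mapping that
    # (directly or indirectly) contains a reference back to itself.
    payload = {"name": "request_spec"}
    payload["self"] = payload  # introduce the cycle

    try:
        json.dumps(payload)
    except ValueError as exc:
        # CPython reports: "Circular reference detected"
        print(exc)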
2020-02-25 17:28:42 OpenStack Infra nova/rocky: status New In Progress
2020-02-25 17:28:42 OpenStack Infra nova/rocky: assignee Balazs Gibizer (balazs-gibizer)
2020-02-25 17:29:00 Balazs Gibizer nova/rocky: importance Undecided Medium
2020-03-14 19:01:19 OpenStack Infra nova/rocky: status In Progress Fix Committed
2020-03-15 11:45:14 OpenStack Infra nova/queens: status New In Progress
2020-03-15 11:45:14 OpenStack Infra nova/queens: assignee s10 (vlad-esten)
2020-03-16 07:42:40 Alexander Rubtsov bug added subscriber Alexander Rubtsov
2020-03-19 21:09:36 OpenStack Infra nova/queens: status In Progress Fix Committed
2020-03-20 15:33:16 OpenStack Infra nova/pike: status Triaged In Progress
2020-03-25 06:40:44 OpenStack Infra nova/pike: assignee Balazs Gibizer (balazs-gibizer) Elod Illes (elod-illes)
2020-03-25 11:40:31 OpenStack Infra nova/pike: status In Progress Fix Committed
2022-08-01 11:04:42 OpenStack Infra nova/pike: status Fix Committed Fix Released
2022-11-11 18:11:58 OpenStack Infra nova/queens: status Fix Committed Fix Released
2022-11-11 18:20:35 OpenStack Infra nova/rocky: status Fix Committed Fix Released