Circular reference error during re-schedule

Bug #1864665 reported by Balazs Gibizer
This bug affects 1 person
Affects                   Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)  Invalid        Medium      Balazs Gibizer
Ocata                     New            Undecided   Unassigned
Pike                      Fix Released   Medium      Elod Illes
Queens                    Fix Released   Undecided   s10
Rocky                     Fix Released   Medium      Balazs Gibizer

Bug Description

Description
===========
Server cold migration fails after re-schedule.

Steps to reproduce
==================
* create a devstack with two compute hosts with libvirt driver
* set allow_resize_to_same_host=True on both computes
* set up cellsv2 without cell conductor and rabbit separation to allow re-schedule logic to call back to the super conductor / scheduler
* enable NUMATopologyFilter and make sure both computes have NUMA resources
* create a flavor with hw:cpu_policy='dedicated' extra spec
* boot a server with the flavor. Check which compute the server is placed on (let's call it host1)
* boot enough servers on host2 so that the next scheduling request could still be fulfilled by both computes but host1 will be preferred by the weighers
* cold migrate the pinned server

Expected result
===============
* scheduler selects host1 first but that host fails with UnableToMigrateToSelf exception as libvirt does not have the capability
* re-schedule happens
* scheduler selects host2 where the server spawns successfully

Actual result
=============
* during the re-schedule, when the conductor sends the prep_resize RPC to host2, the JSON serialization of the request spec fails with a circular reference error.

Environment
===========
* two node devstack with libvirt driver
* stable/pike nova. Expected to be reproducible on newer branches as well, but not on Stein or later; see the Triage section.

Triage
======
The JSON serialization blows up in the migrate conductor task [1]. After debugging, I see that the infinite loop happens when jsonutils.to_primitive tries to serialize a VirtCPUTopology instance.

The problematic piece of code has been removed by I4244f7dd8fe74565180f73684678027067b4506e in Stein.

[1] https://github.com/openstack/nova/blob/4224a61b4f3a8b910dcaa498f9663479d61a6060/nova/conductor/tasks/migrate.py#L87
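
For readers unfamiliar with this failure mode, the following is a minimal stdlib-only sketch, not nova or oslo.serialization code. It assumes the error arises when the `default` hook of the JSON encoder hands an object back unconverted, so the encoder meets the same object again and reports a circular reference. `Topology`, `passthrough_default` and `explicit_default` are hypothetical names used for illustration only.

```python
import json


class Topology:
    """Hypothetical stand-in for a versioned object such as VirtCPUTopology."""

    def __init__(self, sockets, cores, threads):
        self.sockets = sockets
        self.cores = cores
        self.threads = threads


def passthrough_default(value):
    # Assumption for illustration: the hook cannot convert the object
    # and returns it unchanged, so the encoder sees the same object twice.
    return value


def explicit_default(value):
    # Explicitly convert the object into plain JSON-friendly data.
    if isinstance(value, Topology):
        return vars(value)
    raise TypeError("cannot serialize %r" % type(value))


spec = {"numa_topology": Topology(sockets=1, cores=2, threads=2)}

# Without an explicit conversion the encoder reports a circular reference.
try:
    json.dumps(spec, default=passthrough_default)
    error = None
except ValueError as exc:
    error = str(exc)

# With an explicit conversion the same structure serializes cleanly.
serialized = json.dumps(spec, default=explicit_default, sort_keys=True)
```

In this toy case the only change between the failing and the working call is that the conversion of the nested object is requested explicitly, which is the same idea the fix's commit message describes for the request spec.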

Tags: stable-only
tags: added: stable-only
Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
status: New → Triaged
importance: Undecided → Medium
status: Triaged → Invalid
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/709798

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.opendev.org/709798
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3871b38fe03aee7a1ffbbdfdf8a60b8c09e0ba76
Submitter: Zuul
Branch: stable/rocky

commit 3871b38fe03aee7a1ffbbdfdf8a60b8c09e0ba76
Author: Balazs Gibizer <email address hidden>
Date: Tue Feb 25 16:48:48 2020 +0100

    Avoid circular reference during serialization

    When an instance with numa topology is re-scheduled the conductor
    migrate task blows with circular reference during request spec
    serialization. It happens because there are ovos in the request spec
    that jsonutils.dumps only serialize if requested explicitly.

    This patch makes the explicit request.

    This is a stable only bug fix as the borken code was removed in Stein by
    the feature patch I4244f7dd8fe74565180f73684678027067b4506e

    Closes-Bug: #1864665

    Change-Id: I1942bfa9bd1baf8738d34c287216db7b59000a36

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/713132

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.opendev.org/713132
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=54ca5d9afb11867ea022464d7ecad9f1ce13e453
Submitter: Zuul
Branch: stable/queens

commit 54ca5d9afb11867ea022464d7ecad9f1ce13e453
Author: Balazs Gibizer <email address hidden>
Date: Tue Feb 25 16:48:48 2020 +0100

    Avoid circular reference during serialization

    When an instance with numa topology is re-scheduled the conductor
    migrate task blows with circular reference during request spec
    serialization. It happens because there are ovos in the request spec
    that jsonutils.dumps only serialize if requested explicitly.

    This patch makes the explicit request.

    This is a stable only bug fix as the borken code was removed in Stein by
    the feature patch I4244f7dd8fe74565180f73684678027067b4506e

    Closes-Bug: #1864665

    Change-Id: I1942bfa9bd1baf8738d34c287216db7b59000a36
    (cherry picked from commit 3871b38fe03aee7a1ffbbdfdf8a60b8c09e0ba76)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.opendev.org/714148

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.opendev.org/714148
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f5091a91d0a977dd43b939317ecdeb1cd5db1980
Submitter: Zuul
Branch: stable/pike

commit f5091a91d0a977dd43b939317ecdeb1cd5db1980
Author: Balazs Gibizer <email address hidden>
Date: Tue Feb 25 16:48:48 2020 +0100

    Avoid circular reference during serialization

    When an instance with numa topology is re-scheduled the conductor
    migrate task blows with circular reference during request spec
    serialization. It happens because there are ovos in the request spec
    that jsonutils.dumps only serialize if requested explicitly.

    This patch makes the explicit request.

    This is a stable only bug fix as the borken code was removed in Stein by
    the feature patch I4244f7dd8fe74565180f73684678027067b4506e

    Conflicts:
          nova/tests/unit/conductor/tasks/test_migrate.py
    The unit test case was re-implemented the test refactoring in
    I57568e9a01664ee373ea00a8db3164109c982909 is missing from pike.

    Closes-Bug: #1864665

    Change-Id: I1942bfa9bd1baf8738d34c287216db7b59000a36
    (cherry picked from commit 3871b38fe03aee7a1ffbbdfdf8a60b8c09e0ba76)
    (cherry picked from commit 54ca5d9afb11867ea022464d7ecad9f1ce13e453)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova pike-eol

This issue was fixed in the openstack/nova pike-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova queens-eol

This issue was fixed in the openstack/nova queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova rocky-eol

This issue was fixed in the openstack/nova rocky-eol release.
