Evacuations are not restricted to the source cell during scheduling

Bug #1823370 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Medium      Matt Riedemann
Pike                      Confirmed     Medium      Unassigned
Queens                    Confirmed     Medium      Unassigned
Rocky                     Confirmed     Medium      Unassigned
Stein                     Confirmed     Medium      Unassigned

Bug Description

During most move operations we restrict the request spec to the cell the instance is in before calling the scheduler:

unshelve: https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/manager.py#L822

cold migrate: https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/tasks/migrate.py#L163

live migrate: https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/tasks/live_migrate.py#L354

But for some reason we don't do that during evacuate, or during a rebuild to the same host with forced hosts/nodes when the image changes. In the rebuild case, that means the scheduler gets nodes from all cells just to find the one we are forcing:

https://github.com/openstack/nova/blob/a6963fa6858289d048e4d27ce8e61637cd023f4c/nova/conductor/manager.py#L1011

I'm not sure exactly how this would fail, but if the scheduler did pick a host in another cell, things would surely fail because evacuate won't work across cells (the instance data is in the source cell DB).
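
To make the problem concrete, here is a minimal, self-contained sketch of the scheduling side. It is not nova code: `Host`, `candidate_hosts`, and the host and cell names are hypothetical stand-ins, and the real scheduler does much more than this. It only illustrates that, without a requested cell, hosts from every cell are candidates for the evacuation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical stand-in for a compute host and the cell it lives in."""
    name: str
    cell: str


def candidate_hosts(hosts: List[Host], source_host: str,
                    requested_cell: Optional[str] = None) -> List[Host]:
    """Return the hosts a toy scheduler could pick for an evacuation.

    The failed source host is always excluded. If requested_cell is set,
    only hosts in that cell are considered; otherwise all cells are.
    """
    return [
        h for h in hosts
        if h.name != source_host
        and (requested_cell is None or h.cell == requested_cell)
    ]


hosts = [Host("host1", "cell1"), Host("host2", "cell1"), Host("host3", "cell2")]

# Instance lives on host1 in cell1. Without a cell restriction, host3 in
# cell2 is a valid candidate even though cross-cell evacuate is unsupported.
unrestricted = candidate_hosts(hosts, "host1")

# With the restriction, only hosts in the instance's own cell remain.
restricted = candidate_hosts(hosts, "host1", requested_cell="cell1")
```

In this toy setup the unrestricted call can return a host in `cell2`, which is the situation the bug describes.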

Tags: cells evacuate
Matt Riedemann (mriedem) wrote :

I likely need to write a functional test to recreate this first to see how things fail.

Matt Riedemann (mriedem)
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/650424

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/650429

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/650424
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a7bd9688d533aaa622d6049a99d9339d1fbfa88f
Submitter: Zuul
Branch: master

commit a7bd9688d533aaa622d6049a99d9339d1fbfa88f
Author: Matt Riedemann <email address hidden>
Date: Fri Apr 5 15:16:41 2019 -0400

    Add functional recreate test for bug 1823370

    When evacuating a server in a multi-cell environment
    we should be restricting the scheduling request during
    evacuate to the cell in which the instance already exists
    since we don't support cross-cell evacuate.

    This adds a functional test to recreate the bug to show
    that the scheduler is not restricted to the instance's
    current cell when evacuating.

    Change-Id: I56e20c84f25cc4961dc8d637c222b6f213c4d5f9
    Related-Bug: #1823370

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/650429
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=95df2a239c32f2ee5d00f06a59a9e91b59f3aca5
Submitter: Zuul
Branch: master

commit 95df2a239c32f2ee5d00f06a59a9e91b59f3aca5
Author: Matt Riedemann <email address hidden>
Date: Fri Apr 5 15:36:00 2019 -0400

    Restrict RequestSpec to cell when evacuating

    When evacuating a server in a multi-cell environment
    we need to restrict the scheduling request during
    evacuate to the cell in which the instance already exists
    since we don't support cross-cell evacuate.

    This fixes the issue by restricting the RequestSpec to
    the instance's current cell during evacuate in the same
    way we do during unshelve.

    Note that this should also improve performance when
    rebuilding a server with a new image since we will only
    look for the ComputeNode from the targeted cell rather
    than iterate all enabled cells during scheduling.

    Change-Id: I497180fb81fd966d1d3d4b54ac66d2609347583e
    Closes-Bug: #1823370
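
The idea in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the merged patch: the `Destination` and `RequestSpec` dataclasses are hypothetical models of the nova objects, the `instance_cells` dict is an assumed stand-in for the instance mapping lookup, and `restrict_request_spec_to_cell` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Destination:
    """Hypothetical, simplified model of a requested destination."""
    host: Optional[str] = None
    cell: Optional[str] = None


@dataclass
class RequestSpec:
    """Hypothetical, simplified model of a scheduling request."""
    instance_uuid: str
    requested_destination: Optional[Destination] = None


def restrict_request_spec_to_cell(request_spec: RequestSpec,
                                  instance_cells: Dict[str, str]) -> RequestSpec:
    """Pin the request to the cell the instance already lives in.

    instance_cells is an assumed uuid -> cell lookup table. An existing
    requested destination (e.g. a forced host) is preserved and only its
    cell is set.
    """
    cell = instance_cells[request_spec.instance_uuid]
    if request_spec.requested_destination is not None:
        request_spec.requested_destination.cell = cell
    else:
        request_spec.requested_destination = Destination(cell=cell)
    return request_spec


instance_cells = {"uuid-1": "cell1"}

# Evacuate without a target host: only the cell gets set.
evac_spec = restrict_request_spec_to_cell(RequestSpec("uuid-1"), instance_cells)

# Rebuild with a forced host: the host is kept and the cell is added, so a
# scheduler would only need to look for the node in that one cell.
rebuild_spec = restrict_request_spec_to_cell(
    RequestSpec("uuid-1", Destination(host="host1")), instance_cells)
```

Keeping an existing destination and only filling in its cell is what lets the forced-host rebuild case benefit from the same restriction.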

Changed in nova:
status: In Progress → Fix Released