NoValidHost during live migration after cold migrating to a specified host

Bug #1797580 reported by Matt Riedemann
This bug affects 1 person
Affects                    Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)   Fix Released   High        Matt Riedemann
Queens                     Fix Committed  High        Matt Riedemann
Rocky                      Fix Committed  High        Matt Riedemann

Bug Description

I recreated this with a 2-node devstack on Stein, created yesterday.

1. Create a server.
2. Cold migrate the server to the other host, specifying the host: nova migrate <server> --host <other host>
3. Confirm the resize.
4. Live migrate the server without specifying a host, so the scheduler has to pick one.

At this point you get a NoValidHost error. The requested_destination field that was persisted in the request spec in step 2 restricts the scheduler to the host on which the instance is currently running:

http://paste.openstack.org/show/731972/

The problem is that when cold migrating a server with a specified target host, the compute API stores that host on the request spec and sends it to the conductor to tell the scheduler which host to use:

https://github.com/openstack/nova/blob/20bc0136d0665bafdcd379f19389a0a5ea7bf310/nova/compute/api.py#L3565

But the requested_destination field on that request spec gets persisted, and when you later live migrate, it is reused. Since the server is already on that host, we get NoValidHost, because you can't live migrate to the same host.
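To make the failure mode concrete, here is a minimal toy sketch of it. This is not nova code: the RequestSpec class, select_destination() and the host names are hypothetical stand-ins that model only the requested_destination field and the rule that a move cannot target the current host.

```python
from dataclasses import dataclass
from typing import Optional


class NoValidHost(Exception):
    """Raised when the toy scheduler cannot find a usable host."""


@dataclass
class RequestSpec:
    # Hypothetical stand-in for the persisted request spec; it models
    # only the field discussed in this bug.
    requested_destination: Optional[str] = None


def select_destination(spec, hosts, current_host):
    """Pick a target host, never the one the instance already runs on."""
    candidates = [h for h in hosts if h != current_host]
    if spec.requested_destination is not None:
        # A requested destination restricts the choice to that one host.
        candidates = [h for h in candidates
                      if h == spec.requested_destination]
    if not candidates:
        raise NoValidHost("no valid host found")
    return candidates[0]


hosts = ["host1", "host2"]
persisted_spec = RequestSpec()  # step 1: server created on host1
current_host = "host1"

# Step 2: cold migrate to host2; the requested host is stored on the spec
# and (in this toy model) simply stays there afterwards.
persisted_spec.requested_destination = "host2"
current_host = select_destination(persisted_spec, hosts, current_host)

# Step 4: live migrate without naming a host; the stale field is reused.
try:
    select_destination(persisted_spec, hosts, current_host)
    live_migration_result = "scheduled"
except NoValidHost:
    live_migration_result = "NoValidHost"
```

In this sketch the only host the stale field allows is the one the instance is already on, so the candidate list ends up empty.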

This is a regression in Queens: https://review.openstack.org/#/c/408955/

Matt Riedemann (mriedem) wrote :

By the way, this is where the changes to the request_spec get persisted during the cold migrate:

https://github.com/openstack/nova/blob/20bc0136d0665bafdcd379f19389a0a5ea7bf310/nova/conductor/manager.py#L353

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/610088

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/610098

Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611938

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/611939

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/610088
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bfc8d1052ba6f1011fcdb882a825694acf98dd39
Submitter: Zuul
Branch: master

commit bfc8d1052ba6f1011fcdb882a825694acf98dd39
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 11:37:38 2018 -0400

    Add regression test for bug 1797580

    Microversion 2.56 allows cold migrating to a specified target host. The
    compute API sets the requested destination on the request spec with the
    specified target host and then conductor sends that request spec to the
    scheduler to validate the host. Conductor later persists the changes to
    the request spec because it's the resize flow and the flavor could change
    (even though in this case it won't since it's a cold migrate). After
    confirming the resize, if the server is live migrated it will fail during
    scheduling because of the persisted RequestSpec.requested_destination
    from the cold migration, and you can't live migrate to the same host on
    which the instance is currently running.

    This change adds a test to recreate the regression bug.

    Change-Id: I588655fdd90917d00ccf5eb0a8df7bccc1ac0e81
    Related-Bug: #1797580

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/611944

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/611945

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/610098
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ce3af5e33ae6843411e611e81c6ca1c21e0f1e09
Submitter: Zuul
Branch: master

commit ce3af5e33ae6843411e611e81c6ca1c21e0f1e09
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 12:03:41 2018 -0400

    Don't persist RequestSpec.requested_destination

    The RequestSpec.requested_destination, similar to the
    retry field, is per-request/operation, and persisting
    it can cause issues with subsequent move requests.

    For example, if you cold migrate a server to a specific
    host and then live migrate that server without specifying
    a host, the requested target host from the cold migrate
    is sent to the scheduler for the live migration, but since
    that is where the instance is already running, it's
    rejected with NoValidHost.

    This is a similar issue to the need to call
    RequestSpec.reset_forced_destinations() in all move operations
    in conductor. However, rather than try to whack this mole in
    every place the request spec is sent to the scheduler, like
    reset_forced_destinations() is used, we simply don't need to
    persist the requested_destination field since it's just a
    vehicle to tell the scheduler which host we want.

    The related functional regression test is updated to show
    the bug is now fixed.

    Change-Id: I2a78f0754c63381c57e7e1c610d0938b6df0f537
    Closes-Bug: #1797580

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611938
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cb9c96f0a8287a3261432c75f9178275780cec38
Submitter: Zuul
Branch: stable/rocky

commit cb9c96f0a8287a3261432c75f9178275780cec38
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 11:37:38 2018 -0400

    Add regression test for bug 1797580

    Microversion 2.56 allows cold migrating to a specified target host. The
    compute API sets the requested destination on the request spec with the
    specified target host and then conductor sends that request spec to the
    scheduler to validate the host. Conductor later persists the changes to
    the request spec because it's the resize flow and the flavor could change
    (even though in this case it won't since it's a cold migrate). After
    confirming the resize, if the server is live migrated it will fail during
    scheduling because of the persisted RequestSpec.requested_destination
    from the cold migration, and you can't live migrate to the same host on
    which the instance is currently running.

    This change adds a test to recreate the regression bug.

    Change-Id: I588655fdd90917d00ccf5eb0a8df7bccc1ac0e81
    Related-Bug: #1797580
    (cherry picked from commit bfc8d1052ba6f1011fcdb882a825694acf98dd39)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/rocky)

Reviewed: https://review.openstack.org/611939
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=954b2004a1daa43d4dff0694e6e5cdb9630a441b
Submitter: Zuul
Branch: stable/rocky

commit 954b2004a1daa43d4dff0694e6e5cdb9630a441b
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 12:03:41 2018 -0400

    Don't persist RequestSpec.requested_destination

    The RequestSpec.requested_destination, similar to the
    retry field, is per-request/operation, and persisting
    it can cause issues with subsequent move requests.

    For example, if you cold migrate a server to a specific
    host and then live migrate that server without specifying
    a host, the requested target host from the cold migrate
    is sent to the scheduler for the live migration, but since
    that is where the instance is already running, it's
    rejected with NoValidHost.

    This is a similar issue to the need to call
    RequestSpec.reset_forced_destinations() in all move operations
    in conductor. However, rather than try to whack this mole in
    every place the request spec is sent to the scheduler, like
    reset_forced_destinations() is used, we simply don't need to
    persist the requested_destination field since it's just a
    vehicle to tell the scheduler which host we want.

    The related functional regression test is updated to show
    the bug is now fixed.

    Change-Id: I2a78f0754c63381c57e7e1c610d0938b6df0f537
    Closes-Bug: #1797580
    (cherry picked from commit ce3af5e33ae6843411e611e81c6ca1c21e0f1e09)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/611944
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fbf729d61b608bf8b19157bf699f6c0b5f43f10d
Submitter: Zuul
Branch: stable/queens

commit fbf729d61b608bf8b19157bf699f6c0b5f43f10d
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 11:37:38 2018 -0400

    Add regression test for bug 1797580

    Microversion 2.56 allows cold migrating to a specified target host. The
    compute API sets the requested destination on the request spec with the
    specified target host and then conductor sends that request spec to the
    scheduler to validate the host. Conductor later persists the changes to
    the request spec because it's the resize flow and the flavor could change
    (even though in this case it won't since it's a cold migrate). After
    confirming the resize, if the server is live migrated it will fail during
    scheduling because of the persisted RequestSpec.requested_destination
    from the cold migration, and you can't live migrate to the same host on
    which the instance is currently running.

    This change adds a test to recreate the regression bug.

    Change-Id: I588655fdd90917d00ccf5eb0a8df7bccc1ac0e81
    Related-Bug: #1797580
    (cherry picked from commit bfc8d1052ba6f1011fcdb882a825694acf98dd39)
    (cherry picked from commit cb9c96f0a8287a3261432c75f9178275780cec38)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.1.0

This issue was fixed in the openstack/nova 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/611945
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a29a3c9cb16ee0b42f125f8e9b047f22d05c47c4
Submitter: Zuul
Branch: stable/queens

commit a29a3c9cb16ee0b42f125f8e9b047f22d05c47c4
Author: Matt Riedemann <email address hidden>
Date: Fri Oct 12 12:03:41 2018 -0400

    Don't persist RequestSpec.requested_destination

    The RequestSpec.requested_destination, similar to the
    retry field, is per-request/operation, and persisting
    it can cause issues with subsequent move requests.

    For example, if you cold migrate a server to a specific
    host and then live migrate that server without specifying
    a host, the requested target host from the cold migrate
    is sent to the scheduler for the live migration, but since
    that is where the instance is already running, it's
    rejected with NoValidHost.

    This is a similar issue to the need to call
    RequestSpec.reset_forced_destinations() in all move operations
    in conductor. However, rather than try to whack this mole in
    every place the request spec is sent to the scheduler, like
    reset_forced_destinations() is used, we simply don't need to
    persist the requested_destination field since it's just a
    vehicle to tell the scheduler which host we want.

    The related functional regression test is updated to show
    the bug is now fixed.

    Conflicts:
          nova/objects/request_spec.py

    NOTE(mriedem): The conflict is due to not having change
    Icb295bbd8c83e2e340a7ac3ecc1f159e0db7c7b1 in Queens.

    Change-Id: I2a78f0754c63381c57e7e1c610d0938b6df0f537
    Closes-Bug: #1797580
    (cherry picked from commit ce3af5e33ae6843411e611e81c6ca1c21e0f1e09)
    (cherry picked from commit 954b2004a1daa43d4dff0694e6e5cdb9630a441b)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.10

This issue was fixed in the openstack/nova 17.0.10 release.
