rebuild to same host with a different image results in erroneously doing a Claim

Bug #1750618 reported by Chris Friesen
This bug affects 4 people
Affects                   Status         Importance  Assigned to
OpenStack Compute (nova)  Fix Released   High        Matt Riedemann
Ocata                     Fix Committed  High        Tony Breeds
Pike                      Fix Committed  High        Matt Riedemann
Queens                    Fix Released   High        Matt Riedemann

Bug Description

As of stable/pike, doing a rebuild-to-same-host with a new image results in ComputeManager.rebuild_instance() being called with "scheduled_node=<hostname>" and "recreate=False". This results in a new Claim, which seems wrong since we're not changing the flavor, and that claim could fail if the compute node is already full.

The comments in ComputeManager.rebuild_instance() make it appear that it expects both "recreate" and "scheduled_node" to be None for the rebuild-to-same-host case; otherwise it will do a Claim. However, if we rebuild with a different image, the request goes through the scheduler, which means that "scheduled_node" is not None.
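The problematic decision can be sketched as follows. This is a hypothetical simplification of the claim-selection logic in ComputeManager.rebuild_instance(), not the actual nova code; the function and return values are illustrative only.

```python
def choose_claim_buggy(recreate, scheduled_node):
    """Pre-fix behavior (illustrative): any non-None scheduled_node
    triggers a real resource claim, even for a rebuild on the same
    host where the flavor is unchanged."""
    if recreate or scheduled_node is not None:
        return "rebuild_claim"  # may fail if the compute node is full
    return "nop_claim"

# Rebuild to the same host with a *new* image: conductor runs the
# scheduler, so scheduled_node is set and a claim is attempted.
print(choose_claim_buggy(recreate=False, scheduled_node="compute-1"))

# Rebuild with the same image: no scheduler run, no claim.
print(choose_claim_buggy(recreate=False, scheduled_node=None))
```

This is why the same-host rebuild can fail on a full compute node even though no additional resources are actually needed.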

Matt Riedemann (mriedem)
tags: added: rebuild
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is a regression introduced with change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b which was backported all the way to stable/newton upstream:

https://review.openstack.org/#/q/I11746d1ea996a0f18b7c54b4c9c21df58cc4714b

Changed in nova:
importance: Undecided → High
status: New → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/546268

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Triaged → In Progress
Revision history for this message
Tristan Cacqueray (tristan-cacqueray) wrote :

Does this extra Claim affect only the rebuild action, or can it leak and affect the project quota or the compute capacity after the rebuild is completed?

Revision history for this message
Chris Friesen (cbf123) wrote :

It looks like it's mostly only affecting the rebuild action. In the compute_nodes table in the nova DB I'm seeing "memory_mb_used" at 1024 when it should be 512, but the CPU/disk usage is where it should be, so I'm not sure what's going on.

Revision history for this message
Chris Friesen (cbf123) wrote :

Actually, it's showing as consuming 512MB under "memory_mb_used" even when the instance is gone, so I think this might be intentional.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/550545

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/550555

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/550560

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/546268
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a39029076c7997236a7f999682fb1e998c474204
Submitter: Zuul
Branch: master

commit a39029076c7997236a7f999682fb1e998c474204
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 20 13:48:12 2018 -0500

    Only attempt a rebuild claim for an evacuation to a new host

    Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
    behavior of the API and conductor when rebuilding an instance
    with a new image such that the image is run through the scheduler
    filters again to see if it will work on the existing host that
    the instance is running on.

    As a result, conductor started passing 'scheduled_node' to the
    compute which was using it for logic to tell if a claim should be
    attempted. We don't need to do a claim for a rebuild since we're
    on the same host.

    This removes the scheduled_node logic from the claim code, as we
    should only ever attempt a claim if we're evacuating, which we
    can determine based on the 'recreate' parameter.

    Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
    Closes-Bug: #1750618

Changed in nova:
status: In Progress → Fix Released
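The claim-selection change described in the commit message above can be sketched as follows. Again, this is an illustrative simplification under the assumption that 'recreate' alone distinguishes an evacuation from a same-host rebuild, as the commit message states; the names are not the actual nova code.

```python
def choose_claim_fixed(recreate, scheduled_node=None):
    """Post-fix behavior (illustrative): only an evacuation
    (recreate=True) attempts a resource claim; a rebuild on the same
    host never does, regardless of scheduled_node."""
    if recreate:
        return "rebuild_claim"
    return "nop_claim"

# Rebuild to the same host with a new image: scheduled_node is set
# by conductor, but no claim is attempted anymore.
print(choose_claim_fixed(recreate=False, scheduled_node="compute-1"))

# Evacuation to a new host still claims resources there.
print(choose_claim_fixed(recreate=True, scheduled_node="compute-2"))
```

With this change, a rebuild on an already-full compute node can no longer fail a claim it never needed to make.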
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/550545
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3c5e519a8875a9766a0d3a06cb76cceba26634e6
Submitter: Zuul
Branch: stable/queens

commit 3c5e519a8875a9766a0d3a06cb76cceba26634e6
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 20 13:48:12 2018 -0500

    Only attempt a rebuild claim for an evacuation to a new host

    Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
    behavior of the API and conductor when rebuilding an instance
    with a new image such that the image is run through the scheduler
    filters again to see if it will work on the existing host that
    the instance is running on.

    As a result, conductor started passing 'scheduled_node' to the
    compute which was using it for logic to tell if a claim should be
    attempted. We don't need to do a claim for a rebuild since we're
    on the same host.

    This removes the scheduled_node logic from the claim code, as we
    should only ever attempt a claim if we're evacuating, which we
    can determine based on the 'recreate' parameter.

    Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
    Closes-Bug: #1750618
    (cherry picked from commit a39029076c7997236a7f999682fb1e998c474204)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/550555
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9890f3f69622489a0fd57cb1df354d7aa60161e0
Submitter: Zuul
Branch: stable/pike

commit 9890f3f69622489a0fd57cb1df354d7aa60161e0
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 20 13:48:12 2018 -0500

    Only attempt a rebuild claim for an evacuation to a new host

    Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
    behavior of the API and conductor when rebuilding an instance
    with a new image such that the image is run through the scheduler
    filters again to see if it will work on the existing host that
    the instance is running on.

    As a result, conductor started passing 'scheduled_node' to the
    compute which was using it for logic to tell if a claim should be
    attempted. We don't need to do a claim for a rebuild since we're
    on the same host.

    This removes the scheduled_node logic from the claim code, as we
    should only ever attempt a claim if we're evacuating, which we
    can determine based on the 'recreate' parameter.

    Conflicts:
          nova/compute/manager.py

    NOTE(mriedem): The conflict is due to change
    I0883c2ba1989c5d5a46e23bcbcda53598707bcbc in Queens.

    Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
    Closes-Bug: #1750618
    (cherry picked from commit a39029076c7997236a7f999682fb1e998c474204)
    (cherry picked from commit 3c5e519a8875a9766a0d3a06cb76cceba26634e6)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.2

This issue was fixed in the openstack/nova 17.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.1.1

This issue was fixed in the openstack/nova 16.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/550560
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=286dd2c23c7c361c6538be65941e3c19e83e6d52
Submitter: Zuul
Branch: stable/ocata

commit 286dd2c23c7c361c6538be65941e3c19e83e6d52
Author: Matt Riedemann <email address hidden>
Date: Tue Feb 20 13:48:12 2018 -0500

    Only attempt a rebuild claim for an evacuation to a new host

    Change I11746d1ea996a0f18b7c54b4c9c21df58cc4714b changed the
    behavior of the API and conductor when rebuilding an instance
    with a new image such that the image is run through the scheduler
    filters again to see if it will work on the existing host that
    the instance is running on.

    As a result, conductor started passing 'scheduled_node' to the
    compute which was using it for logic to tell if a claim should be
    attempted. We don't need to do a claim for a rebuild since we're
    on the same host.

    This removes the scheduled_node logic from the claim code, as we
    should only ever attempt a claim if we're evacuating, which we
    can determine based on the 'recreate' parameter.

    Conflicts:
          nova/tests/functional/test_servers.py

    NOTE(mriedem): test_rebuild_with_new_image does not exist in
    Ocata and does not apply to Ocata since it is primarily
    testing allocations getting created in Placement via the
    FilterScheduler, which was new in Pike. As a result the change
    to that test is not part of this backport, but a similar assertion
    is added to an existing rebuild unit test.

    Change-Id: I7fde8ce9dea16679e76b0cb2db1427aeeec0c222
    Closes-Bug: #1750618
    (cherry picked from commit a39029076c7997236a7f999682fb1e998c474204)
    (cherry picked from commit 3c5e519a8875a9766a0d3a06cb76cceba26634e6)
    (cherry picked from commit 9890f3f69622489a0fd57cb1df354d7aa60161e0)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 18.0.0.0b1

This issue was fixed in the openstack/nova 18.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.1.1

This issue was fixed in the openstack/nova 15.1.1 release.
