Instances failing quota recheck end up with no assigned cell

Bug #1715462 reported by Mohammed Naser
This bug affects 1 person
Affects                     Status         Importance  Assigned to     Milestone
OpenStack Compute (nova)    Fix Released   High        Mohammed Naser
  Pike                      Fix Committed  High        Matt Riedemann

Bug Description

When an instance fails the quota recheck in the conductor (code here):

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L992-L1006

the recheck raises an exception. However, the instance's cell mapping is only saved much later (thanks to dansmith for helping find this):

https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L1037-L1043

This leaves the instance with no assigned cell in its instance mapping, when the mapping should point to the cell the instance was scheduled into.
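
To make the ordering problem concrete, here is a heavily simplified Python sketch. It is not nova's code: InstanceMapping, recheck_quota and schedule_and_build_buggy are hypothetical stand-ins, and only the TooManyInstances name is taken from the commit message. It shows the order of operations only: the recheck runs first, the mapping save comes afterwards, so an exception skips the save.

```python
"""Hypothetical model of the ordering bug; none of this is nova's real code."""


class TooManyInstances(Exception):
    """Stand-in for nova's quota exception of the same name."""


class InstanceMapping:
    """Minimal stand-in for a row in the API database's mapping table."""

    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid
        self.cell_mapping = None  # unset until the conductor assigns a cell


def recheck_quota(requested, limit):
    # Hypothetical recheck: fail if the request now exceeds the limit.
    if requested > limit:
        raise TooManyInstances(f"requested {requested}, limit {limit}")


def schedule_and_build_buggy(mapping, chosen_cell, requested, limit):
    # The scheduler has already picked chosen_cell at this point...
    recheck_quota(requested, limit)  # ...but this may raise first,
    mapping.cell_mapping = chosen_cell  # so the save is never reached.


mapping = InstanceMapping("fake-uuid")
try:
    schedule_and_build_buggy(mapping, "cell1", requested=2, limit=1)
except TooManyInstances:
    pass

print(mapping.cell_mapping)  # None: the instance has no assigned cell
```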

Tags: cells quotas
Matt Riedemann (mriedem)
tags: added: cells quotas
melanie witt (melwitt)
Changed in nova:
importance: Undecided → High
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
Mohammed Naser (mnaser) wrote :

Just to update: this only affects Pike and newer, because the quota recheck code did not exist before Pike. I'll be publishing a fix shortly (unit tests pass; testing functional tests locally before pushing up).

Changed in nova:
assignee: nobody → Mohammed Naser (mnaser)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/501408

Changed in nova:
status: Confirmed → In Progress
Changed in nova:
assignee: Mohammed Naser (mnaser) → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/501408
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bf0f5d475458a67a12000ff49a8c5285c3ac9e45
Submitter: Jenkins
Branch: master

commit bf0f5d475458a67a12000ff49a8c5285c3ac9e45
Author: Mohammed Naser <email address hidden>
Date: Wed Sep 6 15:19:01 2017 -0400

    Ensure instance mapping is updated in case of quota recheck fails

    If an instance fails to successfully pass the quota recheck, it will
    raise a TooManyInstances exception, however, it will not hit the
    code which saves the instance mapping, leaving an instance with no
    assigned cell in the mapping table and no BuildRequest as it is
    removed by _cleanup_build_artifacts.

    This patch adds a test to make sure that an instance has the correct
    cell mapping if it fails in the quota recheck phase. In addition, it
    uses the cell_mapping_cache dictionary to set the correct cell
    mapping before marking the instance as ERROR.

    Co-Authored-By: Dan Smith <email address hidden>
    Co-Authored-By: Matt Riedemann <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>
    Closes-Bug: #1715462

    Change-Id: I7ecb5feb47a5f358cd51bde87b75a3a6141b5b12
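
A minimal sketch of the approach this commit message describes, again in simplified Python with hypothetical names (Instance, InstanceMapping, recheck_quota, schedule_and_build_fixed). It assumes cell_mapping_cache maps each instance UUID to the cell chosen by the scheduler; the actual patch in nova/conductor/manager.py may differ in detail. The only idea carried over from the commit is: on a failed recheck, record the cell from cell_mapping_cache in the instance mapping before marking the instance as ERROR.

```python
"""Hypothetical sketch of the fix; names are illustrative, not nova's API."""


class TooManyInstances(Exception):
    pass


class InstanceMapping:
    def __init__(self, instance_uuid):
        self.instance_uuid = instance_uuid
        self.cell_mapping = None


class Instance:
    def __init__(self, uuid):
        self.uuid = uuid
        self.vm_state = "building"


def recheck_quota(requested, limit):
    if requested > limit:
        raise TooManyInstances(f"requested {requested}, limit {limit}")


def schedule_and_build_fixed(instances, mappings, cell_mapping_cache,
                             requested, limit):
    try:
        recheck_quota(requested, limit)
    except TooManyInstances:
        for inst in instances:
            # Assumed cache layout: instance uuid -> chosen cell.
            # Save the cell *before* marking ERROR so the mapping is set.
            mappings[inst.uuid].cell_mapping = cell_mapping_cache[inst.uuid]
            inst.vm_state = "error"
        return
    # Success path: the mapping is saved as before.
    for inst in instances:
        mappings[inst.uuid].cell_mapping = cell_mapping_cache[inst.uuid]


inst = Instance("fake-uuid")
mappings = {inst.uuid: InstanceMapping(inst.uuid)}
cache = {inst.uuid: "cell1"}
schedule_and_build_fixed([inst], mappings, cache, requested=2, limit=1)
print(mappings[inst.uuid].cell_mapping, inst.vm_state)  # cell1 error
```

Reusing the cell already recorded during scheduling means the failure path needs no extra lookup; it only has to perform the mapping update that the success path would have done.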

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/501821

Matt Riedemann (mriedem)
Changed in nova:
assignee: Matt Riedemann (mriedem) → Mohammed Naser (mnaser)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/501821
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a98a52d85eb2c695568ac01d5ae7baf1bc20d9e3
Submitter: Jenkins
Branch: stable/pike

commit a98a52d85eb2c695568ac01d5ae7baf1bc20d9e3
Author: Mohammed Naser <email address hidden>
Date: Wed Sep 6 15:19:01 2017 -0400

    Ensure instance mapping is updated in case of quota recheck fails

    If an instance fails to successfully pass the quota recheck, it will
    raise a TooManyInstances exception, however, it will not hit the
    code which saves the instance mapping, leaving an instance with no
    assigned cell in the mapping table and no BuildRequest as it is
    removed by _cleanup_build_artifacts.

    This patch adds a test to make sure that an instance has the correct
    cell mapping if it fails in the quota recheck phase. In addition, it
    uses the cell_mapping_cache dictionary to set the correct cell
    mapping before marking the instance as ERROR.

    Co-Authored-By: Dan Smith <email address hidden>
    Co-Authored-By: Matt Riedemann <email address hidden>
    Co-Authored-By: melanie witt <email address hidden>
    Closes-Bug: #1715462

    Change-Id: I7ecb5feb47a5f358cd51bde87b75a3a6141b5b12
    (cherry picked from commit bf0f5d475458a67a12000ff49a8c5285c3ac9e45)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.1

This issue was fixed in the openstack/nova 16.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.
