InstanceNotFound prevents putting over-quota instance into ERROR state

Bug #1717000 reported by Matt Riedemann
This bug affects 2 people
Affects                    Status          Importance   Assigned to      Milestone
OpenStack Compute (nova)   Fix Released    High         Matt Riedemann
Pike                       Fix Committed   High         Matt Riedemann

Bug Description

I found this when trying to recreate bug 1716706.

https://bugs.launchpad.net/nova/+bug/1716706/comments/4

I can get conductor to fail the quota recheck and attempt to set the instance to ERROR state, but the update then fails with InstanceNotFound because the context is not targeted at the cell the instance lives in:

Sep 13 17:58:26 devstack-queens nova-conductor[3129]: WARNING nova.scheduler.utils [None req-90a115b2-5838-4be2-afe2-a3b755015e19 demo demo] [instance: 888925b0-164a-4d4a-bb6c-c0426f904e95] Setting instance to ERROR state.: TooManyInstances: Quota exceeded for instances: Requested 1, but already used 10 of 10 instances
Sep 13 17:58:26 devstack-queens nova-conductor[3129]: ERROR root [None req-90a115b2-5838-4be2-afe2-a3b755015e19 demo demo] Original exception being dropped: ['Traceback (most recent call last):\n', ' File "/opt/stack/nova/nova/conductor/manager.py", line 1003, in schedule_and_build_instances\n orig_num_req=len(build_requests))\n', ' File "/opt/stack/nova/nova/compute/utils.py", line 764, in check_num_instances_quota\n allowed=total_alloweds)\n', 'TooManyInstances: Quota exceeded for instances: Requested 1, but already used 10 of 10 instances\n']: InstanceNotFound: Instance 888925b0-164a-4d4a-bb6c-c0426f904e95 could not be found.
Sep 13 17:58:26 devstack-queens nova-conductor[3129]: ERROR oslo_messaging.rpc.server [None req-90a115b2-5838-4be2-afe2-a3b755015e19 demo demo] Exception during message handling: InstanceNotFound: Instance 888925b0-164a-4d4a-bb6c-c0426f904e95 could not be found.
Sep 13 17:58:26 devstack-queens nova-conductor[3129]: ERROR oslo_messaging.rpc.server InstanceNotFound: Instance 888925b0-164a-4d4a-bb6c-c0426f904e95 could not be found.

This happens because we don't target the cell when updating the instance:

https://github.com/openstack/nova/blob/cfdec41eeec5fab220702efefdaafc45559aeb14/nova/conductor/manager.py#L1168
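
To illustrate the failure mode, here is a minimal toy sketch, not nova code. Every name in it (RequestContext, target_cell, set_instance_error, API_DB, CELL1_DB, the shortened instance ID) is hypothetical and only models the idea: a lookup through an untargeted context searches the wrong database and raises InstanceNotFound, while the same lookup through a context pointed at the instance's cell succeeds.

```python
import contextlib
import copy


class InstanceNotFound(Exception):
    """Raised when the instance is not in the database the context points at."""


class RequestContext:
    """Toy request context; `db` is the database it currently points at."""

    def __init__(self, db):
        self.db = db


# Hypothetical databases: the instance record only exists in the cell database.
API_DB = {}
CELL1_DB = {"888925b0": {"vm_state": "building"}}


@contextlib.contextmanager
def target_cell(context, cell_db):
    """Yield a copy of `context` pointed at `cell_db`; the original is untouched."""
    cctxt = copy.copy(context)
    cctxt.db = cell_db
    yield cctxt


def set_instance_error(context, uuid):
    """Look the instance up via the context's database and mark it as errored."""
    try:
        instance = context.db[uuid]
    except KeyError:
        raise InstanceNotFound(uuid)
    instance["vm_state"] = "error"


ctxt = RequestContext(API_DB)

# Untargeted update: the context still points at the API database.
try:
    set_instance_error(ctxt, "888925b0")
    untargeted_failed = False
except InstanceNotFound:
    untargeted_failed = True

# Targeted update: the same call succeeds once the context points at the cell.
with target_cell(ctxt, CELL1_DB) as cctxt:
    set_instance_error(cctxt, "888925b0")
```

In this model the untargeted call leaves the record in its building state, which corresponds to the instance never reaching ERROR; only the targeted call updates it.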

Tags: quotas
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/503839

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/504178

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/503839
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7e02f02d1501925ddeb15266c05d4d95f852e21a
Submitter: Jenkins
Branch: master

commit 7e02f02d1501925ddeb15266c05d4d95f852e21a
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 13 17:30:59 2017 -0400

    Target context when setting instance to ERROR when over quota

    When conductor does the quota recheck, the instances are created
    in a cell but when we update the instance and set it to ERROR state,
    we were not targeting the context to the cell that the instance lives
    in, which leads to an InstanceNotFound error and then the instance
    is stuck in BUILD/scheduling state.

    This targets the context to the cell when updating the instance.

    Change-Id: I45faffaba4d329433a33cfb5e64c89ce4885df46
    Closes-Bug: #1717000

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/504178
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9cddde1f9775847d4b6671595dfc5c4b1bc8e718
Submitter: Jenkins
Branch: stable/pike

commit 9cddde1f9775847d4b6671595dfc5c4b1bc8e718
Author: Matt Riedemann <email address hidden>
Date: Wed Sep 13 17:30:59 2017 -0400

    Target context when setting instance to ERROR when over quota

    When conductor does the quota recheck, the instances are created
    in a cell but when we update the instance and set it to ERROR state,
    we were not targeting the context to the cell that the instance lives
    in, which leads to an InstanceNotFound error and then the instance
    is stuck in BUILD/scheduling state.

    This targets the context to the cell when updating the instance.

    Change-Id: I45faffaba4d329433a33cfb5e64c89ce4885df46
    Closes-Bug: #1717000
    (cherry picked from commit 7e02f02d1501925ddeb15266c05d4d95f852e21a)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.1

This issue was fixed in the openstack/nova 16.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 17.0.0.0b1

This issue was fixed in the openstack/nova 17.0.0.0b1 development milestone.
