Folsom - Quota exceeded error messages for instance quota exceeded are not proper

Bug #1046236 reported by Rajalakshmi Ganesan
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Eoghan Glynn
Milestone: 2012.2

Bug Description

Description:

The error message that appears when the instance quota allocated to a project is exceeded is incorrect: there is a mismatch between the number of instances shown in nova list and the number the error message reports. (This behaviour is intermittent and not always reproducible.)

Environment: Folsom single and multi node.

LOG:
---------

SINGLE NODE:

rajalakshmi_ganesan@ubuntu:~$ nova --debug list

REQ: curl -i http://15.184.83.251:8774/v2/637fefbe43a34998a1f583b1ce4b3bd7/servers/detail -X GET -H "X-Auth-Project-Id: nova_auto_project" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: d495b30babc14f3792d118f768ae7cf0"

connect: (15.184.83.251, 8774)
send: 'GET /v2/637fefbe43a34998a1f583b1ce4b3bd7/servers/detail HTTP/1.1\r\nHost: 15.184.83.251:8774\r\nx-auth-project-id: nova_auto_project\r\nx-auth-token: d495b30babc14f3792d118f768ae7cf0\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: python-novaclient\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: X-Compute-Request-Id: req-fb30fc19-b904-4431-9c74-841969c77cb2
header: Content-Type: application/json
header: Content-Length: 15
header: Date: Wed, 05 Sep 2012 09:38:03 GMT
RESP:{'status': '200', 'content-length': '15', 'content-location': 'http://15.184.83.251:8774/v2/637fefbe43a34998a1f583b1ce4b3bd7/servers/detail', 'x-compute-request-id': 'req-fb30fc19-b904-4431-9c74-841969c77cb2', 'date': 'Wed, 05 Sep 2012 09:38:03 GMT', 'content-type': 'application/json'} {"servers": []}

rajalakshmi_ganesan@ubuntu:~$ nova boot --image 49775afa-52f5-40e7-acf9-0ed8c64ca18b --flavor 1 test-1
ERROR: Quota exceeded for instances: Requested 1, but already used 12 of 10 instances (HTTP 413) (Request-ID: req-be253057-5e3f-4448-a78c-41116f8272ac)
rajalakshmi_ganesan@ubuntu:~$

There are no servers displayed in nova list, yet the error says "Requested 1, but already used 12 of 10 instances".
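The shape of that message follows from a simple headroom check against the recorded usage count, not against the live instance list. The sketch below is illustrative only (nova's real check in nova.quota is more involved and goes through reservations); the point is that a stale in_use value of 12 rejects a boot even when the project has no instances at all:

```python
# Minimal sketch (not nova code) of the quota headroom check behind the
# error message. "in_use" stands in for the quota_usages.in_use column;
# the check never consults the actual instances table, so stale accounting
# produces messages like "used 12 of 10" against an empty project.

def check_instance_quota(requested, in_use, limit):
    """Return None if the request fits, else the error message text."""
    if in_use + requested > limit:
        return ("Quota exceeded for instances: Requested %d, but already "
                "used %d of %d instances" % (requested, in_use, limit))
    return None

# Stale usage record of 12 against a limit of 10, as in the log above.
print(check_instance_quota(1, 12, 10))
```

With a correct in_use of 0 the same request would be accepted, which is why the fix targets the accounting rather than the check itself.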

MULTI NODE:
------------------

rajalakshmi_ganesan@ubuntu:~$ nova list
+--------------------------------------+---------------------------------------+--------+-------------------+
| ID | Name | Status | Networks |
+--------------------------------------+---------------------------------------+--------+-------------------+
| b81f4fd0-84b5-45ba-bf2d-14f26fa013a6 | auto_nonexist_server_metadata | ACTIVE | private=10.0.0.41 |
| c33913a0-841d-496a-85e4-40665ede44c8 | auto_nonexist_server_test_list_image | ACTIVE | private=10.0.0.39 |
| 737ef9b7-7b34-452f-a0ec-01c51e1529c6 | auto_server_image_create | ACTIVE | private=10.0.0.27 |
| 4b665253-6b4e-4785-b50a-5f6f7a98eb78 | auto_server_test_delete_image | ACTIVE | private=10.0.0.28 |
| 782f4c23-9fa7-462a-813f-5938509688e3 | auto_test_server_delete_metadata_item | ACTIVE | private=10.0.0.37 |
| 317d823c-5ab6-461d-be9e-d0ca9a151ef4 | auto_test_server_get_image_details | ACTIVE | private=10.0.0.29 |
| 57270ea4-bbf6-4fd7-88e5-4b659ca40b4a | auto_test_server_list_image | ACTIVE | private=10.0.0.30 |
| 6d08f8ba-7880-415d-8372-f03735b65179 | server_for_vol_attach | ACTIVE | private=10.0.0.22 |
| 379a0302-6a76-48c5-ba38-f4de8bb20557 | server_invalid_url | ACTIVE | private=10.0.0.33 |
| 4ef2c987-942e-4914-91bf-63970d6edf12 | server_invalid_url_xml_volume | ACTIVE | private=10.0.0.36 |
| 337780df-274d-482b-b7dd-b3d5db858fcd | server_metadata_invalid_url | ACTIVE | private=10.0.0.31 |
| 022a527b-3b81-4aef-9ad9-2d865de42249 | server_metadata_update_invalid_url | ACTIVE | private=10.0.0.32 |
| f6a422a6-0dbb-4b5e-b67d-375d755179be | server_metadata_update_xml | ACTIVE | private=10.0.0.35 |
| b538e207-fc49-4b6d-89e0-b0ea3582a767 | server_metadata_xml | ACTIVE | private=10.0.0.34 |
| 31e2b85e-e19e-43b8-9f00-e8a6ae1a912a | test_floatingip_ACTIVE | ACTIVE | private=10.0.0.24 |
| 7c6a1286-f203-4f67-9920-1d7c4e6e3b66 | test_floatingip_HARD_REBOOT | ACTIVE | private=10.0.0.26 |
| 294ebd3e-d3b6-4ff7-ad2a-ec43ff8d3de8 | test_floatingip_REBOOT | ACTIVE | private=10.0.0.25 |
| fc917e20-c532-45f3-a12f-78f8d1387c76 | test_get_console_output | ACTIVE | private=10.0.0.23 |
| dd40a5e4-ba11-4dae-bb1d-05f3b83ba2cc | test_list_server_metadata | ACTIVE | private=10.0.0.40 |
| 7e863497-d000-4d0c-bf0b-3374a628e81f | test_server_metadata | ACTIVE | private=10.0.0.38 |
| acecbb30-b322-4a76-99e5-14c1660bfe12 | test_set_server_metadata | ACTIVE | private=10.0.0.42 |
+--------------------------------------+---------------------------------------+--------+-------------------+
rajalakshmi_ganesan@ubuntu:~$ nova boot --image 170158af-9248-417a-8c26-865502cc05a6 --flavor 1 test-12
ERROR: Quota exceeded for instances: Requested 1, but already used 16 of 10 instances (HTTP 413) (Request-ID: req-f3517177-2812-4cda-82ef-6e7257a02b63)

There are 21 servers displayed in nova list, yet the error says "Requested 1, but already used 16 of 10 instances".

summary:
- Folsom - Quota exceeded error messages for instance quota exceededare not proper
+ Folsom - Quota exceeded error messages for instance quota exceeded are not proper
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Something does look incorrect here.

Changed in nova:
importance: Undecided → High
status: New → Triaged
milestone: none → folsom-rc1
Eoghan Glynn (eglynn)
Changed in nova:
assignee: nobody → Eoghan Glynn (eglynn)
Revision history for this message
Eoghan Glynn (eglynn) wrote :

Hi Rajalakshmi,

What exact version of Folsom are you using?

Do you have a current case where this issue is showing up?

If so, can you confirm the relevant data in the quota_usages table, e.g.:

  echo "select * from quota_usages where project_id = '637fefbe43a34998a1f583b1ce4b3bd7' and resource = 'instances'" | mysql -u root -p<password> nova

Also, can you confirm whether the instance quota threshold was changed previously? (e.g. increased to 20, then decreased back to 10)

Finally, other than booting instances, what other instance operations have been carried out before the issue manifests? (e.g. resizing, deleting etc.)

Thanks in advance for the information.
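The cross-check Eoghan asks for boils down to comparing quota_usages.in_use against the count of non-deleted rows in the instances table for the same project. The following self-contained sketch reproduces that comparison in an in-memory SQLite database (the column names match the Folsom schema; everything else, including the seeded data, is a stand-in):

```python
import sqlite3

# Illustrative only: a tiny slice of nova's schema, seeded with the stale
# accounting from the single-node log (in_use = 12, zero live instances),
# to show the drift the quota_usages query above would reveal.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quota_usages (project_id TEXT, resource TEXT, in_use INTEGER);
CREATE TABLE instances (uuid TEXT, project_id TEXT, deleted INTEGER);
""")
project = "637fefbe43a34998a1f583b1ce4b3bd7"
conn.execute("INSERT INTO quota_usages VALUES (?, 'instances', 12)",
             (project,))

# What the accounting claims vs. what actually exists.
recorded = conn.execute(
    "SELECT in_use FROM quota_usages "
    "WHERE project_id = ? AND resource = 'instances'",
    (project,)).fetchone()[0]
actual = conn.execute(
    "SELECT COUNT(*) FROM instances WHERE project_id = ? AND deleted = 0",
    (project,)).fetchone()[0]
print("recorded=%d actual=%d drift=%d" % (recorded, actual, recorded - actual))
```

Against the real nova MySQL database, the same two SELECTs (on quota_usages and instances) would show whether the in_use figure has drifted from the live instance count.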

Revision history for this message
Eoghan Glynn (eglynn) wrote :

OK, here we have at least two separate unexpected states in the quota accounting:

(a) the usage count exceeding the quota threshold (e.g. 16 > 10 instances)

(b) the usage count not matching the actual resource consumption (e.g. 16 < 21 instances)

Until the bug reporter indicates otherwise, the obvious cause for (a) is that the quota threshold was reduced via an administrative action (this does not take into account current usage and allows negative headroom to result).

For the multi-node case at least, I suspect that the mismatch in (b) is related to bug #1046188.

If an exception is raised during the nova-compute node's instance termination logic *before* the instance is evicted from the DB (e.g. in the network teardown), then the task_state is reverted back to None by the reverts_task_state decorator.

Hence the instance continues to be reported as ACTIVE, and a subsequent attempt to delete the same instance results in the usage count being decremented a second time.

This occurs despite https://github.com/openstack/nova/commit/aaccb0a9, as that fix depends on the VM task state continuing to be DELETING. That logic was undermined in https://github.com/openstack/nova/commit/d8d7100f, which caused the task state to be reverted on the successful or unsuccessful conclusion of the action on the compute node.

Now, the motivation for https://github.com/openstack/nova/commit/d8d7100f was to avoid the VM getting stuck with a task state that prevented any further action other than deletion. However in the case of a deletion already being requested, this protection is unnecessary and counter-productive.
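The interaction of the two commits can be sketched as follows. This is not nova code: the guard flag mirrors the task_state check from aaccb0a9, the revert-on-error branch mirrors what the reverts_task_state decorator did after d8d7100f, and the up-front decrement stands in for the quota reservation commit:

```python
# Minimal sketch (not nova code) of the double-decrement. A terminate that
# fails late (e.g. network teardown) followed by a task-state revert leaves
# the instance looking ACTIVE, so a retried delete decrements usage again.

class Instance:
    def __init__(self):
        self.task_state = None
        self.deleted = False

class QuotaUsage:
    def __init__(self, in_use):
        self.in_use = in_use

def terminate(instance, usage, teardown_fails=False, revert_on_error=True):
    # Guard from aaccb0a9: skip the decrement if a delete is already underway.
    already_deleting = instance.task_state == "DELETING"
    instance.task_state = "DELETING"
    try:
        if not already_deleting:
            usage.in_use -= 1          # usage decremented before eviction
        if teardown_fails:
            raise RuntimeError("network teardown failed")
        instance.deleted = True        # instance evicted from the DB
    except RuntimeError:
        if revert_on_error:            # what d8d7100f made reverts_task_state do
            instance.task_state = None # instance reported ACTIVE again

usage = QuotaUsage(in_use=2)
inst = Instance()
terminate(inst, usage, teardown_fails=True)  # first delete fails late
terminate(inst, usage)                       # user retries the delete
print(usage.in_use)  # 0 -- two decrements for one instance
```

With revert_on_error=False (the behaviour restored by the fix for terminate_instance), the retried delete sees task_state still DELETING, the guard holds, and the usage count is decremented exactly once.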

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12887

Changed in nova:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12887
Committed: http://github.com/openstack/nova/commit/5f4d20ef439fe04e5d7ab7e1d4610b739222ce6b
Submitter: Jenkins
Branch: master

commit 5f4d20ef439fe04e5d7ab7e1d4610b739222ce6b
Author: Eoghan Glynn <email address hidden>
Date: Wed Sep 12 17:20:13 2012 +0100

    Avoid VM task state revert on instance termination

    Related to bug 1046236.

    Previously, if an exception is raised during the nova-compute node's
    instance termination logic *before* the instance is evicted from the DB
    (e.g. in the network teardown), then the task_state is reverted back to
    None by the reverts_task_state decorator, so the instance continues
    to be reported as ACTIVE with no outstanding task.

    A subsequent attempt to delete the same instance results in the
    quota_usages in_use count being decremented a second time.

    This occurs despite I91a70ada as that fix depends on the VM task state
    continuing to be DELETING. That logic was undermined in Id4358c50 which
    caused the task state to be reverted on the successful or unsuccessful
    conclusion of the action on the compute node.

    Now, the motivation for Id4358c50 was to avoid the VM getting stuck with
    a task state that prevents any further action other than deletion.
    However in the case of a deletion already having been requested, this
    protection is unnecessary and counter-productive. Hence we remove the
    task state reversion from the terminate_instance action.

    Change-Id: Ie5701e5c12f6241a203423d29d05df1858406c56

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2
Revision history for this message
Sam Stoelinga (sammiestoel) wrote :

We still encountered this problem in Folsom; I don't know how to reproduce it yet, though. It could be because of our customizations to Folsom.

I've verified that the patch (https://review.openstack.org/12887) is applied to our code base, and we are still encountering this.
