nova quota statistics can be incorrect

Bug #1284424 reported by Robert Collins
This bug affects 24 people
Affects                    Status        Importance  Assigned to         Milestone
OpenStack Compute (nova)   Fix Released  High        Dmitry Stepanenko
tripleo                    Fix Released  High        Unassigned

Bug Description

On the ci-overcloud we had a couple of network interruptions to the control plane. Subsequent to this, nova is reporting:
 nova boot --image user --flavor m1.small --key-name default live-migration-test2 --nic net-id=f69ac547-db64-4e69-ae70-e5233634aff0
ERROR: Quota exceeded for instances: Requested 1, but already used 100 of 100 instances (HTTP 413) (Request-ID: req-0c96b3bc-dc37-4685-8227-02398b3bea6b)
(ci-overcloud-nodepool)robertc@lifelesshp:~/work$ nova list | wc -l
42

That is - nova thinks there are 100 instances, but there are only 42. We haven't done any DB surgery or anything that could cause this, so methinks we've uncovered a bug.
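
A quick way to check whether the recorded usage and the real instance count have diverged is to compare the quota_usages table against the instances table directly. A rough sketch (table and column names follow the nova DB schema of this era; the project id is a placeholder, not the actual ci-overcloud tenant):

  SELECT resource, in_use, reserved
    FROM quota_usages
   WHERE project_id = '<tenant-id>';

  SELECT COUNT(*)
    FROM instances
   WHERE project_id = '<tenant-id>' AND deleted = 0;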

Tags: compute quotas
Tracy Jones (tjones-i)
tags: added: compute network
tags: removed: network
Changed in nova:
importance: Undecided → High
status: New → Triaged
Changed in nova:
assignee: nobody → Shawn Hartsock (hartsock)
Changed in nova:
status: Triaged → In Progress
Revision history for this message
Michael Still (mikal) wrote :

I think we need more detail in this bug report in order to determine what went wrong. It would be interesting to see a dump of the quota tables when this happens.

Changed in nova:
status: In Progress → Incomplete
Revision history for this message
Michael Still (mikal) wrote :

(Also, Shawn, you should let gerrit set this to in progress when you upload a review to address the problem).

Revision history for this message
Robert Collins (lifeless) wrote :

So this has happened again, just in regular use as a CI cloud. Mikal - what information would you like - just a select * from quota_* ?

Changed in nova:
status: Incomplete → New
Revision history for this message
Robert Collins (lifeless) wrote :

Once we gather whatever data is needed:

11:14 < lifeless> SpamapS: hey, you had a query to fix quotas in the ci-overcloud right ?
11:14 < SpamapS> lifeless: yes... it was...
11:14 < lifeless> SpamapS: perhaps put a copy of it in https://bugs.launchpad.net/tripleo/+bug/1284424
11:14 < uvirtbot> Launchpad bug 1284424 in tripleo "nova quota statistics can be incorrect" [High,Triaged]
11:15 < SpamapS> update quota_usages set in_use=-1 where project_id='64d2d3bc07084ef1accd4e3502909c77';
11:15 < SpamapS> lifeless: that id == nodepool
11:15 < lifeless> that forces a native recalculate?
11:16 < SpamapS> lifeless: it did for me
11:16 < SpamapS> which begs the question..
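
For context, the reason setting in_use to -1 works is that nova's DB quota driver recounts a project's usage at the next reservation when the stored value looks invalid or stale. A heavily simplified sketch of that decision (my own stand-in function, not the actual code; the real logic lives in quota_reserve() in nova/db/sqlalchemy/api.py):

  import datetime

  def needs_refresh(usage, max_age, until_refresh):
      """Decide whether a cached quota_usages row should be recounted."""
      if usage is None:                 # no usage row for this resource yet
          return True
      if usage.in_use < 0:              # the "set in_use = -1" trick lands here
          return True
      if until_refresh and usage.until_refresh <= 0:
          return True                   # periodic recount every N reservations
      age = (datetime.datetime.utcnow() - usage.updated_at).total_seconds()
      if max_age and age >= max_age:
          return True                   # recount rows that have gone stale
      return False

As far as I can tell, the recount only happens when the next reservation is made (e.g. the next nova boot for that tenant), not immediately after the UPDATE.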

Revision history for this message
melanie witt (melwitt) wrote :

Setting to confirmed since the reporter saw the issue again.

Changed in nova:
status: New → Confirmed
Matt Riedemann (mriedem)
tags: added: quotas
Jay Pipes (jaypipes)
Changed in nova:
assignee: Shawn Hartsock (hartsock) → nobody
Revision history for this message
danieru (samuraidanieru) wrote :

I can add that I'm seeing this problem with a vanilla install, per the Icehouse install instructions for Ubuntu Trusty 14.04. It seems like the delete process isn't cleaning everything up properly. I tried deleting several instances at once from the dashboard. They got stuck in an error state during deletion and I had to manually use 'nova reset-state' and then 'nova delete' to make them disappear. The quota is still counting them.
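
For the record, the manual cleanup described above looks roughly like this (the UUID is a placeholder):

  nova reset-state <instance-uuid>   # puts the instance back into an error state
  nova delete <instance-uuid>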

Revision history for this message
Rajani Srivastava (srivastava-rajani) wrote :

Hi Robert,

Are you referring to the "nova usage-list" command?

When we delete an instance, that command still shows usage for the deleted instance in the CLI, while it is no longer displayed in the dashboard.

Revision history for this message
haruka tanizawa (h-tanizawa) wrote :

Hi Robert,
Sorry to ask, but I'd like to understand this bug in more detail.
Is this related to cleaning up quota when deleting an instance, or to nova usage-list?
My concern is the 'network interruptions' mentioned in your report.
The number 42 is very far from 100. Does this still happen?
If you have a specific reproduction scenario, a more detailed write-up would be very much appreciated.

Revision history for this message
haruka tanizawa (h-tanizawa) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1353962 is possibly related; it was seen in the gate.

Revision history for this message
Kevin Fox (kevpn) wrote :

I've just run into this too with a fresh RDO Juno on SL7. I had some config options wrong on a compute node and launched some test instances onto it. They launched with errors and deleted fine, but the quotas are wrong.

I can't recall exactly, but the reproducer in this thread:
http://www.gossamer-threads.com/lists/openstack/dev/38332

Sounds very similar to what happened. I think I may have had a bad neutron_admin_password in /etc/nova/nova.conf, but I may be misremembering exactly which password option/config file was involved.

Revision history for this message
Kevin Fox (kevpn) wrote :

I also tried SpamapS's recommendation for fixing the quotas by setting the tenant's quota usage values to -1. They have not been updated yet; they are still all -1. I also tried restarting nova-compute on all the hosts to see if that would update things. It didn't. Is there some amount of time it takes to sync, or does something else need to happen before things get updated?

Revision history for this message
Joe Gordon (jogo) wrote :

A few possible ways to reproduce this:

https://github.com/pcrews/rannsaka

add a mock on top of RPC calls/casts to force them to fail some percentage of the time.
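
A rough, hypothetical sketch of that fault-injection idea (the wrapper and the patched method are only illustrative, not something from this bug):

  import functools
  import random

  def flaky(real_call, failure_rate=0.1):
      """Wrap an RPC call/cast so it fails some percentage of the time."""
      @functools.wraps(real_call)
      def wrapper(*args, **kwargs):
          if random.random() < failure_rate:
              raise RuntimeError('injected RPC failure')
          return real_call(*args, **kwargs)
      return wrapper

  # e.g. monkey-patch a compute RPC method in a test run:
  # rpcapi.terminate_instance = flaky(rpcapi.terminate_instance)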

Revision history for this message
Patrick Crews (patrick-crews) wrote :

This command line should be a good starting point against a devstack install:
rannsaka.py --host=http://$KEYSTONE_IP --requests 20000 -w 15 --test-file=locust_files/server_quota.py

Also, I seem to recall quota statistics going awry after a libvirt restart, but that will need further testing to verify; I will update tomorrow after more tests.

Revision history for this message
Joe Gordon (jogo) wrote :

I was able to reproduce the failure with:

http://paste.ubuntu.com/10587937/ + restarting nova api services while the test ran

Results:

 http://paste.openstack.org/show/191961/

Revision history for this message
Joe Gordon (jogo) wrote :

Using "nova quota-delete --tenant" to correct the quotas

Revision history for this message
Joe Gordon (jogo) wrote :

Looks like restarting nova-compute during nova boots and nova deletes causes the quota sync issue.

Revision history for this message
Joe Gordon (jogo) wrote :

Note: setting max_age or until_refresh in nova.conf should help minimize quota out-of-sync issues.
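
For example, something along these lines in nova.conf (option names as they exist in this era's nova, under [DEFAULT]; the values are only illustrative):

  [DEFAULT]
  until_refresh = 25    # recount usage after every 25 reservations
  max_age = 3600        # recount usage records older than an hour (seconds)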

Revision history for this message
Joe Gordon (jogo) wrote :

After further testing, the quotas don't get out of sync from restarting n-cpu during instance boots; this happens when restarting n-cpu during instance deletes.

Revision history for this message
Joe Gordon (jogo) wrote :
Changed in nova:
assignee: nobody → BalaGopalaKrishna (bala-9)
Changed in nova:
assignee: BalaGopalaKrishna (bala-9) → nobody
Revision history for this message
Chris Friesen (cbf123) wrote :

Should this be marked as a dupe of https://bugs.launchpad.net/nova/+bug/1296414 ?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/293800

Changed in nova:
assignee: nobody → Dmitry Stepanenko (dstepanenko)
Revision history for this message
Dmitry Stepanenko (dstepanenko) wrote :

Reproduced the issue on devstack with reclaim_instance_interval set in nova.conf. There have been several changes that affect this issue, and the results now look a bit different, but the quota statistics can still be incorrect.

http://paste.openstack.org/show/491422/

The easiest way to reproduce the issue is to run Joe's script and terminate the nova-compute service right after the message 'Request to delete server ... has been accepted' appears. This leads to a state where the quota usage is higher than the number of instances.
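
A manual version of the same idea, roughly, on a devstack node (image/flavor names and the kill timing are the fragile parts; the kill has to land in the window described above):

  nova boot --image <image> --flavor m1.tiny quota-repro
  nova delete quota-repro            # wait for "has been accepted"
  sudo pkill -9 -f nova-compute      # kill n-cpu before the quota commit runs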

Revision history for this message
Dmitry Stepanenko (dstepanenko) wrote :

I found that my repro hit a scenario where the instance was already deleted but the quota commit hadn't happened yet (the nova-* process died right after removing the instance). This happens in the _delete_instance method of nova/compute/manager.py, before _complete_deletion runs: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2465
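
To make the window concrete, here is a heavily simplified, self-contained sketch of that ordering (stand-in stubs, not the real nova code; the real path is _delete_instance -> _complete_deletion in nova/compute/manager.py):

  class Quotas(object):
      def __init__(self):
          self.in_use, self.reserved = 1, -1    # one instance, one pending delete reservation

      def commit(self):
          self.in_use += self.reserved          # usage finally drops to 0
          self.reserved = 0

  def delete_instance(instance, quotas, crash_before_commit=False):
      instance['deleted'] = True                # instance row is marked deleted
      if crash_before_commit:
          raise SystemExit('nova-compute died') # the quota commit never runs
      quotas.commit()                           # normally done via _complete_deletion

  quotas = Quotas()
  try:
      delete_instance({'deleted': False}, quotas, crash_before_commit=True)
  except SystemExit:
      pass
  print(quotas.in_use)   # still 1: the instance is gone but usage was never decremented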

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Steven Hardy (shardy) wrote : potentially eol bug

This bug was reported against an old version of TripleO, and may no longer be valid.

Since it was reported before the start of the liberty cycle (and our oldest stable
branch is stable/liberty), I'm marking this incomplete.

Please reopen this (change the status from incomplete) if the bug is still valid
on a current supported (stable/liberty, stable/mitaka or trunk) version of TripleO,
thanks!

Changed in tripleo:
status: Triaged → Incomplete
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/293800

doscho (doscho)
Changed in nova:
status: In Progress → New
status: New → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/293800
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.

Ben Nemec (bnemec)
Changed in tripleo:
status: Incomplete → Fix Released