quota_usage data constantly out of sync (needs test)

Bug #1202896 reported by Sam Morrison
This bug affects 13 people
Affects                     Status       Importance   Assigned to
Cinder                      Incomplete   High         Seif Lotfy
OpenStack Compute (nova)    Confirmed    High         Unassigned

Bug Description

With Folsom we constantly had the quota_usage table out of sync; we set max_age to 10 minutes to help purge stale usage.

Since upgrading to Grizzly this seems to have gotten worse: I've had to change max_age to 30 seconds because we get a lot of users complaining.
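
For reference, that workaround is just the standard quota max_age option in nova.conf; a minimal sketch with the 30-second value mentioned above (in Grizzly these quota options live under [DEFAULT]):

[DEFAULT]
# Re-sync a project's quota_usages from the real records whenever the cached
# usage is older than this many seconds (0, the default, disables the age check).
max_age = 30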

I'm not really sure how to replicate it as it seems pretty complicated and there are probably many edge cases, but I thought I'd better report it anyway.

Happy to help debug this somehow too.

Revision history for this message
Joe Gordon (jogo) wrote :

Which quotas got out of date? Can you provide any further detail?

Revision history for this message
Sam Morrison (sorrison) wrote :

Cores, instances and ram get out of sync; for cinder it's volumes, gigabytes and snapshots.

It can happen when you delete an instance but the delete fails: the instance goes to vm_state error, task_state deleting, and the quota_usage values are decremented.

The instance still remains in the system for the user, and they are able to delete it again. This causes the quota_usage values to be decremented a second time.
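
A self-contained toy model of that sequence (all names invented for illustration; this is not the nova code path, just the arithmetic of the failure mode):

# Toy model of the double-decrement described above.
usage = {'instances': 1, 'cores': 2, 'ram': 8192}

def delete_instance(instance, fail=False):
    # Usage is decremented as part of the delete path...
    usage['instances'] -= 1
    usage['cores'] -= instance['vcpus']
    usage['ram'] -= instance['memory_mb']
    if fail:
        # ...but the hypervisor delete fails, leaving the instance in
        # vm_state=error / task_state=deleting while usage was already reduced.
        instance['vm_state'] = 'error'
        return
    instance['deleted'] = True

vm = {'vcpus': 2, 'memory_mb': 8192, 'vm_state': 'active', 'deleted': False}
delete_instance(vm, fail=True)   # first attempt fails after the decrement
delete_instance(vm)              # user deletes the errored instance again
print(usage)                     # {'instances': -1, 'cores': -2, 'ram': -8192}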

This is pretty critical for us as it means projects can use more than their quota.

This is an easy one to replicate, but the most common out-of-sync errors we get are when quota_usage values are higher than they should be.

I'll try and get some more concrete examples.

Revision history for this message
John Griffith (john-griffith) wrote :

I'll work on reproducing this, but in the meantime I also noticed we're adjusting quotas in both the api and the manager, which is obviously going to cause some issues.

Changed in cinder:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → John Griffith (john-griffith)
milestone: none → 2013.1.3
Revision history for this message
Joe Gordon (jogo) wrote :

John, this may be similar to a nova issue we had a while back: https://bugs.launchpad.net/nova/+bug/1098380

Changed in cinder:
assignee: John Griffith (john-griffith) → nobody
milestone: 2013.1.3 → none
Revision history for this message
Sam Morrison (sorrison) wrote :

Here's an example of a user who is experiencing this issue:

mysql> select * from reservations where project_id ='cda9642942d24b7cab4bf1d56f61b5e7';
+---------------------+------------+---------------------+--------+--------------------------------------+----------+----------------------------------+-----------------+-------+---------------------+---------+
| created_at | updated_at | deleted_at | id | uuid | usage_id | project_id | resource | delta | expire | deleted |
+---------------------+------------+---------------------+--------+--------------------------------------+----------+----------------------------------+-----------------+-------+---------------------+---------+
| 2013-07-31 05:36:38 | NULL | 2013-07-31 05:36:39 | 523281 | b1cbdb12-88a6-4601-8b86-5228f31d1ef2 | 4578 | cda9642942d24b7cab4bf1d56f61b5e7 | security_groups | 1 | 2013-08-01 05:36:38 | 523281 |
| 2013-07-31 05:36:41 | NULL | 2013-07-31 05:36:42 | 523284 | 5119268c-1ca4-486b-b173-41f18c166880 | 4578 | cda9642942d24b7cab4bf1d56f61b5e7 | security_groups | 1 | 2013-08-01 05:36:41 | 523284 |
| 2013-07-31 05:36:43 | NULL | 2013-07-31 05:36:44 | 523287 | cfb61f29-f609-4165-b929-773334809869 | 4578 | cda9642942d24b7cab4bf1d56f61b5e7 | security_groups | 1 | 2013-08-01 05:36:43 | 523287 |
| 2013-07-31 05:41:12 | NULL | NULL | 523461 | ae97502d-ec53-46e2-aa9b-3ab17ad5bd23 | 4581 | cda9642942d24b7cab4bf1d56f61b5e7 | instances | 1 | 2013-08-01 05:41:12 | 0 |
| 2013-07-31 05:41:12 | NULL | NULL | 523464 | 34845c5c-2281-4177-bb8b-a78f71499d5d | 4584 | cda9642942d24b7cab4bf1d56f61b5e7 | ram | 8192 | 2013-08-01 05:41:12 | 0 |
| 2013-07-31 05:41:12 | NULL | NULL | 523467 | 86da791d-0adb-4c1f-ae60-5368f813552a | 4587 | cda9642942d24b7cab4bf1d56f61b5e7 | cores | 2 | 2013-08-01 05:41:12 | 0 |
| 2013-07-31 05:48:45 | NULL | 2013-07-31 05:48:45 | 523812 | f21a32c3-dfdc-4d1e-9ca2-0129200975e3 | 4581 | cda9642942d24b7cab4bf1d56f61b5e7 | instances | -1 | 2013-08-01 05:48:44 | 523812 |
| 2013-07-31 05:48:45 | NULL | 2013-07-31 05:48:45 | 523815 | 277d3c09-8b47-4add-87f3-56230c94be3c | 4584 | cda9642942d24b7cab4bf1d56f61b5e7 | ram | -8192 | 2013-08-01 05:48:44 | 523815 |
| 2013-07-31 05:48:45 | NULL | 2013-07-31 05:48:45 | 523818 | efb33b21-de55-4c42-abbb-7ab423a94d45 | 4587 | cda9642942d24b7cab4bf1d56f61b5e7 | cores | -2 | 2013-08-01 05:48:44 | 523818 |
+---------------------+------------+---------------------+--------+--------------------------------------+----------+----------------------------------+-----------------+-------+---------------------+---------+
9 rows in set (0.01 sec)

mysql> select * from quota_usages where project_id = 'cda9642942d24b7cab4bf1d56f61b5e7';
+---------------------+---------------------+------------+------+----------------------------------+-----------------+--------+----------+---------------+---------+
| created_at ...


Thierry Carrez (ttx)
Changed in cinder:
milestone: none → havana-3
melanie witt (melwitt)
tags: added: api compute
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
melanie witt (melwitt)
Changed in nova:
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Joshua Hesketh (joshua.hesketh) wrote :
Changed in nova:
status: Confirmed → In Progress
status: In Progress → Confirmed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-3
Changed in cinder:
milestone: havana-3 → havana-rc1
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-3 → havana-rc1
Changed in nova:
importance: Critical → High
Changed in cinder:
importance: Critical → High
assignee: John Griffith (john-griffith) → nobody
tags: added: havana-rc-proposed
Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
removed: havana-rc-proposed
Changed in cinder:
milestone: havana-rc1 → next
Seif Lotfy (seif)
Changed in cinder:
assignee: nobody → Seif Lotfy (seif)
Revision history for this message
Seif Lotfy (seif) wrote :

My current solution is to get rid of the quota_usage table and make a view out of it that reflects the state of the volumes and snapshots tables.
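
For illustration, deriving volume usage directly from the volumes table (instead of trusting quota_usages) would look roughly like this; a sketch only, the helper itself is hypothetical and the column names are as they appear in the cinder schema:

# Sketch: compute a project's actual volume usage from the volumes table.
from sqlalchemy import func
from cinder.db.sqlalchemy.models import Volume

def actual_volume_usage(session, project_id):
    count, gigabytes = (
        session.query(func.count(Volume.id),
                      func.coalesce(func.sum(Volume.size), 0))
               .filter(Volume.project_id == project_id,
                       Volume.deleted == False)
               .one())
    return {'volumes': count, 'gigabytes': int(gigabytes)}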

Thierry Carrez (ttx)
tags: added: havana-backport-potential
removed: havana-rc-potential
Revision history for this message
Joe Gordon (jogo) wrote :

Sam, are you still seeing this issue? If so, how can I reproduce it in nova?

Changed in nova:
status: Confirmed → Incomplete
Revision history for this message
Joe Gordon (jogo) wrote :

Marking as incomplete because I'm not sure how to reproduce this.

Revision history for this message
Sam Morrison (sorrison) wrote :

Yeah, it's a tricky one: we have ~3000 users and approximately 30k instance boots per month, and we only see this a couple of times a month at most.

I'm sure there is a bug in there somewhere, but the rate at which we hit it and my inability to reproduce it make this pretty hard to fix. We're about to upgrade to Havana, so we might see it less and less as the code matures.

Revision history for this message
gabriel staicu (gabriel-staicu) wrote :

This always happens to me. I have an HA setup for the controller part of OpenStack based on MySQL Galera, and every time I terminate several instances at once (more than 7), the quota_usages table in the nova database still shows some resources in use.

I have a Havana setup on Ubuntu 12.04.

Revision history for this message
Sam Morrison (sorrison) wrote :

Yes, this is still a problem for us. We have written a little script to sync the quota_usages table, which we run every 6 hours or so.
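
A minimal sketch of what such a sync script can look like (not the actual script; table and column names assume the standard nova schema shown earlier in this bug):

# Hypothetical quota_usages sync: recompute per-project usage from the
# instances table and overwrite the cached values.
import MySQLdb

def sync_quota_usages(conn):
    cur = conn.cursor()
    cur.execute("""
        SELECT project_id, COUNT(*), SUM(vcpus), SUM(memory_mb)
        FROM instances
        WHERE deleted = 0
        GROUP BY project_id
    """)
    for project_id, instances, cores, ram in cur.fetchall():
        for resource, in_use in (('instances', instances),
                                 ('cores', cores),
                                 ('ram', ram)):
            cur.execute(
                "UPDATE quota_usages SET in_use = %s, updated_at = NOW() "
                "WHERE project_id = %s AND resource = %s AND deleted = 0",
                (in_use, project_id, resource))
    # Note: projects with no remaining instances don't appear in the SELECT,
    # so their rows would need a separate pass to zero out.
    conn.commit()

# e.g. sync_quota_usages(MySQLdb.connect(host='dbhost', user='nova',
#                                        passwd='...', db='nova'))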

Revision history for this message
Sam Morrison (sorrison) wrote :

Forgot to mention that upgrading to Havana made it worse.

Revision history for this message
Chris Behrens (cbehrens) wrote :

We need to audit the quota code and make sure we're only updating quotas when the DB records for the instance update successfully. Soft delete is a little interesting because it needs to update quotas before 'deleting' the DB record, but we then need to make sure a real delete later doesn't update them again.

Anyway, I filed this bug yesterday which is some of the problem: https://bugs.launchpad.net/bugs/1296414

Revision history for this message
Chris Behrens (cbehrens) wrote :
Revision history for this message
Jacob Cherkas (jcherkas) wrote :

This can be reproduced consistently by launching about 100 instances via the dashboard; after they have finished launching, select all the instances and terminate them.

The quota usage in the overview will always show some instances, RAM and vCPUs still in use.

Revision history for this message
Alexei Kornienko (alexei-kornienko) wrote :

I think all the mess we get with quotas is because the cinder quota implementation is not very consistent.
What I see is that we reserve quotas in one method and commit/rollback them in another method. A quota reservation can be passed over RPC, which means it can be reserved in one process and committed in another. IMHO such an approach is very fragile and error-prone.
I propose to implement quota reservation as a context manager:

with quotas.reserve(...) as reservation:
    ...

This will allow us to make sure that quotas always stay consistent and will remove the need to expire quotas. I can prepare a POC patch if you are interested. What do you think?
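
For reference, a minimal sketch of that idea on top of the existing QUOTAS.reserve/commit/rollback calls (illustrative only; the wrapper itself does not exist in cinder):

from contextlib import contextmanager

from cinder import quota

QUOTAS = quota.QUOTAS

@contextmanager
def reserve(context, **deltas):
    # Reserve up front, then commit on success or roll back on any error.
    reservations = QUOTAS.reserve(context, **deltas)
    try:
        yield reservations
    except Exception:
        QUOTAS.rollback(context, reservations)
        raise
    else:
        QUOTAS.commit(context, reservations)

# Usage, as proposed above:
#     with reserve(context, volumes=1, gigabytes=size) as reservation:
#         ...create the resource...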

Revision history for this message
John Griffith (john-griffith) wrote :

@Alexei,
I'd be interested in a POC. I'm curious, though, about the Nova side.

Anyway, I am also curious what we look like in Icehouse and Juno in this respect. Marking Incomplete until we get a better way to reproduce.

Changed in cinder:
status: Triaged → Incomplete
Revision history for this message
Sean Dague (sdague) wrote :

I'm marking this as confirmed because I think it's a real issue, there is actually a reproducer in here, and it realistically could be addressed with functional testing.

summary: - quota_usage data constantly out of sync
+ quota_usage data constantly out of sync (needs test)
tags: added: needs-functional-test
removed: havana-backport-potential
Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
Duncan Thomas (duncan-thomas) wrote :

> I propose to implement quota reservation as a context manager:
>
> with quotas.reserve(...) as reservation:
> ...
>
> This will allow us to make sure that quotas always stay consitent and
> will remove the need to expire quotas. I can prepare a POC patch if
> you are interested. What do you think?

I don't think we can do this in cinder; the fact that we reserve in the API and commit in the manager appears to be entirely necessary. We reserve in the API so that we can give sensible out-of-quota warnings back to the caller, but we can't commit until the manager, since we don't know whether the resource is actually consumed until then.
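
Roughly, the split being described looks like this (simplified sketch with placeholder helpers, not the actual cinder code; the real paths live in cinder.volume.api and cinder.volume.manager):

from cinder import quota

QUOTAS = quota.QUOTAS

# cinder-api process: reserve here, so over-quota errors reach the caller.
def create_volume_api(context, size):
    reservations = QUOTAS.reserve(context, volumes=1, gigabytes=size)
    rpc_cast_to_volume_manager(context, size, reservations)  # placeholder RPC call

# cinder-volume (manager) process: commit only once the resource really exists.
def create_volume_manager(context, size, reservations):
    try:
        provision_volume(size)  # placeholder for the actual create
    except Exception:
        QUOTAS.rollback(context, reservations)
        raise
    QUOTAS.commit(context, reservations)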

It might be useful to add more information to the reservation record - the request id that caused the reservation, for example - so we have some ability to match troublesome reservations to the rest of the logs.
