Test job fails with FixedIpLimitExceeded with nova network

Bug #1353962 reported by Attila Fazekas
This bug affects 4 people
Affects                    Status    Importance  Assigned to  Milestone
OpenStack Compute (nova)   Expired   Undecided   Unassigned
tempest                    Invalid   Undecided   Unassigned
description: updated
Tracy Jones (tjones-i)
tags: added: testing
Revision history for this message
Joe Gordon (jogo) wrote :

Is it possible tempest is leaking servers?

Revision history for this message
Joe Gordon (jogo) wrote :

https://review.openstack.org/#/c/112581/ should help sort this one out

Changed in nova:
status: New → Incomplete
Revision history for this message
Joe Gordon (jogo) wrote :

Marking as incomplete in nova, since we're unable to confirm this is a nova bug.

Revision history for this message
Matt Riedemann (mriedem) wrote :

This looks like a duplicate of bug 1338844.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Never mind, 1338844 was fixed in tempest (a quota issue), so this is different. There are 26 hits in 10 days.

Revision history for this message
Matthew Treinish (treinish) wrote :

Even if tempest leaked every server it created (which is definitely not happening), a network that allocates a /24 should still have plenty of IPs available. From the logs above, tempest only allocated a total of 171 servers:

http://logs.openstack.org/23/112523/1/check/check-tempest-dsvm-postgres-full/acac6d9/logs/qemu/
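
(For scale: a /24 provides 256 addresses; even after subtracting the network, broadcast, and a handful of reserved gateway/DHCP addresses, roughly 250 usable IPs remain, so even 171 leaked servers would fit comfortably.)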

But the exception is a quota one, and the default tenant quota for fixed IPs is definitely more than 2 (otherwise we'd have much larger issues). I'm pretty sure we set it to -1, but I'm not sure. Looking at the test class in question:

http://git.openstack.org/cgit/openstack/tempest/tree/tempest/api/compute/admin/test_servers_negative.py

The tenant for that class will only make 2 server create calls: one in setUpClass and one in this test. I have a hard time believing 2 fixed IPs exceeded the quota limit. Looking at the tests running around the same time, starting at:

http://logs.openstack.org/23/112523/1/check/check-tempest-dsvm-postgres-full/acac6d9/console.html.gz#_2014-08-07_09_42_03_634

which was the last test from a previous class on the same thread, up until the same point after the failed test:

http://logs.openstack.org/23/112523/1/check/check-tempest-dsvm-postgres-full/acac6d9/console.html.gz#_2014-08-07_09_42_27_778

I don't see anything that should have accidentally changed the quota to be < 2. I guess there could be a hidden cross-interaction, because there are a bunch of admin API tests running around the same time (the joys of a default alphabetical test order), but I don't see one yet.

Revision history for this message
David Kranz (david-kranz) wrote :

This happened 26 times between 9/10 and 9/13, and zero times between 9/13 and 9/16. Perhaps it is fixed but we should keep watching.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Just hit it again:

http://logs.openstack.org/98/125398/11/check/check-tempest-dsvm-postgres-full/f0d9495/logs/screen-n-net.txt.gz?level=TRACE

There are 17 hits in the last 7 days, check and gate, all failures.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Taking a look at the code in this stacktrace:

2014-10-06 08:00:23.848 DEBUG nova.network.manager [req-bf60b9d9-452c-4fbc-9f7a-cc90ed762536 ServersAdminTestJSON-1220859943 ServersAdminTestJSON-233825707] [instance: 50156a3a-8b14-40ea-affe-97e5a510ec32] Allocate fixed ip on network cbcee1f9-13d7-4ec6-a450-d777a3d11120 allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:859
2014-10-06 08:00:23.874 DEBUG nova.network.manager [req-bf60b9d9-452c-4fbc-9f7a-cc90ed762536 ServersAdminTestJSON-1220859943 ServersAdminTestJSON-233825707] Quota exceeded for 3292f7b35abf4565a53d099ad878a335, tried to allocate fixed IP allocate_fixed_ip /opt/stack/new/nova/nova/network/manager.py:875
2014-10-06 08:00:23.874 ERROR oslo.messaging.rpc.dispatcher [req-bf60b9d9-452c-4fbc-9f7a-cc90ed762536 ServersAdminTestJSON-1220859943 ServersAdminTestJSON-233825707] Exception during message handling: Maximum number of fixed ips exceeded
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/floating_ips.py", line 114, in allocate_for_instance
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher **kwargs)
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/manager.py", line 511, in allocate_for_instance
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher requested_networks=requested_networks)
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/manager.py", line 192, in _allocate_fixed_ips
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher vpn=vpn, address=address)
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/new/nova/nova/network/manager.py", line 876, in allocate_fixed_ip
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher raise exception.FixedIpLimitExceeded()
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher FixedIpLimitExceeded: Maximum number of fixed ips exceeded
2014-10-06 08:00:23.874 25609 TRACE oslo.messaging.rpc.dispatcher

We're using the FlatDHCPManager:

http://logs.openstack.org/98/125398/11/check/check-tempest-dsvm-postgr...
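
For reference, the flow the trace lands in can be sketched roughly as follows. This is a simplified, self-contained paraphrase of what the trace and log lines above show (a quota reservation is attempted and an over-quota answer is translated into FixedIpLimitExceeded); the helper names and arguments are illustrative, not the verbatim nova source:

    class OverQuota(Exception):
        """Stand-in for the over-quota signal from the quota layer."""


    class FixedIpLimitExceeded(Exception):
        """Maximum number of fixed ips exceeded."""


    def reserve_fixed_ip(quota_limit, in_use, delta=1):
        # A limit of -1 means unlimited; anything else is enforced.
        if 0 <= quota_limit < in_use + delta:
            raise OverQuota()


    def allocate_fixed_ip(quota_limit, in_use, project_id):
        try:
            reserve_fixed_ip(quota_limit, in_use)
        except OverQuota:
            # Mirrors the "Quota exceeded for <project>, tried to allocate
            # fixed IP" warning and the raise at manager.py:876 in the trace.
            print("Quota exceeded for %s, tried to allocate fixed IP"
                  % project_id)
            raise FixedIpLimitExceeded()
        # ... a free address would be picked from the network and
        # associated with the instance here ...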


Matt Riedemann (mriedem)
tags: added: gate-failure
Revision history for this message
Matt Riedemann (mriedem) wrote :

Looking at tests that extend BaseV2ComputeAdminTest: tempest.api.compute.admin.test_servers creates 2 test instances for the JSON run alone (and there is an XML version also). Those would be on the same admin tenant, so if those run concurrently we could have at least 4 instances running at the same time under that tenant.

We have 4 test workers, so we could have 2 other tests running concurrently.

There are 36 test classes that extend BaseV2ComputeAdminTest, not all of them create test servers though.

Looking at the console log when our test fails:

http://logs.openstack.org/98/125398/11/check/check-tempest-dsvm-postgres-full/f0d9495/console.html#_2014-10-06_08_00_28_464

And looking at what's running before it, we see:

tempest.api.compute.admin.test_quotas_negative.QuotasAdminNegativeTestXML

and

tempest.api.compute.admin.test_quotas.QuotaClassesAdminTestJSON

We've had race issues with QuotaClassesAdminTestJSON in the past where it would set the default quota class values to 0, so we'd hit over-quota. That was changed a while back to take the default quotas and add 100 to the limit, so that shouldn't be causing issues.

QuotasAdminNegativeTestXML is running with force_tenant_isolation=False

So the quota admin tests should be running in tenant isolation and not impact ServersAdminTestJSON.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Chatted with Matt Treinish, and the JSON and XML tests should have separate tenants because in the gate we're running with allow_tenant_isolation=True and the credentials are created in setUpClass of each test class, which is different for the JSON and XML runs.

So I guess I'm back to wondering what's going on with the fixed_ips over-quota if it's a per-project resource quota and the test is only creating at most 4 test servers.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/126401

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/126402

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/126401
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=044b18e7f02c588738bc21c43ad8066c9fcd7426
Submitter: Jenkins
Branch: master

commit 044b18e7f02c588738bc21c43ad8066c9fcd7426
Author: Matt Riedemann <email address hidden>
Date: Mon Oct 6 08:04:01 2014 -0700

    nova-net: add more useful logging before raising FixedIpLimitExceeded

    The OverQuota exception that we get back from the DB API contains some
    useful information about the quota limits for the project and resource
    and the current usage, so we can log that before raising
    FixedIpLimitExceeded. Also logs the actual project ID passed to
    quotas.reserve rather than context.project_id since it's possible those
    are different.

    This should help us figure out if we're leaking resources in gate runs.

    Related-Bug: #1353962

    Change-Id: Ia8a6d2336ddbbe28178be8d3ce4d97e6d4ab3787

Revision history for this message
Joe Gordon (jogo) wrote : Re: Test job failes with FixedIpLimitExceeded with nova network

After further investigation with Matt Riedemann, it looks like there are some issues in the quotas code, although it's still not clear whether nova is the root cause of this bug or not.

Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/126628

Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1284930 might be related, i.e. if we're not rolling back quota reservations on a fixed IP deallocate failure we could be leaking quotas that way.
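
To make the suspected leak mechanism concrete, here is a toy, self-contained illustration of a reserve/commit/rollback flow and what happens when the rollback is skipped on failure; TinyQuota and allocate_fixed_ip are invented for this example and are not nova code:

    class TinyQuota(object):
        """Toy reserve/commit/rollback accounting for a single resource."""

        def __init__(self, limit):
            self.limit = limit
            self.in_use = 0
            self.reserved = 0

        def reserve(self, delta=1):
            if self.in_use + self.reserved + delta > self.limit:
                raise RuntimeError("OverQuota")
            self.reserved += delta
            return delta  # acts as the reservation handle

        def commit(self, reservation):
            self.reserved -= reservation
            self.in_use += reservation

        def rollback(self, reservation):
            self.reserved -= reservation


    fixed_ip_quota = TinyQuota(limit=2)


    def allocate_fixed_ip(fail=False):
        reservation = fixed_ip_quota.reserve(1)
        try:
            if fail:
                raise IOError("allocation blew up")  # simulated failure
        except IOError:
            # Skip this rollback and the reservation is leaked.
            fixed_ip_quota.rollback(reservation)
            raise
        else:
            fixed_ip_quota.commit(reservation)

    # Without the rollback, two failed calls leave reserved == 2 while
    # in_use is still 0, and every later reserve() raises "OverQuota" even
    # though nothing is really allocated -- the same kind of phantom usage
    # a missed rollback on a deallocate failure would produce.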

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/126402
Reason: I was confusing this with nova.quota.DbQuotaDriver.rollback but it's actually nova.objects.quotas.Quotas.rollback.

You were all right for not believing me!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/126628
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ff2748990389deee62bed97456bf7a81cc43a47c
Submitter: Jenkins
Branch: master

commit ff2748990389deee62bed97456bf7a81cc43a47c
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 8 08:28:32 2014 -0700

    Log quota refresh in_use message at INFO level for logstash

    Commit 5aef26496786c02a1ed324bbd2eb0b9994a2018d added the log message to
    give some idea when we're unnecessarily refreshing quota usage.

    Logstash only collects INFO+ level messages, so bump this to an INFO
    level message so we can track gate runs to see if this is actually a
    problem.

    Related-Bug: #1353962

    Change-Id: I60ffa9dbda22e252f290bf316fa2e472d3e92f2f

Revision history for this message
jichenjc (jichenjc) wrote : Re: Test job failes with FixedIpLimitExceeded with nova network

I hit it again here today

http://logs.openstack.org/60/111660/13/check/check-tempest-dsvm-full/d0f4d9b/logs/screen-n-net.txt.gz

What's really confusing me is the following log:

WARNING nova.network.manager [req-49960ae4-3327-4770-8ff1-e1002786ee79 ServersAdminNegativeTestXML-863236903 ServersAdminNegativeTestXML-1636622346] [instance: 7f330ee3-90fb-463f-a325-f4a8bd7c3a02] Quota exceeded for project 5e8c727ff61142d9a7ad424cdb8ca8b5, tried to allocate fixed IP. 0 of 99 are in use or are already reserved.

From code analysis, it should not report an exception because the used count is 0. I checked the quota-related code but I can't see any useful log to indicate why the quota allocation failed.

So I will add more logs to _raise_overquota_exception and see whether we can figure out what happened.

Revision history for this message
jichenjc (jichenjc) wrote :

@matt, can you guide me on how to view the logs in the gate? I am not familiar with it. Thanks.

Revision history for this message
jichenjc (jichenjc) wrote :

BTW: I didn't see the logs changed at https://review.openstack.org/126628,
so maybe there is another cause of the problem?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/131737

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/131737
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5c9537b40092e701615e33e0dcf90a082d172f72
Submitter: Jenkins
Branch: master

commit 5c9537b40092e701615e33e0dcf90a082d172f72
Author: jichenjc <email address hidden>
Date: Tue Oct 28 04:55:50 2014 +0800

    Add debug log when over quota exception occurs

    There are some weird logs shown in bug 1353962; with the
    current logs we can't decide which part is wrong.
    Adding more logs will help operators know the real reason
    the quota is exceeded.

    If this change produces too much log output, we can talk about
    removing those logs after bug 1353962 is fixed, but for now
    it will be helpful.

    Change-Id: I2ab1fb62efc06b1cee213920539980e331870bf5
    Related-Bug: #1353962

Revision history for this message
Matthew Treinish (treinish) wrote : Re: Test job failes with FixedIpLimitExceeded with nova network

I'm not seeing anything tempest related in this bug. It looks like nova is just falling down and enforcing a quota even when the limits aren't even close.

Changed in tempest:
status: New → Incomplete
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/167814

Revision history for this message
Matt Riedemann (mriedem) wrote : Re: Test job failes with FixedIpLimitExceeded with nova network

Here we go, we have better logging in the DB API when this fails now:

http://logs.openstack.org/01/167101/5/check/check-tempest-dsvm-full/9f94b8c/logs/screen-n-cond.txt.gz#_2015-03-25_19_19_37_678

Raise OverQuota exception because: project_quotas: {'fixed_ips': -1}, user_quotas: {'fixed_ips': 99L}, deltas: {u'fixed_ips': 1}, overs: [u'fixed_ips'], project_usages: {u'ram': {'total': 128L, 'reserved': 0L, 'in_use': 128L}, u'floating_ips': {'total': 1L, 'reserved': 0L, 'in_use': 1L}, u'instances': {'total': 2L, 'reserved': 0L, 'in_use': 2L}, u'cores': {'total': 2L, 'reserved': 0L, 'in_use': 2L}, u'fixed_ips': {'total': 1L, 'reserved': 0L, 'in_use': 1L}, u'security_groups': {'total': 1L, 'reserved': 0L, 'in_use': 1L}}, user_usages: {u'ram': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedba50>, u'floating_ips': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedb5d0>, u'instances': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedbc90>, u'cores': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedb750>, u'fixed_ips': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedb510>, u'security_groups': <nova.db.sqlalchemy.models.QuotaUsage object at 0x7faaccedbf90>}

So project_quotas and user_quotas are different:

project_quotas: {'fixed_ips': -1}, user_quotas: {'fixed_ips': 99L}

The user_quotas are what's passed back to the network manager, which is why the log message is confusing:

http://git.openstack.org/cgit/openstack/nova/tree/nova/db/sqlalchemy/api.py?id=2015.1.0b3#n3578

So we see this in the n-net logs:

http://logs.openstack.org/01/167101/5/check/check-tempest-dsvm-full/9f94b8c/logs/screen-n-net.txt.gz?level=TRACE#_2015-03-25_19_19_37_685

2015-03-25 19:19:37.685 WARNING nova.network.manager [req-0f01e070-e2ac-4058-a7cb-9d619bb557d6 FloatingIPsTestJSON-1260806534 FloatingIPsTestJSON-109816172] [instance: c90d1a08-4be7-4f88-8a73-36424d70acd6] Quota exceeded for project e3bba3eb15c844cda054b8c73ec68c0b, tried to allocate fixed IP. 1 of 99 are in use or are already reserved.

So from project_usages, we have this:

u'fixed_ips': {'total': 1L, 'reserved': 0L, 'in_use': 1L}

That is why we go over-quota: it thinks we have 1 total fixed IP to use and it's already in use.
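
To see how those logged numbers can add up to an over-quota, here is one plausible reading, written as a simplified, self-contained stand-in for the comparison; the variable names mirror the log above, but this is illustrative, not the verbatim nova check:

    project_quotas = {'fixed_ips': -1}            # -1 means "unlimited"
    user_quotas = {'fixed_ips': 99}
    project_usages = {'fixed_ips': {'total': 1}}
    user_usages = {'fixed_ips': {'total': 1}}
    deltas = {'fixed_ips': 1}

    overs = []
    for resource, delta in deltas.items():
        if delta <= 0 or user_quotas[resource] < 0:
            continue
        # If -1 is not special-cased here, -1 < 1 + 1 is True, so the
        # "unlimited" project quota still flags an over-quota.
        if project_quotas[resource] < delta + project_usages[resource]['total']:
            overs.append(resource)
        elif user_quotas[resource] < delta + user_usages[resource]['total']:
            overs.append(resource)

    print(overs)  # ['fixed_ips'] -> surfaces as FixedIpLimitExceeded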

Revision history for this message
Matt Riedemann (mriedem) wrote :

This might also be useful: we are being forced to refresh quotas in the gate, but only for fixed_ips:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwicXVvdGFfdXNhZ2VzIG91dCBvZiBzeW5jLCB1cGRhdGluZ1wiIEFORCB0YWdzOlwic2NyZWVuLW4tY29uZC50eHRcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTQyNzM4ODk2NDUyOCwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

e.g.

[req-962680b8-2147-462e-a3bc-13d942b8989f MultipleCreateTestJSON-1247330694 MultipleCreateTestJSON-1141436516] quota_usages out of sync, updating. project_id: 5b4715def338466c9a6eea5886fb8f38, user_id: None, resource: fixed_ips, tracked usage: 0, actual usage: 1

[req-75c73b51-9f2a-4275-8ab7-1e8c270a3848 MultipleCreateTestJSON-1809221276 MultipleCreateTestJSON-267241457] quota_usages out of sync, updating. project_id: 647717c69bba4377b9421b2c832f4991, user_id: None, resource: fixed_ips, tracked usage: -1, actual usage: 1
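
For context, those messages come from a desync check along these lines; this is a simplified, self-contained sketch rather than the actual nova code: when the tracked counter no longer matches the real count in the database, the usage is resynced and the warning above is logged.

    def refresh_if_out_of_sync(tracked_in_use, actual_in_use,
                               project_id, resource):
        """Resync a tracked usage counter with the actual count."""
        if tracked_in_use != actual_in_use:
            print("quota_usages out of sync, updating. project_id: %s, "
                  "resource: %s, tracked usage: %d, actual usage: %d"
                  % (project_id, resource, tracked_in_use, actual_in_use))
            tracked_in_use = actual_in_use
        return tracked_in_use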

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/167814
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=fe5003bc1400e4e467699ef857071717e75146e7
Submitter: Jenkins
Branch: master

commit fe5003bc1400e4e467699ef857071717e75146e7
Author: Matt Riedemann <email address hidden>
Date: Wed Mar 25 14:58:04 2015 -0700

    Add some logging in the quota.reserve flow

    Try to figure out what's going on with a race in the gate
    when we go over-quota trying to allocate a fixed IP.

    Change-Id: Ib23450826a5b4a995360d1958e6070b23a3e732c
    Related-Bug: #1353962

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/168158

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/168158
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=af4ce3ee63e4e4af9113405396c3201eb3efe3c6
Submitter: Jenkins
Branch: master

commit af4ce3ee63e4e4af9113405396c3201eb3efe3c6
Author: Matt Riedemann <email address hidden>
Date: Thu Mar 26 13:46:40 2015 -0700

    Add debug logging to quota_reserve flow

    There are some places in the quota_reserve flow that could use some
    debug logging for helping to figure out the problem when we over quota:

    1. Log when we detect a desync in _is_quota_refresh_needed and force a
       refresh.
    2. There is a comment in quota_reserve after we call
       _refresh_quota_usages saying we assume the sync was OK and
       essentially everything is hunky dory. Log if we actually refreshed
       without a change to usage.
    3. Log the details from project/user quotas/usages when we go over
       quota in _calculate_overquota.
    4. Log if we update project_usages using values from user_usages in
       quota_reserve.
    5. Finally, before we raise the OverQuota exception in quota_reserve,
       the user_usages dict values are UserQuota objects which are sqla
       dict-like objects and don't stringify nicely for the logs, so
       convert those to actual dict values for logging.

    Related-Bug: #1353962

    Change-Id: Ic6e3d5fdf51c0a13c97509d0888a3bba00fb3193

Matt Riedemann (mriedem)
summary: - Test job failes with FixedIpLimitExceeded with nova network
+ Test job fails with FixedIpLimitExceeded with nova network
Revision history for this message
Yaroslav Lobankov (ylobankov) wrote :

It looks like this bug has nothing to do with Tempest, so I'm moving the status of the bug to "Invalid".

Changed in tempest:
status: Incomplete → Invalid
tags: added: quotas
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired