neutron

'SubnetInUse: Unable to complete operation on subnet UUID. One or more ports have an IP allocation from this subnet.'

Bug #1235435 reported by Joe Gordon on 2013-10-04

This bug affects 4 people

	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	Medium	Armando Migliaccio	OpenStack Compute (nova) 2014.1 "icehouse"
Havana	Fix Released	Medium	Yaguang Tang	OpenStack Compute (nova) 2013.2.3
neutron	Invalid	Undecided	Unassigned
tempest	Invalid	Undecided	Unassigned

Bug Description

Occasional tempest failure:

http://logs.openstack.org/86/49086/2/gate/gate-tempest-devstack-vm-neutron-isolated/ce14ceb/testr_results.html.gz

ft3.1: tearDownClass (tempest.scenario.test_network_basic_ops.TestNetworkBasicOps)_StringException: Traceback (most recent call last):
  File "tempest/scenario/manager.py", line 239, in tearDownClass
    thing.delete()
  File "tempest/api/network/common.py", line 71, in delete
    self.client.delete_subnet(self.id)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 112, in with_params
    ret = self.function(instance, *args, **kwargs)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 380, in delete_subnet
    return self.delete(self.subnet_path % (subnet))
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1233, in delete
    headers=headers, params=params)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1222, in retry_request
    headers=headers, params=params)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1165, in do_request
    self._handle_fault_response(status_code, replybody)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1135, in _handle_fault_response
    exception_handler_v20(status_code, des_error_body)
  File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 97, in exception_handler_v20
    message=msg)
NeutronClientException: 409-{u'NeutronError': {u'message': u'Unable to complete operation on subnet 9e820b02-bfe2-47e3-b186-21c5644bc9cf. One or more ports have an IP allocation from this subnet.', u'type': u'SubnetInUse', u'detail': u''}}

logstash query:

@message:"One or more ports have an IP allocation from this subnet" AND @fields.filename:"logs/screen-q-svc.txt" and @message:" SubnetInUse: Unable to complete operation on subnet"

http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIk9uZSBvciBtb3JlIHBvcnRzIGhhdmUgYW4gSVAgYWxsb2NhdGlvbiBmcm9tIHRoaXMgc3VibmV0XCIgQU5EIEBmaWVsZHMuZmlsZW5hbWU6XCJsb2dzL3NjcmVlbi1xLXN2Yy50eHRcIiBhbmQgQG1lc3NhZ2U6XCIgU3VibmV0SW5Vc2U6IFVuYWJsZSB0byBjb21wbGV0ZSBvcGVyYXRpb24gb24gc3VibmV0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODA5MTY1NDUxODcsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

Revision history for this message

Zhikun Liu (zhikunliu) wrote on 2013-10-04: AUTO: Zhi Kun ZK Liu is on vacation

I am out of the office until 10/07/2013.

I will take vacation from 28th Sep to 7th Oct . If have any urgent, please
call 13910806810

Note: This is an automated response to your message "[Bug 1235435] [NEW]
'SubnetInUse: Unable to complete operation on subnet UUID. One or more
ports have an IP allocation from this subnet.'" sent on 10/05/2013
3:58:43.

This is the only notification you will receive while this person is away.

Revision history for this message

Anita Kuno (anteaya) wrote on 2013-11-13:

Last seen Nov. 1st, not seen since. (This is November 12th as I write.)

Let's keep an eye on this in case it returns.

Revision history for this message

Joe Gordon (jogo) wrote on 2013-11-14:

It just returned see http://status.openstack.org/elastic-recheck/

Armando Migliaccio (armando-migliaccio) on 2013-11-14

Changed in neutron:
status:	New → Confirmed
assignee:	nobody → Armando Migliaccio (armando-migliaccio)

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2013-11-15:

From what the logs show it looks like nova-compute goes in some sort of loop where it keeps getting network info: See nova-cpu and nova-cond logs:

http://logs.openstack.org/03/56303/2/check/check-tempest-devstack-vm-neutron-pg/23f03ba/logs/screen-n-cpu.txt.gz
http://logs.openstack.org/03/56303/2/check/check-tempest-devstack-vm-neutron-pg/23f03ba/logs/screen-n-cond.txt.gz

and search for request:

req-8c5d20d1-5191-4683-8fb8-25570e72bf97

Whether that's induced by a Neutron failure remains to be seen. I'll keep digging.

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2013-11-15:

I think I found why this loops:

1) compute manager calls _shutdown_instance: this in turn calls _get_instance_nw_info which calls net_api.get_instance_nw_info
2) network_api.get_instance_nw_info calls _get_instance_nw_info that calls _build_network_info_model and shortly after update_instance_info_cache
3) update_instance_info_cache is set to network_api.update_instance_cache_with_nw_info in neutronv2/api.y
4) update_instance_cache_with_nw_info in turn calls _get_instance_nw_info again if nw_info is not of type NetworkInfo

This is where I am lost because as far as I can tell, nw_info should be of that type.

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2013-11-15:

https://review.openstack.org/#/c/56504/

Changed in neutron:
assignee:	Armando Migliaccio (armando-migliaccio) → nobody
status:	Confirmed → New
Changed in nova:
importance:	Undecided → Medium
assignee:	nobody → Armando Migliaccio (armando-migliaccio)
status:	New → In Progress

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2013-11-15:

In the specific instance of the failure I examined Neutron complained correctly that one or more ports had an IP allocation from the subnet; this was because the user port associated to the server hadn't been deleted. Nova-compute was stuck in a loop trying to retrieve nw_info. I haven't been able to root-cause the issue, but I tried to address the infinite loop by removing what sounded to me an erroneous duplication of a refresh_cache action.

This in turn might cause the failure mode to change and hence reveal more about the underlying problem.

Changed in neutron:
status:	New → Incomplete

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-11-16: Fix merged to nova (master)

Reviewed: https://review.openstack.org/56504
Committed: http://github.com/openstack/nova/commit/651fac3d5d250d42e640c3ac113084bf0d2fa3b4
Submitter: Jenkins
Branch: master

commit 651fac3d5d250d42e640c3ac113084bf0d2fa3b4
Author: armando-migliaccio <email address hidden>
Date: Thu Nov 14 18:59:20 2013 -0800

Remove reduntant call to update_instance_info_cache

    get_instance_nw_info calls _get_instance_nw_info,
    which is decorated with @refresh_cache. This in
    turn calls update_instance_info_cache, again. This
    is both expensive and more importantly dangerous because
    the method update_instance_info_cache may call
    _get_instance_nw_info itself, thus causing an infinite
    loop.

Change-Id: Id8f0241d1c6609cf4e064ee6855228ae7de250e2
Partial-bug: 1235435

Revision history for this message

Matt Riedemann (mriedem) wrote on 2013-11-17:

Commit 1957339df302e2da75e0dbe78b5d566194ab2c08 added the decorator and it's not in stable/havana so we don't need to backport this.

Changed in nova:
status:	In Progress → Fix Committed

Revision history for this message

Anita Kuno (anteaya) wrote on 2013-11-18:

#10

http://logstash.openstack.org/#eyJmaWVsZHMiOiBbXSwgInNlYXJjaCI6ICJtZXNzYWdlOlwiT25lIG9yIG1vcmUgcG9ydHMgaGF2ZSBhbiBJUCBhbGxvY2F0aW9uIGZyb20gdGhpcyBzdWJuZXRcIiBBTkQgbWVzc2FnZTpcIiBTdWJuZXRJblVzZTogVW5hYmxlIHRvIGNvbXBsZXRlIG9wZXJhdGlvbiBvbiBzdWJuZXRcIiBBTkQgZmlsZW5hbWU6XCJsb2dzL3NjcmVlbi1xLXN2Yy50eHRcIlxuIiwgInRpbWVmcmFtZSI6ICI2MDQ4MDAiLCAiZ3JhcGhtb2RlIjogImNvdW50IiwgIm9mZnNldCI6IDB9

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-11-19: Fix proposed to nova (master)

#11

Fix proposed to branch: master
Review: https://review.openstack.org/57069

Thierry Carrez (ttx) on 2013-11-25

Changed in nova:
status:	Fix Committed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-11-26: Fix merged to nova (master)

#12

Reviewed: https://review.openstack.org/57069
Committed: http://github.com/openstack/nova/commit/2e520496c369a8ef646820ec068cc4416ab50eca
Submitter: Jenkins
Branch: master

commit 2e520496c369a8ef646820ec068cc4416ab50eca
Author: armando-migliaccio <email address hidden>
Date: Mon Nov 18 16:10:48 2013 -0800

Address infinite loop in nova compute when getting network info

    Move the refresh_cache decorator to get_instance_nw_info.
    In fact, _get_instance_nw_info is called by the decorator
    itself, potentially causing an infinite loop in case nw_info
    is not of the expected value. This is also inline with
    the method's purpose, as stated by the docstring. At the
    same time, ensure that delete_port_for_instance and
    allocate_for_instance continue to refresh the cache by
    using the decorated version.

Also, add a couple of debug traces that are friendly to
the developer.

Partial-bug: 1235435
Related-bug: 1251792

Change-Id: I06f8634ea241d05ac8fbcc290adf0cb23829f3e4

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-11-26: Fix proposed to nova (stable/havana)

#13

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/58471

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-11-26: Fix merged to nova (stable/havana)

#14

Reviewed: https://review.openstack.org/58471
Committed: http://github.com/openstack/nova/commit/ef9aec1cb6ec0d5fae699f9bb1f0d042530de5de
Submitter: Jenkins
Branch: stable/havana

commit ef9aec1cb6ec0d5fae699f9bb1f0d042530de5de
Author: armando-migliaccio <email address hidden>
Date: Mon Nov 18 16:10:48 2013 -0800

Address infinite loop in nova compute when getting network info

Also, add a couple of debug traces that are friendly to
the developer.

Partial-bug: 1235435
Related-bug: 1251792

Change-Id: I06f8634ea241d05ac8fbcc290adf0cb23829f3e4
(cherry picked from commit 2e520496c369a8ef646820ec068cc4416ab50eca)

tags:

added: in-stable-havana

Revision history for this message

Armando Migliaccio (armando-migliaccio) wrote on 2013-11-28:

#15

This seems to have gone quiet since the fixes merged on the 26th. Marking committed for now.

Changed in neutron:
status:	Incomplete → Invalid
Changed in nova:
status:	In Progress → Fix Committed
Changed in tempest:
status:	New → Invalid

Revision history for this message

Anita Kuno (anteaya) wrote on 2013-12-01:

#16

There was one failure on November 29th, 2013 at 18:00:00.000 UTC which is an oddly exact timestamp for the failure.

Russell Bryant (russellb) on 2013-12-03

Changed in nova:
milestone:	none → icehouse-1

Thierry Carrez (ttx) on 2013-12-04

Changed in nova:
status:	Fix Committed → Fix Released

Alan Pevec (apevec) on 2013-12-08

tags:

removed: in-stable-havana

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-02-13: Fix proposed to nova (stable/havana)

#17

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/73421

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-03-06: Fix merged to nova (stable/havana)

#18

Reviewed: https://review.openstack.org/73421
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b047eaa9d96dc7c8f7a2f4311277090ce9d21d89
Submitter: Jenkins
Branch: stable/havana

commit b047eaa9d96dc7c8f7a2f4311277090ce9d21d89
Author: armando-migliaccio <email address hidden>
Date: Thu Nov 14 18:59:20 2013 -0800

Remove reduntant call to update_instance_info_cache

    Change-Id: Id8f0241d1c6609cf4e064ee6855228ae7de250e2
    Partial-bug: 1235435
    (cherry picked from commit 651fac3d5d250d42e640c3ac113084bf0d2fa3b4)

tags:

added: in-stable-havana

Alan Pevec (apevec) on 2014-04-08

tags:

removed: in-stable-havana

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-1 → 2014.1

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.