ResourceClosedError occurs when neutron API run in parallel

Bug #1381536 reported by Ian Ong on 2014-10-15
26
This bug affects 4 people
Affects Status Importance Assigned to Milestone
neutron
High
Neil Jerram

Bug Description

When DHCP agent creates port from neutron and another create port request is received, ResourceClosedError occurs in sqlalchemy.

this may be related to bug #1282922 https://bugs.launchpad.net/bugs/1282922
Above bug is related to nec plugin and it is mentioned that other plugins may be affected.
This error occurred in ML2 plugin both for create and delete ports.
Tested using 2014.3 Icehouse

==========================================================================
2014-10-15 21:58:59.837 26167 INFO neutron.wsgi [-] (26167) accepted ('172.16.2.86', 47007)

2014-10-15 21:58:59.870 26167 INFO neutron.wsgi [req-424a01ca-f52b-43a6-8844-d0d3590feb8d None] 172.16.2.86 - - [15/Oct/2014 21:58:59] "GET /v2.0/networks.json?fields=id&name=testnw2 HTTP/1.1" 200 251 0.031936

2014-10-15 21:58:59.872 26167 INFO neutron.wsgi [req-424a01ca-f52b-43a6-8844-d0d3590feb8d None] (26167) accepted ('172.16.2.86', 47008)

2014-10-15 21:58:59.950 26167 INFO neutron.wsgi [req-7ee742ef-6370-46b3-8f8b-f46ae5d262bc None] 172.16.2.86 - - [15/Oct/2014 21:58:59] "POST /v2.0/subnets.json HTTP/1.1" 201 572 0.076879

2014-10-15 21:59:00.074 26167 INFO neutron.wsgi [req-a6ef6c65-811f-40d8-9443-b9590809994a None] (26167) accepted ('172.16.2.86', 47010)

2014-10-15 21:59:00.088 26167 INFO urllib3.connectionpool [-] Starting new HTTPS connection (1): 10.68.42.86
2014-10-15 21:59:00.111 26167 INFO neutron.wsgi [req-22a84d34-f454-423d-bb7b-b4c7e2e6e08c None] 172.16.2.86 - - [15/Oct/2014 21:59:00] "GET /v2.0/networks.json?fields=id&name=testnw2 HTTP/1.1" 200 251 0.033298

2014-10-15 21:59:00.113 26167 INFO neutron.wsgi [req-22a84d34-f454-423d-bb7b-b4c7e2e6e08c None] (26167) accepted ('172.16.2.86', 47012)

2014-10-15 21:59:51.165 26167 ERROR neutron.api.v2.resource [-] create failed
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 448, in create
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource obj = obj_creator(request.context, **kwargs)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/plugin.py", line 632, in create_port
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource result = super(Ml2Plugin, self).create_port(context, port)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 1371, in create_port
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource ips = self._allocate_ips_for_port(context, network, port)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 678, in _allocate_ips_for_port
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource result = NeutronDbPluginV2._generate_ip(context, subnets)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 359, in _generate_ip
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource return NeutronDbPluginV2._try_generate_ip(context, subnets)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 376, in _try_generate_ip
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource range = range_qry.filter_by(subnet_id=subnet['id']).first()
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2282, in first
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource ret = list(self[0:1])
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2149, in __getitem__
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource return list(res)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 65, in instances
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource fetch = cursor.fetchall()
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 752, in fetchall
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource self.cursor, self.context)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource util.reraise(*exc_info)
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 746, in fetchall
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource l = self.process_rows(self._fetchall_impl())
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 715, in _fetchall_impl
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource self._non_result()
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 720, in _non_result
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource "This result object does not return rows. "
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource ResourceClosedError: This result object does not return rows. It has been closed automatically.
2014-10-15 21:59:51.165 26167 TRACE neutron.api.v2.resource
2014-10-15 21:59:51.167 26167 INFO neutron.wsgi [-] 172.16.2.86 - - [15/Oct/2014 21:59:51] "POST /v2.0/ports.json HTTP/1.1" 500 296 51.052950
==========================================================================

Ian Ong (ongmichaelian) on 2014-10-15
description: updated
description: updated
description: updated
description: updated
Eugene Nikanorov (enikanorov) wrote :

Ian, can you please provide log output with debug logging turned on?

Changed in neutron:
assignee: nobody → Eugene Nikanorov (enikanorov)
importance: Undecided → Critical
importance: Critical → High
Pete Revales (peterevales) wrote :

In behalf of Ian, please refer to the attached file for the logs related to this issue(bug #1381536).

Shiv Haris (shh) on 2014-10-22
Changed in neutron:
milestone: none → kilo-1
Pete Revales (peterevales) wrote :

Eugene and Shiv, currently we are only supporting for Juno Version.
If possible, can I request that this bug will be released for Juno Milestone
or a patch for Juno Milestone. Thanks.

Pete Revales (peterevales) wrote :

> If possible, can I request that this bug will be released for Juno Milestone
> or a patch for Juno Milestone. Thanks.

I mean "Icehouse" Milestone not "Juno" Milestone. I'm so sorry.

Wei Wang (damon-devops) wrote :

Hi, I have faced the same exception, see my log at paste: http://paste.openstack.org/show/134640/.

Wei Wang (damon-devops) wrote :

This is a more detailed log:
http://paste.openstack.org/show/134718/

summary: - ResourceClosedError occurs when create_port from DHCP agent and
- create_port from neutron API run in parallel
+ ResourceClosedError occurs when neutron API run in parallel
Wei Wang (damon-devops) wrote :

I changed the bug's title since my problem happens both on network delete and port create.

Changed in neutron:
status: New → Confirmed
Kyle Mestery (mestery) on 2014-12-16
Changed in neutron:
milestone: kilo-1 → kilo-2
Fawad Khaliq (fawadkhaliq) wrote :
Download full text (31.0 KiB)

I see this when create_port is run in parallel on Icehouse.

2014-12-17 09:25:21.777 26327 ERROR neutron.api.v2.resource [req-f163dac0-8f04-4692-845f-cf151b45e6a2 None] create failed
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource Traceback (most recent call last):
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/resource.py", line 87, in resource
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource result = method(request=request, **args)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/api/v2/base.py", line 448, in create
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource obj = obj_creator(request.context, **kwargs)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/plugins/plumgrid/plumgrid_plugin/plumgrid_plugin.py", line 199, in create_port
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource port)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 1371, in create_port
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource ips = self._allocate_ips_for_port(context, network, port)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 678, in _allocate_ips_for_port
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource result = NeutronDbPluginV2._generate_ip(context, subnets)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 359, in _generate_ip
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource return NeutronDbPluginV2._try_generate_ip(context, subnets)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib/python2.7/site-packages/neutron/db/db_base_plugin_v2.py", line 376, in _try_generate_ip
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource range = range_qry.filter_by(subnet_id=subnet['id']).first()
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2341, in first
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource ret = list(self[0:1])
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2208, in __getitem__
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource return list(res)
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py", line 66, in instances
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource fetch = cursor.fetchall()
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py", line 787, in fetchall
2014-12-17 09:25:21.777 26327 TRACE neutron.api.v2.resource self.cursor, self.context)
2014-12-17 09:25:21.777 26327 TRACE neutr...

tags: added: juno-backport-potential
tags: added: icehouse-backport-potential
Itsuro Oda (oda-g) wrote :

note-2:
mechanism driver (ml2_dpdk???) seems to do communication in precommit. So it is a problem of mechanism driver's side.

note-5,6:
cannot be analized for lack of information.

note-8:
create_port of plumgrid plugin seems to do communication with holding transaction. So it is a problem of plumgrid plugin.

Fawad Khaliq (fawadkhaliq) wrote :

AFAIK communication with holding transaction seems like a perfectly valid use case. Moving it out will only be working around the issue. Until the actual problem is triaged and fixed, PLUMgrid plugin is locking(temporary) to avoid this race.

Kyle Mestery (mestery) on 2015-02-03
Changed in neutron:
milestone: kilo-2 → kilo-3
Kyle Mestery (mestery) on 2015-03-19
Changed in neutron:
milestone: kilo-3 → kilo-rc1
Kyle Mestery (mestery) on 2015-03-30
Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → nobody
Neil Jerram (neil-jerram) wrote :

I also saw a problem like this when developing my project's mechanism driver. Please see the commit here, which fixed the problem: https://github.com/Metaswitch/calico-neutron/commit/7081cd08fac62519fd.

The commit message has a lot of detail about what I thought the problem was, so please take a look at that.

Neil Jerram (neil-jerram) wrote :

Is there any overall information available about what this mechanism driver (named 'mech_driver') is doing?

Is it a coincidence that the DHCP agent makes an RPC call to create the DHCP port at the same time as the create_port_postcommit is failing? Or is it something like:
- mech_driver is asked to create a port
- as part of that, it sends some signal to the compute host, which causes the DHCP agent there to try to create the DHCP port
- an SSL error occurs in the communication between the mech_driver and the compute host, causing the mech_driver create_port operation to fail
- while that failure is being handled, the create_port comes in from the DHCP agent, leading to the ResourceClosedError.

Is the mech_driver passing the DB context to the compute host, and the compute host's DHCP agent using that same context when asking to create the DHCP port? (Is that even possible? I'm afraid I don't yet understand DB contexts/sessions well enough to know.)

Kyle Mestery (mestery) wrote :

Neil, I think what you've done for Calico makes a lot of sense! I'm assigning this bug to you, if you feel like you have time to take a shot at fixing this, please do! Feel free to ping me or enikanorov in-channel (#openstack-neutron) with questions!

Changed in neutron:
assignee: nobody → Neil Jerram (neil-jerram)
Neil Jerram (neil-jerram) wrote :

Thanks Kyle; I'm happy to try to progress and resolve this in time for the Kilo RC.

Neil Jerram (neil-jerram) wrote :

I'm not yet clear, though, on the scope. Do we think this bug is specific to a particular mechanism driver? If so, where can I see the source code for that driver? There isn't a driver named 'mech_driver' in the current Neutron tip, AFAICS.

Eugene Nikanorov (enikanorov) wrote :

Neil, neutron-server and DHCP agent doesn't use same context because they are essentially different processes.

I think this might be an issue with some racy deletion of resources, however I'm not sure about port creation.

Neil Jerram (neil-jerram) wrote :

Hi again all.

OK, I need to know ASAP who thinks they're affected by this bug, and to see their mechanism driver code, please.

My current understanding is that this problem only occurs with out-of-tree mechanism driver code - in case I've got that wrong, please do let me know. On the other hand, if I'm right, surely this is _not_ a release-critical bug for Neutron for Kilo?

Kyle Mestery (mestery) wrote :

Neil, I think you're right in that this is not a release critical bug for Kilo. I'm moving the priority down to High and moving this out of Kilo for now. I hope Eugene can provide you some more answers on the context of this. But since it does appear to affect the NEC plugin according to the report, we won't hold the release for this.

Changed in neutron:
milestone: kilo-rc1 → liberty-1
Neil Jerram (neil-jerram) wrote :

Discussed this just now with enikanorov on IRC, and we believe that the underlying cause of this bug has probably been fixed by some other change since the original report.

Hence setting this back to Incomplete. If anyone does see this again on up to date code, please add details here, including logs and the involved mechanism driver code, and I'll be happy to take a further look.

Changed in neutron:
status: Confirmed → Incomplete
Thierry Carrez (ttx) on 2015-06-23
Changed in neutron:
milestone: liberty-1 → liberty-2
Kyle Mestery (mestery) wrote :

Marking Invalid per comment #19.

Changed in neutron:
status: Incomplete → Invalid
milestone: liberty-2 → none
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments