n-api and n-cpu receive timeouts from q-svc because of "Lock Wait timeout"

Bug #1372666 reported by Jeremy Stanley
This bug affects 2 people
Affects                    Status    Importance   Assigned to   Milestone
OpenStack Compute (nova)   Invalid   Undecided    Unassigned
neutron                    Invalid   Medium       Unassigned

Bug Description

This request failed:

http://logs.openstack.org/12/123112/1/check/check-tempest-dsvm-neutron-full/cdb7110/logs/screen-n-api.txt.gz#_2014-09-22_14_16_01_028

2014-09-22 14:16:01.028 DEBUG nova.api.openstack.wsgi [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] Calling method '<bound method Controller.show of <nova.api.openstack.compute.servers.Controller object at 0x7f05ee9db610>>' _process_stack /opt/stack/new/nova/nova/api/openstack/wsgi.py:935
2014-09-22 14:16:01.063 DEBUG neutronclient.client [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551]
REQ: curl -i http://127.0.0.1:9696/v2.0/ports.json?device_id=40737ad4-4513-4027-b031-cf7cf519d5b5 -X GET -H "X-Auth-Token: 916a5769e0ba42339f45c3f6bb00f147" -H "User-Agent: python-neutronclient"
 http_log_req /opt/stack/new/python-neutronclient/neutronclient/common/utils.py:140
2014-09-22 14:16:31.065 DEBUG neutronclient.client [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] throwing ConnectionFailed : HTTPConnectionPool(host='127.0.0.1', port=9696): Read timed out. (read timeout=30) _cs_request /opt/stack/new/python-neutronclient/neutronclient/client.py:132
2014-09-22 14:16:48.360 ERROR nova.api.openstack [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] Caught error: Connection to neutron failed: HTTPConnectionPool(host='127.0.0.1', port=9696): Read timed out. (read timeout=30)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack Traceback (most recent call last):
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/__init__.py", line 124, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return req.get_response(self.application)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1320, in send
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack application, catch_exc_info=False)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1284, in call_application
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack app_iter = application(self.environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return resp(environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/local/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py", line 646, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return self._call_app(env, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/local/lib/python2.7/dist-packages/keystonemiddleware/auth_token.py", line 624, in _call_app
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return self._app(env, _fake_start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return resp(environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return resp(environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/local/lib/python2.7/dist-packages/routes/middleware.py", line 131, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack response = self.app(environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 144, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return resp(environ, start_response)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 130, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack resp = self.call_func(req, *args, **self.kwargs)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/usr/lib/python2.7/dist-packages/webob/dec.py", line 195, in call_func
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return self.func(req, *args, **kwargs)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/wsgi.py", line 908, in __call__
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack content_type, body, accept)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/wsgi.py", line 999, in _process_stack
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack request, action_args)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/wsgi.py", line 878, in post_process_extensions
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack **action_args)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/compute/contrib/security_groups.py", line 597, in show
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack return self._show(req, resp_obj)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/compute/contrib/security_groups.py", line 593, in _show
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack self._extend_servers(req, [resp_obj.obj['server']])
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/api/openstack/compute/contrib/security_groups.py", line 557, in _extend_servers
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack servers))
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/network/security_group/neutron_driver.py", line 346, in get_instances_security_groups_bindings
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack ports = self._get_ports_from_server_list(servers, neutron)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/nova/nova/network/security_group/neutron_driver.py", line 305, in _get_ports_from_server_list
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack ports.extend(neutron.list_ports(**search_opts).get('ports'))
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 98, in with_params
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack ret = self.function(instance, *args, **kwargs)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 312, in list_ports
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack **_params)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1334, in list
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack for r in self._pagination(collection, path, **params):
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1347, in _pagination
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack res = self.get(path, params=params)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1320, in get
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack headers=headers, params=params)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1297, in retry_request
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack headers=headers, params=params)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1240, in do_request
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack content_type=self.content_type())
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/client.py", line 180, in do_request
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack **kwargs)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack File "/opt/stack/new/python-neutronclient/neutronclient/client.py", line 133, in _cs_request
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack raise exceptions.ConnectionFailed(reason=e)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack ConnectionFailed: Connection to neutron failed: HTTPConnectionPool(host='127.0.0.1', port=9696): Read timed out. (read timeout=30)
2014-09-22 14:16:48.360 30109 TRACE nova.api.openstack
2014-09-22 14:16:48.385 INFO nova.api.openstack [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] http://127.0.0.1:8774/v2/50c4cb69aa66452ea00f6823976bef6d/servers/40737ad4-4513-4027-b031-cf7cf519d5b5 returned with HTTP 500
2014-09-22 14:16:48.386 DEBUG nova.api.openstack.wsgi [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] Returning 500 to user: The server has either erred or is incapable of performing the requested operation. __call__ /opt/stack/new/nova/nova/api/openstack/wsgi.py:1199
2014-09-22 14:16:48.389 INFO nova.osapi_compute.wsgi.server [req-bb64d882-d91e-4bff-9407-19277208e277 TestSecurityGroupsBasicOps-454747816 TestSecurityGroupsBasicOps-1777134551] 127.0.0.1 "GET /v2/50c4cb69aa66452ea00f6823976bef6d/servers/40737ad4-4513-4027-b031-cf7cf519d5b5 HTTP/1.1" status: 500 len: 354 time: 47.3637452

Comparing this to the q-svc debug log from the same period, bafflingly, no requests were logged there, failed or otherwise, between 14:16:00.341 and 14:16:48.397.
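For context, here is a minimal sketch (not nova's actual code) of the neutron call that the traceback shows timing out; the client keyword arguments are assumptions based on the python-neutronclient API of that era, and the endpoint, token and UUID are simply the values from the log above:

# Minimal reconstruction (assumed, not nova's code) of the request that timed out:
# nova's security_groups extension lists neutron ports by device_id, and the read
# times out after 30 seconds because q-svc does not answer in time.
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(
    endpoint_url='http://127.0.0.1:9696',      # q-svc endpoint from the log
    token='916a5769e0ba42339f45c3f6bb00f147',  # token copied from the log
    timeout=30)                                # matches "read timeout=30" above

# Rough equivalent of _get_ports_from_server_list(): one ports query per server.
ports = neutron.list_ports(
    device_id='40737ad4-4513-4027-b031-cf7cf519d5b5').get('ports')
# When q-svc is stalled, this raises ConnectionFailed after ~30s, which nova's
# API layer turns into the HTTP 500 returned to the user.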

Revision history for this message
Jeremy Stanley (fungi) wrote :

Added Nova, Neutron and the Neutron Client (used by Nova's Neutron driver) as the interaction is subtle and more eyes might help point out what I'm missing.

Revision history for this message
Sean Dague (sdague) wrote :

Yeah, it looks like neutron is timing out the connection (possibly due to some other locking?); the request is eventually handled later, but by then it's too late.

http://logs.openstack.org/12/123112/1/check/check-tempest-dsvm-neutron-full/cdb7110/logs/screen-q-svc.txt.gz#_2014-09-22_14_16_48_510

Changed in nova:
status: New → Incomplete
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Indeed, there is a gap of 48 seconds after the last "Got semaphore "db-access" lock" message.

Changed in neutron:
importance: Undecided → High
no longer affects: python-neutronclient
Changed in neutron:
assignee: nobody → Numan Siddique (numansiddique)
Changed in neutron:
assignee: Numan Siddique (numansiddique) → nobody
Revision history for this message
Itsuro Oda (oda-g) wrote :

> Indeed, there is a gap of 48 seconds after the last "Got semaphore "db-access" lock" message.

Looking at the q-agt log in addition to the q-svc log confirms that the execution of update_device_down took 48 seconds.
There appears to be no concurrent operation running in neutron-server, so I suspect an external factor blocked the neutron-server process.
And I found the following:
http://logs.openstack.org/12/123112/1/check/check-tempest-dsvm-neutron-full/cdb7110/logs/syslog.txt.gz#_Sep_22_14_16_00

An iSCSI device failed during exactly the same period (14:16:00-14:16:48).
I think this is the cause of the problem
(e.g. writes to the DB were blocked during that period).
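
To make the suspected failure mode concrete, below is a minimal sketch (illustrative only, using the modern oslo.concurrency library rather than the oslo-incubator lockutils neutron carried at the time) of how one blocked critical section under the in-process "db-access" lock serializes every other request in the same neutron-server worker, which would produce exactly this kind of multi-second gap:

# Illustrative sketch only: a single stalled critical section under the
# "db-access" lock makes all other callers in the same process wait. The sleep
# stands in for a DB write blocked by the failed iSCSI device.
import threading
import time

from oslo_concurrency import lockutils

@lockutils.synchronized('db-access')
def update_device_down(device_id):
    # Stand-in for the DB write; pretend it blocks because the disk has stalled.
    time.sleep(2)
    print('updated %s' % device_id)

threads = [threading.Thread(target=update_device_down, args=('port-%d' % i,))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The three updates finish one after another, roughly 2 seconds apart, because
# the semaphore is held for the entire duration of the blocked section.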

dql (duquanglong)
Changed in nova:
status: Incomplete → New
Revision history for this message
Sean Dague (sdague) wrote :

@dql, why did you change this back to New? What makes you think the Incomplete state of the nova bug was resolved?

Changed in nova:
status: New → Invalid
Revision history for this message
Sean Dague (sdague) wrote :

I believe this is a neutron timeout issue; marking the nova task as Invalid.

Revision history for this message
Kyle Mestery (mestery) wrote :

Moving to Liberty-1 now in the hopes of raising the visibility of this.

Changed in neutron:
milestone: none → liberty-1
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Changed the bug title to reflect what it is in reality.
Between n-api and n-cpu there are now only 32 hits in a week, with an 80% failure rate.

That is nowhere near the 310 hits per week observed back in September, so I'm decreasing the priority to Medium.

summary: - list_ports().get() times out waiting for response from Neutron API in
- TestSecurityGroupsBasicOps
+ n-api and n-cpu receive timeouts from q-svc because of "Lock Wait
+ timeout"
Changed in neutron:
importance: High → Medium
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → liberty-2
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-2 → liberty-3
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :
Changed in neutron:
status: New → Invalid
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

No longer valid: Juno is about to go EOL and this doesn't show up in master.

Changed in neutron:
milestone: liberty-3 → none
Revision history for this message
David Comay (comay) wrote :

Although Juno is going EOL, are there plans to address this within Kilo? Or are plugins expected to work around it?
