Neutron-server + uwsgi deadlocks when running rpc workers
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Fix Released | Medium | Unassigned | |
Bug Description
In certain situations we observe that neutron-server + uwsgi shares locks between its native threads and its eventlet threads. Since eventlet relies on being informed when a lock is released, this can lead to a deadlock: the eventlet thread waits indefinitely for a lock that has already been released. In our infrastructure this means the API request is executed on the Neutron side, but the caller never receives a response. For actions like port creation from e.g. Nova or Manila this results in orphaned ports, because the caller simply retries creating the port.
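To make the mechanism more concrete, here is a minimal, hypothetical sketch (not the actual Neutron code path) of the hazard in isolation: a greenthread that blocks on a lock created before monkey-patching blocks the whole eventlet hub, and when the release of that lock in turn depends on the hub making progress, the process deadlocks.

```python
# Hypothetical illustration of mixing a native lock with eventlet greenthreads.
# This is NOT the Neutron code path, just the hazard in isolation.
import threading

# Created before monkey_patch(), so this stays a real pthread lock.
shared_lock = threading.Lock()

import eventlet
eventlet.monkey_patch()

# Grab unpatched modules so we can still create a genuine OS thread.
real_threading = eventlet.patcher.original('threading')
real_time = eventlet.patcher.original('time')


def native_worker():
    # Simulates a native (uwsgi) thread holding the shared lock for a while.
    with shared_lock:
        real_time.sleep(2)


def green_worker():
    # Blocking on a native lock does not yield to the hub: the OS thread that
    # runs *all* greenthreads is stuck here until the lock is released.
    print("green: waiting for shared lock")
    with shared_lock:
        print("green: got the lock")


def heartbeat():
    # Starves while green_worker blocks the hub; in the real bug the release
    # itself depends on the hub, which turns this starvation into a deadlock.
    for _ in range(6):
        print("heartbeat")
        eventlet.sleep(0.5)


native = real_threading.Thread(target=native_worker)
native.start()
eventlet.sleep(0.1)          # let the native thread take the lock first
hb = eventlet.spawn(heartbeat)
gw = eventlet.spawn(green_worker)
gw.wait()
hb.wait()
native.join()
```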
To debug this further we have reintroduced guru meditation reports into neutron-server[0] and configured uwsgi to send a SIGWINCH on harakiri[1], so that a guru meditation report is triggered whenever a uwsgi worker deadlocks.
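For reference, the Python side of that wiring can be as small as the following sketch; `setup_autorun()` comes from oslo.reports, while the `signum` keyword and the `neutron.version` module are assumptions that may differ between releases:

```python
# Hedged sketch: register a text guru meditation report handler on SIGWINCH,
# so the signal uwsgi sends on harakiri dumps thread/greenlet state before the
# worker is killed. Exact keyword names in oslo.reports may differ per release.
import signal

from oslo_reports import guru_meditation_report as gmr

from neutron import version  # assumption: neutron's pbr-generated version module


def setup_guru_meditation():
    # oslo.reports normally listens on SIGUSR2; here we assume the signum
    # argument lets us hook SIGWINCH instead, matching the uwsgi harakiri setup.
    gmr.TextGuruMeditation.setup_autorun(version, signum=signal.SIGWINCH)
```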
The two most interesting candidates are a shared lock inside oslo_messaging and Python's logging lock, which also appears to be acquired from oslo_messaging. In both cases the traceback points to oslo_messaging and its RPC server (see the attached guru meditation report).
As all RPC Servers should run inside neutron-rpc-server anyway (due to the uwsgi/neutron-
```
>>> [ep for mhs in fo(oslo_
[<neutron.
```
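The truncated interpreter snippet above appears to walk all live oslo.messaging MessageHandlingServer instances and list their endpoints, showing that Neutron RPC servers are running inside the uwsgi worker. A hedged reconstruction of that kind of inspection (attribute names may vary across oslo.messaging releases) could look like this:

```python
# Hedged reconstruction of the inspection shown above: find every live
# MessageHandlingServer via the garbage collector and print its endpoints.
import gc

from oslo_messaging import server as msg_server

servers = [obj for obj in gc.get_objects()
           if isinstance(obj, msg_server.MessageHandlingServer)]

for mhs in servers:
    # The RPC dispatcher keeps the endpoint objects passed to get_rpc_server();
    # any Neutron plugin endpoints listed here are running in this process.
    print(mhs, getattr(mhs.dispatcher, 'endpoints', None))
```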
The RPC servers should be started via start_rpc_
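For context, a Neutron RPC listener of that kind boils down to a plain oslo.messaging RPC server; the topic, endpoint, and executor below are placeholders, not the actual Neutron values:

```python
# Hedged sketch of the kind of RPC server a plugin's start_rpc_* hook returns.
# Topic/endpoint names are placeholders; in the uwsgi split these servers are
# meant to run in the standalone neutron-rpc-server process, not in the API
# workers.
from oslo_config import cfg
import oslo_messaging


class DummyEndpoint(object):
    def echo(self, context, msg):
        return msg


def start_listener(conf=cfg.CONF):
    transport = oslo_messaging.get_rpc_transport(conf)
    target = oslo_messaging.Target(topic='dummy-plugin', server='host-1')
    server = oslo_messaging.get_rpc_server(
        transport, target, [DummyEndpoint()], executor='eventlet')
    server.start()
    return server
```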
Nova has had similar problems with eventlet and logging in the past, see [2][3]. The tests were done with Neutron Yoga (on our own stable/yoga-m3 branch), but the issue is present in current master as well.
[0] https:/
[1] https:/
[2] https:/
[3] https:/
Tags added: oslo
Changed in neutron:
  importance: Undecided → High
  importance: High → Medium
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/916112