Bad initialization sequence in Neutron agents (and maybe somewhere else)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Won't Fix | Medium | Unassigned | |
Bug Description
TL;DR: running background threads that modify global state (which includes any RPC call or other library call) while forking happens can cause weird, hard-to-debug errors, so background jobs should be initialized after forking (or, even better, run in a separate process).
Currently at least the metadata and L3 agents start background threads that periodically report state before forking and running the main loop in child processes. Those threads can modify global state so that it is inconsistent at the moment of forking, which leads to bad service behavior. In an ideal world the main process wouldn't do anything except manage child processes, to avoid any global state leaking into them (open fds, held locks, etc.).
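A quick illustration of why the main process should stay "clean": after `fork()`, only the calling thread exists in the child, so whatever the other threads were doing is frozen mid-operation. This is a minimal POSIX-only sketch (CPython behavior), not Neutron code:

```python
import os
import threading
import time

def background_job():
    while True:
        time.sleep(0.1)  # pretend to report state periodically

threading.Thread(target=background_job, daemon=True).start()
print("threads before fork:", threading.active_count())  # 2

pid = os.fork()
if pid == 0:
    # Only the forking thread survives in the child; any global state
    # the background thread was mutating stays frozen as it was.
    print("threads in child:", threading.active_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)
```

The background thread simply vanishes in the child, but every lock it held and every half-written structure it touched is inherited as-is.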
This bug was uncovered while investigating a weird, seemingly unrelated error in the grenade job in Ann's CR that was switching to the new oslo.db EngineFacade [0]. The symptom was an SSHTimeout while the instance's cloud-init was busy trying (and failing) to get metadata. In the q-meta logs we noticed there were no INFO messages about incoming HTTP requests, but there were some DEBUG ones, which means requests were reaching the agent but not being responded to. Digging deeper, we noted that in normal operation the metadata agent should:
- receive the request;
- do an RPC call to neutron-server;
- do an HTTP request to Nova;
- send the response.
There were no RPC "CALL" entries in the logs, only CASTs from state reporting and the very first CALL for state reporting.
Since gate jobs are hard to reproduce, especially multinode ones, we created another CR [1] that added tracing of every executed Python line to see what really happens. You can find the long log with all the tracing at [2] or in the attachment (logs don't live forever). It uncovered the following chain of events:
- the main thread in the main process starts a background thread for state reporting;
- some time later that thread starts reporting and wants to do its first CALL (it does one CALL, then only CASTs);
- to get 'reply_q' (essentially, the connection shared for replies, IIUC), it acquires a lock;
- since no connections are available at that point (it's the first time RPC is used), oslo_messaging starts connecting to RabbitMQ;
- the background thread yields execution to the main thread;
- the main thread forks a bunch of WSGI workers;
- in the WSGI workers, when a request comes in, the handler tries to do a CALL to neutron-server;
- to get 'reply_q' it tries to acquire the lock, but it has already been taken by the background thread in the main process;
- the handler hangs forever, which can be seen in the Guru Meditation Report.
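The chain above boils down to a lock acquired before fork and never released in the child. Here is a minimal, self-contained sketch of that failure mode (the names `reply_q_lock` and `state_reporter` are illustrative stand-ins, not oslo_messaging's actual internals; requires a POSIX system for `os.fork()`):

```python
import os
import threading
import time

# Illustrative stand-in for the lock guarding oslo_messaging's reply_q.
reply_q_lock = threading.Lock()

def state_reporter():
    # The background thread grabs the lock, then blocks while
    # "connecting to RabbitMQ" (simulated with a sleep).
    with reply_q_lock:
        time.sleep(5)

threading.Thread(target=state_reporter, daemon=True).start()
time.sleep(0.2)  # let the reporter acquire the lock first

pid = os.fork()  # the main process forks a "WSGI worker"
if pid == 0:
    # Child: only the forking thread was cloned, so no thread here will
    # ever release the inherited lock. A blocking acquire() would hang
    # forever, exactly as seen in the Guru Meditation Report.
    print("lock held in child:", reply_q_lock.locked())  # True
    os._exit(0)
os.waitpid(pid, 0)
```

A blocking `reply_q_lock.acquire()` in the child would reproduce the hang; the non-blocking check just makes the broken state visible without deadlocking.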
There are several problems here (including eventlet's locks not being fork-aware), but the one fix that addresses them all is to start such background threads only after all forking has happened. I published CR [3] to verify that changing the initialization order fixes this issue, and it did.
Note that, from what I've been told, forks can still happen later if a child worker unexpectedly dies and the main process re-forks it. To properly fix this issue we should not do anything in the main process that can spoil global state visible to child processes. That means we'll need to either run state reporting in a separate process or give it an isolated oslo_messaging environment (a context? I am not an expert in oslo_messaging).
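The reordering fix can be sketched as follows. This is an illustrative outline under the assumptions above, not Neutron's actual service code; `start_state_reporting` and `serve_forever` are hypothetical placeholders:

```python
import os
import threading

def start_state_reporting():
    # Hypothetical reporter thread; safe to start only once all
    # forking is done, so children never inherit its locks.
    threading.Thread(target=lambda: None, daemon=True).start()

def serve_forever():
    pass  # placeholder for a worker's main loop

# Fixed ordering: fork all workers first...
children = []
for _ in range(2):
    pid = os.fork()
    if pid == 0:         # child worker: no extra threads or held
        serve_forever()  # locks exist yet, so nothing is inherited
        os._exit(0)
    children.append(pid)

# ...and only then start RPC/state-reporting threads in the parent.
start_state_reporting()

for pid in children:
    os.waitpid(pid, 0)
```

As the report notes, this is only a partial fix: if a dead worker is re-forked later, the reporter thread exists again at fork time, which is why a separate reporter process is the more robust design.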
[0] https:/
[1] https:/
[2] http://
[3] https:/
Changed in neutron: | |
assignee: | nobody → Yuriy Taraday (yorik-sar) |
status: | New → In Progress |
Changed in neutron: | |
assignee: | Yuriy Taraday (yorik-sar) → Ann Kamyshnikova (akamyshnikova) |
Changed in neutron: | |
assignee: | Ann Kamyshnikova (akamyshnikova) → Yuriy Taraday (yorik-sar) |
tags: | added: needs-attention |
Changed in neutron: | |
importance: | Undecided → Medium |
Changed in neutron: | |
assignee: | Yuriy Taraday (yorik-sar) → nobody |
status: | In Progress → Confirmed |
tags: | added: neutron-proactive-backport-potential |
Changed in neutron: | |
assignee: | nobody → Sindhu Devale (sindhu-devale-3) |
Changed in neutron: | |
assignee: | Sindhu Devale (sindhu-devale-3) → nobody |
tags: | removed: neutron-proactive-backport-potential |
tags: | removed: needs-attention |
Reviewed: https://review.openstack.org/331672
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7eeeb99a55385b5fc3d4ce1a3b390e41be252b17
Submitter: Jenkins
Branch: master
commit 7eeeb99a55385b5fc3d4ce1a3b390e41be252b17
Author: Yuriy Taraday <email address hidden>
Date: Mon Jun 20 16:39:03 2016 +0300
Move state reporting initialization to after worker forking
We start the state reporting thread before forking child processes. This
leads to the possibility of a fork in the middle of the reporting
process, which can cause child processes to start with corrupted global
state. In [0] it manifested as the metadata agent being unable to make
an RPC call because of a lock acquired in the main process before the
fork and released after it. See the bug description for details.
[0] https://review.openstack.org/312393
Partial-Bug: 1594439
Change-Id: Id5079a296bc402 c6f5b6cdb2df728 11eab5bc6ed