AsyncProcess.stop() can lead to deadlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | John Schwarz |
Bug Description
The bug occurs when calling stop() on an AsyncProcess instance that is running a process which generates substantial amounts of output to stdout/stderr and which has a signal handler for some signal (SIGTERM, for example) that causes the program to exit gracefully.
Linux Pipes 101: when calling write() to some one-way pipe, if the pipe is full of data [1], write() will block until the other end read()s from the pipe.
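To make the pipe behaviour concrete, here is a minimal Python sketch (not taken from neutron) that fills a fresh pipe while nobody reads from it. The helper name `pipe_capacity` is hypothetical, and the write end is switched to non-blocking mode only so that the script can report the full pipe instead of hanging; with an ordinary blocking descriptor, the final write() would simply not return until someone read from the other end.

```python
import os


def pipe_capacity(chunk_size=4096):
    """Fill a fresh pipe without ever reading from it; return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode is an assumption made for the demo: it turns the
    # "block forever" case into a BlockingIOError that we can observe.
    os.set_blocking(write_fd, False)
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, b"x" * chunk_size)
            except BlockingIOError:
                # The pipe is full. A blocking write() would now wait here
                # until the other end read()s some data.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("bytes written before the pipe was full:", pipe_capacity())
```

On Linux the result is commonly 65536 bytes, though the exact capacity depends on the system. Any process that writes more than this to a pipe nobody is reading will stall.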
AsyncProcess is using eventlet.
If SIGTERM is sent to the subprocess, and the subprocess generates a lot of output to stdout/stderr after the readers of those pipes were killed, a deadlock results: the parent process blocks on wait() and the subprocess blocks on write(), waiting for someone to read and empty the pipe.
This can be avoided by sending SIGKILL to the AsyncProcesses (this is the code's default), but other signals such as SIGTERM, that can be handled by the userspace code to cause the process to exit gracefully, might trigger this deadlock. For example, I ran into this while trying to modify existing fullstack tests to SIGTERM processes instead of SIGKILL them, and the ovs agent got deadlocked a lot.
[1]: http://
[2]: https:/
Changed in neutron:
assignee: nobody → John Schwarz (jschwarz)
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
tags: added: liberty-rc-potential
tags: added: liberty-backport-potential; removed: liberty-rc-potential
Changed in neutron:
status: Fix Committed → Fix Released
tags: removed: liberty-backport-potential
Nice write-up. I think that matches what you were seeing, and it makes perfect sense.