Comment 17 for bug 1644530

Christian Ehrhardt  (paelzer) wrote :

What I think is happening in our case:
Since no ExecStop= was specified, systemd will send SIGTERM [...]
Details: https://www.freedesktop.org/software/systemd/man/systemd.kill.html#

KillMode is "process" in the service file.
That means "If set to process, only the main process itself is killed."

So in this case the service relies on the main process forwarding the signal to its child processes.
That takes time.
If the restart does not wait for that to complete, it sends the next SIGTERM, which eliminates the main process (already in cleanup) before it can distribute the TERM to its children/siblings. This is our error state.
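
To illustrate the forwarding that KillMode=process relies on, here is a minimal sketch in Python. It is not the actual service: MAIN_SOURCE, run_demo and is_alive are hypothetical names, and the "main process" is a stand-in that starts two sleeping children and passes SIGTERM on to them. Only the main process is signalled, as systemd would do here.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for the service's main process (not the real daemon):
# it starts two children and forwards SIGTERM to them before exiting.
MAIN_SOURCE = r'''
import signal, subprocess, sys

kids = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
        for _ in range(2)]

def forward(signum, frame):
    for kid in kids:
        kid.send_signal(signal.SIGTERM)   # distribute TERM to the children
    for kid in kids:
        kid.wait()                        # cleanup takes time
    sys.exit(0)

signal.signal(signal.SIGTERM, forward)
print(*[kid.pid for kid in kids], flush=True)   # report child PIDs once ready
signal.pause()
'''


def is_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)          # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but belongs to another user
    return True


def run_demo():
    """Signal only the main process and report what happened to the children."""
    main = subprocess.Popen([sys.executable, "-c", MAIN_SOURCE],
                            stdout=subprocess.PIPE, text=True)
    child_pids = [int(p) for p in main.stdout.readline().split()]
    main.send_signal(signal.SIGTERM)     # only the main PID gets the signal
    exit_code = main.wait(timeout=10)
    main.stdout.close()
    return exit_code, child_pids


if __name__ == "__main__":
    code, pids = run_demo()
    print("main exit code:", code)
    print("children still alive:", [p for p in pids if is_alive(p)])
```

The children only go away because forward() gets to run to the end. If the main process were removed while still inside forward(), the children would be left behind, which is the error state I suspect.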

In this broken state the service status shows:
 Main PID: 10600 (code=exited, status=0/SUCCESS)
With no main process left to kill, KillMode=process might have special handling and kill all remaining processes instead. That would be the cleanup that gets the service working again.

Since the service files are the same in both cases (X/Z), I wonder whether a systemd change fixes this by waiting in some way for the signal to be handled (e.g. waiting for the main PID to go away on its own).
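
As a rough idea of what "waiting for the main PID to go away" could look like from the outside, here is a small Python sketch. It is only an illustration of the concept, not a claim about what systemd does in either version; wait_for_exit and its parameters are hypothetical names.

```python
import os
import time


def wait_for_exit(pid, timeout=10.0, interval=0.1):
    """Poll until no process with this PID exists; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)      # signal 0 only checks for existence
        except ProcessLookupError:
            return True          # the process is gone
        except PermissionError:
            pass                 # exists, but belongs to another user
        time.sleep(interval)
    return False
```

A restart that called something like wait_for_exit(old_main_pid) before sending the next SIGTERM would give the main process time to finish distributing the signal. Note that a process that has exited but has not yet been reaped by its parent still counts as existing for this check.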

Systemd versions:
Xenial: 229-4ubuntu16
Zesty: 232-18ubuntu1