Comment 28 for bug 1611237

IWAMOTO Toshihiro (iwamoto) wrote :

If an agent were killed by SIGKILL, we wouldn't see this bug.

Correct shutdown sequence:

1. ovs-agent receives SIGTERM, and _handle_sigterm is called as a signal handler, setting the catch_sigterm flag.
2. Control exits from rpc_loop's while loop, causing main() to terminate.
3. app_manager.AppManager.get_instance().close is called to clean up all ryu threads.
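The sequence above can be sketched roughly as follows. This is only an illustration of the flag-based pattern, not the actual neutron code: the names _handle_sigterm, catch_sigterm and rpc_loop come from the steps above, while SketchAgent, close(), install_handler() and the work callback are hypothetical stand-ins (close() plays the role of the ryu AppManager cleanup).

```python
import signal


class SketchAgent:
    """Hypothetical stand-in for the ovs-agent; not the real class."""

    def __init__(self):
        self.catch_sigterm = False
        self.iterations = 0
        self.cleaned_up = False

    def _handle_sigterm(self, signum, frame):
        # Step 1: the handler only records that SIGTERM arrived.
        self.catch_sigterm = True

    def rpc_loop(self, work):
        # Step 2: the loop ends once the flag is seen.
        while not self.catch_sigterm:
            work(self)
            self.iterations += 1

    def close(self):
        # Step 3: stand-in for the cleanup of the other threads.
        self.cleaned_up = True


def install_handler(agent):
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGTERM, agent._handle_sigterm)


def main(agent, work):
    agent.rpc_loop(work)
    agent.close()
```

Note that the handler itself never stops the process. Termination depends entirely on rpc_loop still running to notice the flag.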

The bug situation:

1. devstack sends SIGTERM to the process *group* of ovs-agent (note this happens twice).
2. The ovs-agent is respawning its helper processes, probably due to the above SIGTERM.
3. As you are very unlucky, a ps process run to confirm process existence is killed, causing a ProcessExecutionError.
4. The thread running agent_main_wrapper is terminated due to the exception, without cleaning up the other ryu threads.
   Note: if of_interface=ovs-ofctl, an exception will terminate the agent.
5. As SIGTERM is handled by a signal handler, which only sets a flag, and the thread that would have acted on that flag is gone, the ovs-agent fails to terminate.
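A small simulation of why the process lingers, using plain Python threads in place of ryu threads. Everything here is an assumption for illustration: simulate(), the two Event objects and the local ProcessExecutionError class are hypothetical, and only the name agent_main_wrapper is taken from the description above.

```python
import threading
import time


class ProcessExecutionError(RuntimeError):
    """Local stand-in for the real exception class."""


def simulate(fail):
    """Return True if the helper thread survives the SIGTERM flag."""
    catch_sigterm = threading.Event()   # set by the signal handler
    stop_helpers = threading.Event()    # set by the normal cleanup path

    def agent_main_wrapper():
        try:
            while not catch_sigterm.is_set():
                if fail:
                    # Step 3: the ps check fails inside the loop.
                    raise ProcessExecutionError("ps was killed")
                time.sleep(0.01)
            stop_helpers.set()          # normal cleanup
        except ProcessExecutionError:
            # Step 4: the thread ends here and the cleanup is skipped.
            pass

    helper = threading.Thread(target=stop_helpers.wait, daemon=True)
    wrapper = threading.Thread(target=agent_main_wrapper)
    helper.start()
    wrapper.start()
    if fail:
        wrapper.join()                  # the wrapper has already died
    catch_sigterm.set()                 # all the handler does
    wrapper.join(timeout=1)
    helper.join(timeout=0.2)
    still_alive = helper.is_alive()     # Step 5: nobody stopped it
    stop_helpers.set()                  # let the helper exit afterwards
    helper.join()
    return still_alive
```

With fail=False the helper is stopped by the normal cleanup path. With fail=True the flag is set but nothing reads it, so the helper keeps running, which mirrors the agent that fails to terminate.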