Graceful termination is not correctly handled when using amphorav2/jobboard
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
octavia | In Progress | Undecided | Unassigned |
Bug Description
When using amphorav2/jobboard, the taskflow conductor is expected to stop
executing after receiving a SIGTERM signal and to suspend the current task
(after its execute() function has finished).
Currently, however, the task is killed and left in a non-SUSPENDING state.
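The expected behaviour can be sketched in plain Python as follows. This is only an illustration of "finish the running execute(), then suspend"; `SketchConductor`, its state names and the task names are hypothetical stand-ins, not the taskflow or Octavia API.

```python
import threading


class SketchConductor:
    """Hypothetical stand-in for a jobboard conductor (not the taskflow API)."""

    def __init__(self, tasks):
        self._tasks = list(tasks)          # list of (name, callable) pairs
        self._stop_requested = threading.Event()
        self.history = ["PENDING"]         # every state the flow went through
        self.completed = []                # names of tasks that fully ran

    @property
    def state(self):
        return self.history[-1]

    def stop(self, signum=None, frame=None):
        # Usable as a signal handler: it only records the request and
        # never interrupts the execute() call that is currently running.
        self._stop_requested.set()
        if self.state == "RUNNING":
            self.history.append("SUSPENDING")

    def run(self):
        self.history.append("RUNNING")
        for name, execute in self._tasks:
            if self._stop_requested.is_set():
                break                      # do not start another task
            execute()                      # the current task always finishes
            self.completed.append(name)
        else:
            self.history.append("SUCCESS")
            return
        self.history.append("SUSPENDED")


def demo():
    """Simulate a stop request arriving while the first task is executing."""
    def first_task():
        conductor.stop()                   # stands in for SIGTERM mid-execute

    def second_task():
        raise AssertionError("must not start after a stop request")

    conductor = SketchConductor([("first_task", first_task),
                                 ("second_task", second_task)])
    conductor.run()
    return conductor
```

In a real process the handler would be registered from the main thread with `signal.signal(signal.SIGTERM, conductor.stop)`; the demo calls `stop()` directly so that the outcome is deterministic.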
DEBUG octavia.
DEBUG octavia.
pkill -TERM octavia-worker
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3995517]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3995515]
INFO cotyledon.
INFO octavia.
INFO octavia.
DEBUG cotyledon.
DEBUG cotyledon.
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3995517]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3995515]
INFO octavia.
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.
INFO octavia.
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.
DEBUG cotyledon.
The root cause of the issue is a call to a function that doesn't exist; the resulting exception is hidden.
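For illustration, this class of bug can be reproduced in a few lines. The sketch below is generic: `SketchConsumer`, `halt_conductor` and both `terminate_*` methods are hypothetical names, and it does not claim to show the actual Octavia code path or where exactly the exception is swallowed.

```python
import logging

LOG = logging.getLogger("sketch")


class SketchConsumer:
    """Hypothetical consumer; all names here are illustrative only."""

    def __init__(self):
        self.conductor_stopped = False

    def stop_conductor(self):
        # The method that actually exists.
        self.conductor_stopped = True

    def terminate_hiding_errors(self):
        try:
            # Misnamed call: raises AttributeError because the method
            # does not exist on this class.
            self.halt_conductor()
        except Exception:
            # Swallowed: shutdown carries on and the conductor is never
            # asked to stop, so nothing gets suspended.
            pass

    def terminate_reporting_errors(self):
        try:
            self.halt_conductor()
        except Exception:
            # Logging the traceback and re-raising makes the missing
            # function visible immediately.
            LOG.exception("Failed to stop the conductor during shutdown")
            raise
```

With the first variant the process exits without any hint that the stop call failed, which would match the symptom of a task being killed rather than suspended. The second variant surfaces the `AttributeError` in the log.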
Changed in octavia:
status: New → In Progress
After fixing the issue:
INFO cotyledon._service_manager [-] Caught SIGTERM signal, graceful exiting of master process
controller.queue.v2.consumer [-] Stopping V2 consumer...
controller.queue.v2.consumer [-] Stopping V2 consumer...
_service_manager [-] Killing services with signal SIGTERM {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:304}}
_service_manager [-] Waiting services to terminate {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:308}}
controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
controller.queue.v2.consumer [-] Shutting down conductor octavia-task-flow-conductor-2425e1d0-7589-4875-9490-40ef350f6b6e
controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
controller.queue.v2.consumer [-] Shutting down conductor octavia-task-flow-conductor-6775e9d7-6162-4eb5-858b-c89afe525552
amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='192.168.0.117', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3012a2e8b0>, 'Connection to 192.168.0.117 timed out. (connect timeout=10.0)'))
common.base_taskflow [-] Flow 'get_create_load_balancer_flow-6359dcb1-4b37-4ff1-86f0-1223f605fd74' (6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state 'SUSPENDING' from state 'RUNNING' {{(pid=3997487) _flow_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:141}}
common.base_taskflow [-] Flow 'get_create_load_balancer_flow-6359dcb1-4b37-4ff1-86f0-1223f605fd74' (6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state 'SUSPENDED' from state 'SUSPENDING' {{(pid=3997487) _flow_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:141}}
_service_manager [-] Shutdown finish {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:320}}
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3997489]
INFO octavia.
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3997487]
INFO octavia.
DEBUG cotyledon.
DEBUG cotyledon.
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3997489]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3997487]
INFO octavia.
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.
INFO octavia.
INFO octavia.
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.
INFO octavia.
WARNING octavia.
DEBUG octavia.
DEBUG octavia.
DEBUG cotyledon.
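To confirm from a worker log that a flow really ended up suspended, the flow transition messages can be extracted with a small script. This is a rough sketch: it assumes the message wording of the "transitioned into state" lines shown in this report, and the helper names `flow_transitions` and `final_states` are made up for the example.

```python
import re

# Matches the flow transition message format seen in the worker log.
_TRANSITION = re.compile(
    r"Flow '(?P<name>[^']+)' \((?P<uuid>[0-9a-f-]+)\) "
    r"transitioned into state '(?P<new>\w+)' from state '(?P<old>\w+)'"
)


def flow_transitions(log_text):
    """Return (flow_uuid, old_state, new_state) tuples in log order."""
    return [
        (m.group("uuid"), m.group("old"), m.group("new"))
        for m in _TRANSITION.finditer(log_text)
    ]


def final_states(log_text):
    """Map each flow UUID to the last state it transitioned into."""
    states = {}
    for uuid, _old, new in flow_transitions(log_text):
        states[uuid] = new
    return states


SAMPLE = (
    "common.base_taskflow [-] Flow 'get_create_load_balancer_flow-"
    "6359dcb1-4b37-4ff1-86f0-1223f605fd74' "
    "(6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state "
    "'SUSPENDING' from state 'RUNNING'\n"
    "common.base_taskflow [-] Flow 'get_create_load_balancer_flow-"
    "6359dcb1-4b37-4ff1-86f0-1223f605fd74' "
    "(6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state "
    "'SUSPENDED' from state 'SUSPENDING'\n"
)
```

After a graceful stop, every flow that was running on the worker should map to `SUSPENDED` in the result of `final_states()`; a flow whose last recorded state is still `RUNNING` would indicate the behaviour described in this bug.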