Graceful termination is not correctly handled when using amphorav2/jobboard

Bug #2064101 reported by Gregory Thiemonge
Affects: octavia
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

When using amphorav2/jobboard, it is expected that after receiving a SIGTERM
signal, the taskflow conductor stops executing and suspends the current task
(after its execute() function finishes).
But currently, the task is killed and the flow is left in a non-SUSPENDED state.
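
For reference, the suspend semantics come from taskflow itself. A minimal
sketch (plain taskflow outside Octavia; the flow and task names are
illustrative) of an engine being asked to suspend while a task is running:

import threading
import time

from taskflow import engines
from taskflow import task
from taskflow.patterns import linear_flow


class SlowTask(task.Task):
    # Stands in for a long-running Octavia task, e.g. a retried call to
    # the amphora REST API.
    def execute(self):
        time.sleep(2)


flow = linear_flow.Flow('demo').add(SlowTask('step1'), SlowTask('step2'))
engine = engines.load(flow)

# Request suspension shortly after the run starts, the way a SIGTERM
# handler should: the in-flight execute() is allowed to finish, then the
# flow goes RUNNING -> SUSPENDING -> SUSPENDED and run() returns.
threading.Timer(0.5, engine.suspend).start()
engine.run()
print(engine.storage.get_flow_state())  # SUSPENDED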

DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url / {{(pid=3995517) request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:679}}
DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://192.168.0.117:9443// {{(pid=3995517) request /opt/stack/octavia/octavia/amphorae/drivers/haproxy/rest_api_driver.py:682}}

pkill -TERM octavia-worker

INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3995517]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3995515]
INFO cotyledon._service_manager [-] Caught SIGTERM signal, graceful exiting of master process
INFO octavia.controller.queue.v2.consumer [-] Stopping V2 consumer...
INFO octavia.controller.queue.v2.consumer [-] Stopping V2 consumer...
DEBUG cotyledon._service_manager [-] Killing services with signal SIGTERM {{(pid=3995502) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:304}}
DEBUG cotyledon._service_manager [-] Waiting services to terminate {{(pid=3995502) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:308}}
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3995517]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3995515]
INFO octavia.controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
INFO octavia.controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
DEBUG cotyledon._service_manager [-] Shutdown finish {{(pid=3995502) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:320}}

The root cause of the issue is that the shutdown code calls a function that doesn't exist, and the resulting exception is hidden:

https://opendev.org/openstack/octavia/src/commit/bb7c8ca2c9b0db7e65fdaecce05c694929299054/octavia/controller/queue/v2/consumer.py#L70-L73
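
The pattern looks roughly like the following; a hypothetical reconstruction
(illustrative names, not the actual Octavia code) of how a call to a
nonexistent method gets swallowed:

# Hypothetical reconstruction of the failure pattern (illustrative names,
# not the actual Octavia code).
class EndpointWorker:
    def stop(self):
        print('stopping worker executors and conductor')


class ConsumerService:
    def __init__(self, worker):
        self.endpoint_worker = worker

    def terminate(self):
        try:
            # The method doesn't exist, so this raises AttributeError and
            # EndpointWorker.stop() is never reached...
            self.endpoint_worker.shutdown_executors()
        except Exception:
            # ...and the broad except hides the traceback, so nothing in
            # the logs points at the bug and the running task ends up
            # killed instead of suspended.
            pass


ConsumerService(EndpointWorker()).terminate()  # silently does nothing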

Gregory Thiemonge (gthiemonge) wrote:

After fixing the issue:

INFO cotyledon._service_manager [-] Caught SIGTERM signal, graceful exiting of master process
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3997489]
INFO octavia.controller.queue.v2.consumer [-] Stopping V2 consumer...
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3997487]
INFO octavia.controller.queue.v2.consumer [-] Stopping V2 consumer...
DEBUG cotyledon._service_manager [-] Killing services with signal SIGTERM {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:304}}
DEBUG cotyledon._service_manager [-] Waiting services to terminate {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:308}}
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(1) [3997489]
INFO cotyledon._service [-] Caught SIGTERM signal, graceful exiting of service ConsumerService(0) [3997487]
INFO octavia.controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
INFO octavia.controller.queue.v2.consumer [-] Shutting down conductor octavia-task-flow-conductor-2425e1d0-7589-4875-9490-40ef350f6b6e
INFO octavia.controller.queue.v2.consumer [-] V2 Consumer successfully stopped. Waiting for final messages to be processed...
WARNING amqp [-] Received method (60, 30) during closing channel 1. This method will be ignored
INFO octavia.controller.queue.v2.consumer [-] Shutting down V2 endpoint worker executors...
INFO octavia.controller.queue.v2.consumer [-] Shutting down conductor octavia-task-flow-conductor-6775e9d7-6162-4eb5-858b-c89afe525552
WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='192.168.0.117', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f3012a2e8b0>, 'Connection to 192.168.0.117 timed out. (connect timeout=10.0)'))
DEBUG octavia.common.base_taskflow [-] Flow 'get_create_load_balancer_flow-6359dcb1-4b37-4ff1-86f0-1223f605fd74' (6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state 'SUSPENDING' from state 'RUNNING' {{(pid=3997487) _flow_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:141}}
DEBUG octavia.common.base_taskflow [-] Flow 'get_create_load_balancer_flow-6359dcb1-4b37-4ff1-86f0-1223f605fd74' (6359dcb1-4b37-4ff1-86f0-1223f605fd74) transitioned into state 'SUSPENDED' from state 'SUSPENDING' {{(pid=3997487) _flow_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:141}}
DEBUG cotyledon._service_manager [-] Shutdown finish {{(pid=3997427) _shutdown /usr/local/lib/python3.9/site-packages/cotyledon/_service_manager.py:320}}
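
The "Shutting down conductor ..." lines suggest the shape of the fix: the
terminate path now reaches each taskflow conductor and stops it. A
hypothetical sketch of that shape (illustrative names, not the actual
patch):

import logging

LOG = logging.getLogger(__name__)


class EndpointWorker:
    def __init__(self, conductor):
        # Assumes one taskflow conductor per worker, as the log lines
        # above suggest.
        self.conductor = conductor

    def shutdown(self):
        LOG.info('Shutting down conductor')
        # stop() and wait() are part of the taskflow conductor API;
        # stopping the conductor during terminate() is what gives the
        # running engine the chance to suspend cleanly (the SUSPENDING /
        # SUSPENDED transitions in the log above) instead of being killed.
        self.conductor.stop()
        self.conductor.wait()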

Changed in octavia:
status: New → In Progress