Under some circumstances (amphorav2, with or without jobboard; ACTIVE_STANDBY topology required; the bug is not 100% reproducible), the revert of a taskflow flow may be incomplete.
For instance, inject a failure into a Task that is controlled by a Retry (AmphoraComputeConnectivityWait):
diff --git a/octavia/controller/worker/v2/tasks/amphora_driver_tasks.py b/octavia/controller/worker/v2/tasks/amphora_driver_tasks.py
index 62d68051..84609805 100644
--- a/octavia/controller/worker/v2/tasks/amphora_driver_tasks.py
+++ b/octavia/controller/worker/v2/tasks/amphora_driver_tasks.py
@@ -675,6 +675,7 @@ class AmphoraComputeConnectivityWait(BaseAmphoraTask):
     def execute(self, amphora, raise_retry_exception=False):
         """Execute get_info routine for an amphora until it responds."""
         try:
+            raise driver_except.AmpConnectionRetry(exception="foo")
             session = db_apis.get_session()
             with session.begin():
                 db_amphora = self.amphora_repo.get(
then create a load balancer. The subflow is reverted after the retries are exhausted, but:
* the 2nd subflow (there are 2 subflows running concurrently when creating an A/S LB) is not reverted
* the "main" flow is not reverted
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'PENDING' from state 'REVERTED' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'RUNNING' from state 'PENDING' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'REVERTING' from state 'FAILURE' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'PENDING' from state 'REVERTED' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'RUNNING' from state 'PENDING' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'REVERTING' from state 'FAILURE' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-amp-compute-connectivity-wait' (5b98ef7b-8005-4ab2-8a77-7d8bda2729be) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-amphora-info' (b86b6c05-39d0-4e01-b28d-104f26963f14) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-amphora-info' (b86b6c05-39d0-4e01-b28d-104f26963f14) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-compute-wait' (c61e6fab-c79d-4745-8467-73a16a062a3a) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-compute-wait' (c61e6fab-c79d-4745-8467-73a16a062a3a) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-mark-amphora-booting-indb' (56e61bf1-0c58-4685-87a2-52abb660b6b1) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.tasks.database_tasks [-] Reverting mark amphora booting in DB for amp id 5b377faa-01e0-4e02-b702-c6a12ab51a59 and compute id 5ba04bb8-a632-4c44-a03e-234ac87b5b16
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-mark-amphora-booting-indb' (56e61bf1-0c58-4685-87a2-52abb660b6b1) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-amphora-computeid' (95cd38b3-cea4-4ffe-a1db-677d471b1e8f) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-amphora-computeid' (95cd38b3-cea4-4ffe-a1db-677d471b1e8f) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (776b4bfb-cb88-403c-bbbe-1404989799a8) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.tasks.compute_tasks [-] Reverting compute create for amphora with id 5b377faa-01e0-4e02-b702-c6a12ab51a59 and compute id: 5ba04bb8-a632-4c44-a03e-234ac87b5b16
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG novaclient.v2.client [-] REQ: curl -g -i --cacert "/opt/stack/data/ca-bundle.pem" -X DELETE http://192.168.1.101/compute/v2.1/servers/5ba04bb8-a632-4c44-a03e-234ac87b5b16 -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA256}df1b68dcf137685f869322dd195bd5e33031be72615bbf639ff8328b0debebc1" -H "X-OpenStack-Nova-API-Version: 2.15" {{(pid=3952342) _http_log_request /usr/local/lib/python3.9/site-packages/keystoneauth1/session.py:511}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG novaclient.v2.client [-] RESP: [204] Connection: close Content-Type: application/json Date: Fri, 10 Nov 2023 12:52:04 GMT OpenStack-API-Version: compute 2.15 Server: Apache/2.4.57 (CentOS Stream) OpenSSL/3.0.7 mod_wsgi/4.7.1 Python/3.9 Vary: OpenStack-API-Version,X-OpenStack-Nova-API-Version X-OpenStack-Nova-API-Version: 2.15 x-compute-request-id: req-44eb94ed-cbd3-461a-8cce-a7afb43fcbf3 x-openstack-request-id: req-44eb94ed-cbd3-461a-8cce-a7afb43fcbf3 {{(pid=3952342) _http_log_response /usr/local/lib/python3.9/site-packages/keystoneauth1/session.py:542}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG novaclient.v2.client [-] DELETE call to compute for http://192.168.1.101/compute/v2.1/servers/5ba04bb8-a632-4c44-a03e-234ac87b5b16 used request id req-44eb94ed-cbd3-461a-8cce-a7afb43fcbf3 {{(pid=3952342) request /usr/local/lib/python3.9/site-packages/keystoneauth1/session.py:946}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (776b4bfb-cb88-403c-bbbe-1404989799a8) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-cert-expiration' (fb9de337-37f5-4789-a6a4-ac2898805542) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-update-cert-expiration' (fb9de337-37f5-4789-a6a4-ac2898805542) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-generate-serverpem' (d4adb1f9-e4ef-4ac6-b9b2-7fa03401468e) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-generate-serverpem' (d4adb1f9-e4ef-4ac6-b9b2-7fa03401468e) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-create-amphora-indb' (38d938ff-133c-4637-9b8b-61d14771f766) transitioned into state 'REVERTING' from state 'SUCCESS' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.tasks.database_tasks [-] Reverting create amphora in DB for amp id 5b377faa-01e0-4e02-b702-c6a12ab51a59
Nov 10 07:52:04 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'MASTER-octavia-create-amp-for-lb-subflow-octavia-create-amphora-indb' (38d938ff-133c-4637-9b8b-61d14771f766) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'BACKUP-octavia-create-amp-for-lb-subflow-octavia-compute-wait' (38a8143e-4816-4a16-ae05-5037003245ba) transitioned into state 'REVERTING' from state 'FAILURE' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Task 'BACKUP-octavia-create-amp-for-lb-subflow-octavia-compute-wait' (38a8143e-4816-4a16-ae05-5037003245ba) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None'
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: DEBUG octavia.controller.worker.v2.controller_worker [-] Task 'BACKUP-octavia-create-amp-for-lb-subflow-octavia-compute-wait' (38a8143e-4816-4a16-ae05-5037003245ba) transitioned into state 'PENDING' from state 'REVERTED' {{(pid=3952342) _task_receiver /opt/stack/taskflow/taskflow/listeners/logging.py:190}}
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: WARNING octavia.controller.worker.v2.controller_worker [-] Flow 'octavia-create-loadbalancer-flow' (9c221fbb-d7c5-413b-9b17-3a6ca92e8d7a) transitioned into state 'REVERTED' from state 'RUNNING'
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: ERROR oslo_messaging.rpc.server [-] Exception during message handling: taskflow.exceptions.WrappedFailure: WrappedFailure: [Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id 5ba04bb8-a632-4c44-a03e-234ac87b5b16 to go active timeout., Failure: octavia.amphorae.driver_exceptions.exceptions.AmpConnectionRetry: Could not connect to amphora, exception caught: foo, Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id b64f2a56-020f-4c73-99ca-5e427ea78f1c to go active timeout., Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id b64f2a56-020f-4c73-99ca-5e427ea78f1c to go active timeout.]
Only the subflow for the MASTER amphora is reverted; the subflow for the BACKUP amphora is interrupted (its amphora is not deleted), and none of the tasks of the main flow are reverted (the LB is stuck in PENDING_CREATE).
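After some investigation in taskflow, it appears that using a Retry in two flows that run in parallel (inside an unordered flow) is not safe: when the Retry of the first flow fails, it marks all the tasks of all the flows to be reverted, but the second Retry, running in parallel, may override this state and reschedule the execution of a task (in the logs above, 'BACKUP-octavia-create-amp-for-lb-subflow-octavia-compute-wait' goes from REVERTED back to PENDING).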
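The race can be illustrated with a toy model in plain Python. This is not taskflow code: the shared `storage` dict, the `EXECUTE`/`REVERT` intentions and both retry functions are hypothetical stand-ins, and the interleaving is scripted so the result is deterministic. The `guarded=True` variant shows one possible way to make a flow-wide revert decision stick; it is only an illustration, not necessarily what the taskflow review referenced in this report does.

```python
"""Toy model of two parallel Retry controllers sharing atom state.

NOT taskflow code. Assumption: each atom has an "intention" stored in
shared storage, loosely mirroring taskflow's atom intentions.
"""

EXECUTE, REVERT = "EXECUTE", "REVERT"


def make_storage():
    # One task in each parallel subflow (MASTER, BACKUP) plus one task of
    # the parent ("main") flow; all start out scheduled for execution.
    return {
        "MASTER-connectivity-wait": EXECUTE,
        "BACKUP-compute-wait": EXECUTE,
        "main-flow-task": EXECUTE,
    }


def master_retry_gives_up(storage):
    # MASTER's Retry is exhausted: mark every atom of every flow for revert.
    for name in storage:
        storage[name] = REVERT


def backup_retry_reschedules(storage, guarded):
    # BACKUP's Retry, running concurrently, decides to run its task again.
    if guarded and storage["BACKUP-compute-wait"] == REVERT:
        # Hypothetical guard: a flow-wide revert was already requested,
        # so do not downgrade it back to EXECUTE.
        return
    # Unguarded: blindly overwrite whatever intention is stored.
    storage["BACKUP-compute-wait"] = EXECUTE


def run(guarded):
    storage = make_storage()
    # Scripted interleaving: MASTER gives up first, then BACKUP's Retry acts.
    master_retry_gives_up(storage)
    backup_retry_reschedules(storage, guarded)
    return storage
```

With `guarded=False`, `run()` leaves `BACKUP-compute-wait` set to `EXECUTE` after the flow-wide revert was requested, which mirrors the BACKUP task being rescheduled in the logs; with `guarded=True`, every atom stays marked for revert.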
Two mitigations are possible:
- stop using Retries in Octavia: long-running tasks are fine with amphorav2 and jobboard (they were a problem in older releases, but that issue has been fixed)
- fix taskflow
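The first mitigation amounts to keeping the whole wait inside a single task execution instead of delegating the repetition to a taskflow Retry. A minimal sketch, assuming the connectivity check can be expressed as a blocking poll: `wait_for_connectivity`, `ConnectivityTimeout` and the `probe` callable are hypothetical names, not Octavia code, and `probe` stands in for the amphora driver call that a real task would make from its `execute()`.

```python
import time


class ConnectivityTimeout(Exception):
    """Hypothetical error raised when the amphora never answers in time."""


def wait_for_connectivity(probe, timeout, interval,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll ``probe`` until it returns True or ``timeout`` seconds elapse.

    Everything happens inside one call, so no Retry controller (and no
    shared retry state across parallel subflows) is involved. ``sleep`` and
    ``clock`` are injectable so the loop can be tested without waiting.
    Returns the number of attempts that were needed.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise ConnectivityTimeout(
                f"amphora not reachable after {attempts} attempts")
        sleep(interval)
```

With this shape, a failure surfaces as a single task failure, so the normal revert path of the enclosing flows applies without any Retry deciding to reschedule work.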
Full logs are attached.
Reproducer and potential fix in taskflow: https://review.opendev.org/c/openstack/taskflow/+/900746