Hi @gthiemonge,
We may also be hitting this bug (it may be the same issue; I am not 100% sure, but the logs [1] look very similar).
In my case, it seems two amphorae were created for the problematic LB; the BACKUP amphora was reverted after the failure, but the MASTER one was not.
In your case, only the subflow for the MASTER amphora is reverted, the subflow for the BACKUP is interrupted (the amphora is not deleted), and none of the tasks of the main flow are reverted (the LB is stuck in PENDING_CREATE).
Also, our environment runs the Yoga release, which does not include the retry feature [2]. So, in theory, it should not be affected by the problem described in [3] (Fix REVERT_ALL with Retries in unordered Flows).
More importantly, the flow mentioned in the logs [1] is octavia-create-amphora-flow; it does not contain any unordered flows, so it should not be affected by the unordered-flow problem either.
octavia-create-amphora-flow - linear_flow
  - task CreateAmphoraInDB
  - ... other tasks ...
  - task ComputeActiveWait - may raise ComputeWaitTimeoutException
  - octavia-create-amphora-retry-subflow - linear_flow
    - task AmphoraComputeConnectivityWait - catches TimeOutException, then re-raises
    - retry AmpRetry - returns retry.RETRY (via AmpConnectionRetry) before connection_max_retries; returns retry.REVERT_ALL after connection_max_retries
It seems only get_failover_amphora_flow includes unordered flows; it is used to perform failover for an amphora, but the logs [1] show the failure did not happen at that stage, it was still at the stage of creating the amphora VM.
The failover call chain is: main -> hm_health_check -> health_check -> failover_amphora -> get_failover_amphora_flow (controller/worker/v2/flows/amphora_flows.py)

$ grep -r 'unordered_flow' ./octavia/ | grep -v import
update_amps_subflow = unordered_flow.Flow('VRRP-update-subflow')
update_amps_subflow = unordered_flow.Flow(
reload_listener_subflow = unordered_flow.Flow(
octavia-failover-amphora-flow - linear_flow
  - task AmphoraToErrorOnRevertTask - sets the amphora to ERROR on revert if this flow goes wrong
Based on the above, do you think the following log (mine looks the same) is really related to the revert operation, or is it failing because of an issue with the amphorae themselves? Thanks.
Nov 10 07:52:05 gthiemon-devstack octavia-worker[3952342]: ERROR oslo_messaging.rpc.server [-] Exception during message handling: taskflow.exceptions.WrappedFailure: WrappedFailure: [Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id 5ba04bb8-a632-4c44-a03e-234ac87b5b16 to go active timeout., Failure: octavia.amphorae.driver_exceptions.exceptions.AmpConnectionRetry: Could not connect to amphora, exception caught: foo, Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id b64f2a56-020f-4c73-99ca-5e427ea78f1c to go active timeout., Failure: octavia.common.exceptions.ComputeWaitTimeoutException: Waiting for compute id b64f2a56-020f-4c73-99ca-5e427ea78f1c to go active timeout.]
[1] https://paste.ubuntu.com/p/C3WsRqgWJr/
[2] https://opendev.org/openstack/octavia/commit/a9ee09a676074eeb619a4d7c3d9912114de50e88
[3] https://review.opendev.org/c/openstack/taskflow/+/900746