failover of ACTIVE_STANDBY LBs can take a lot of time in amphorav1
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
octavia | Fix Released | Medium | Gregory Thiemonge | | |
Bug Description
There is an issue in amphorav1: when both amphorae of an ACTIVE_STANDBY LB are missing and a failover is triggered, the failover can take between 15 and 40 minutes, depending on the retry/interval config in the worker.
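As a rough illustration of why the duration depends on the retry/interval settings: each call that waits for an unreachable amphora can block for about retries × interval seconds, and several such calls add up. The function name and the numbers below are hypothetical and are not taken from this report.

```python
def worst_case_wait_seconds(max_retries, retry_interval, blocking_calls):
    """Upper bound on the time spent waiting for an unreachable amphora.

    Each blocking call retries max_retries times and sleeps
    retry_interval seconds between attempts.
    """
    return max_retries * retry_interval * blocking_calls


# Hypothetical values, for illustration only.
total = worst_case_wait_seconds(max_retries=120, retry_interval=5, blocking_calls=2)
print(total / 60)  # minutes; prints 20.0
```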
When recreating the first amphora, the failover flow also updates the second amphora (it updates the VRRP config in all the amphorae).
But if the second amphora is also failing, those tasks skip the update once the timeouts in "timeout_dict" have elapsed (the tasks themselves do not fail; they set the status of the amp to ERROR).
however, AmphoraIndexVRR
- it calls amphora_
- it passes timeout_dict to amphora_
[1] https:/
[2] https:/
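The skip-on-timeout behaviour described above can be sketched as follows. This is a minimal illustration, not Octavia's code: `AmphoraUnreachable`, `update_amphora`, `push_update` and the contents of `timeout_dict` are hypothetical stand-ins.

```python
class AmphoraUnreachable(Exception):
    """Hypothetical stand-in for a driver timeout error."""


def update_amphora(amphora, push_update, timeout_dict, statuses):
    """Try to update one amphora without failing the whole flow.

    If the amphora cannot be reached within the limits in timeout_dict,
    the update is skipped and the amphora is marked ERROR.
    """
    try:
        push_update(amphora, timeout_dict=timeout_dict)
    except AmphoraUnreachable:
        statuses[amphora] = "ERROR"
        return False
    return True


def unreachable_push(amphora, timeout_dict=None):
    # Simulates an amphora that never answers.
    raise AmphoraUnreachable(amphora)


statuses = {}
# The keys of timeout_dict are assumptions made for this sketch.
updated = update_amphora("amp-2", unreachable_push, {"conn_max_retries": 1}, statuses)
```

The point of the sketch is that the timeout limits only help if every call on the path actually receives `timeout_dict`; a call that falls back to the default retry settings still blocks for the full period.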
We can fix both issues, but there is a more interesting lead.
The failover flow invokes three successive tasks:
- AmphoraIndexUpd
- AmphoraIndexVRR
- AmphoraIndexVRR
AmphoraIndexUpd
When the amphora is not reachable, amp_vrrp_int is None [3].
This value can be passed to AmphoraIndexVRR
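One way to read this lead, sketched under assumptions: if the first task reports None for an unreachable amphora, the later tasks could check for None and return immediately instead of waiting for connection retries. The function names and the interface name below are hypothetical and are not Octavia's task classes.

```python
def get_vrrp_interface(amphora, reachable):
    """Return the VRRP interface name, or None if the amphora is unreachable."""
    # "eth1" is a placeholder interface name for this sketch.
    return "eth1" if amphora in reachable else None


def update_vrrp(amphora, amp_vrrp_int, calls):
    """Skip the update when no interface was reported for the amphora."""
    if amp_vrrp_int is None:
        # Already known to be unreachable: do not contact it again.
        return False
    calls.append((amphora, amp_vrrp_int))
    return True


reachable = {"amp-1"}
calls = []
results = {
    amp: update_vrrp(amp, get_vrrp_interface(amp, reachable), calls)
    for amp in ("amp-1", "amp-2")
}
```

With this shape, only the first task pays the timeout cost for an unreachable amphora; the following tasks return right away.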
Changed in octavia:
assignee: nobody → Gregory Thiemonge (gthiemonge)
importance: Undecided → Medium
status: New → Confirmed
I think we need coverage in octavia-tempest-plugin for this situation.