Load balancers may be stuck in PENDING_UPDATE in case of DB outage
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
octavia | Fix Released | Medium | Gregory Thiemonge |
Bug Description
When a DB outage occurs during the update/
The revert of the flow by taskflow should handle all errors, but in many flows the last revert task sets the status of the LB to ERROR (it can also be: resource in ERROR and LB ACTIVE).
https:/
https:/
But if the DB is down, this status update may itself fail, and the LB will be stuck in a PENDING_* state.
In those cases, we have some useful log messages (ERROR) that indicate that the resource status may not be correct:
- "Failed to update load balancer %(lb)s provisioning status to ERROR due to"
- "Failed to update amphora %(amp)s status to ERROR due to"
We could also add a warning log message that explicitly states that a load balancer's status is not correct and that it may be locked (currently, not all error messages include the ID of the LB).
It could help admins find the locked LBs.
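A minimal sketch of such a warning, assuming a hypothetical helper named warn_possibly_locked_lb (not part of Octavia) that is called when the status update to ERROR fails. The point is only that the message always carries the LB ID and the state it may be stuck in, so that admins can grep for it:

```python
import logging

LOG = logging.getLogger(__name__)


def warn_possibly_locked_lb(lb_id, provisioning_status, error):
    """Log an explicit warning naming a load balancer that may be locked.

    Hypothetical helper for illustration only; the name, arguments and
    message wording are assumptions, not Octavia's actual code.
    """
    LOG.warning(
        "Load balancer %(lb)s may be stuck in %(status)s: its provisioning "
        "status could not be set to ERROR due to: %(err)s",
        {"lb": lb_id, "status": provisioning_status, "err": error},
    )
```
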
One way to mitigate this issue would be to retry the DB update over a long period (possibly a few hours, until the DB outage is resolved). Using tenacity in the TaskUtils methods could be a solution.
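The retry idea can be sketched with the standard library alone. This is not the proposed fix and does not use tenacity: retry_db_update and all of its parameters are hypothetical names chosen for illustration. With tenacity, a similar policy could presumably be expressed with its retry decorator combined with stop_after_delay and wait_exponential.

```python
import time


def retry_db_update(update, max_duration=3 * 3600, initial_wait=1.0,
                    max_wait=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call update() until it succeeds or max_duration seconds have elapsed.

    Hypothetical helper, not Octavia code. The wait between attempts
    doubles from initial_wait up to max_wait. When the next wait would
    pass the deadline, the last exception is re-raised to the caller.
    """
    deadline = clock() + max_duration
    wait = initial_wait
    while True:
        try:
            return update()
        except Exception:
            # Give up if sleeping again would exceed the allowed duration.
            if clock() + wait > deadline:
                raise
            sleep(wait)
            wait = min(wait * 2, max_wait)
```

A caller would wrap the status update in a zero-argument callable, for example `retry_db_update(lambda: set_lb_error(lb_id))`, where set_lb_error stands in for whatever function writes the ERROR status. The sleep and clock parameters exist only so the loop can be exercised without real delays.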
Changed in octavia: | |
assignee: | nobody → Gregory Thiemonge (gthiemonge) |
importance: | Undecided → Medium |
status: | New → Confirmed |
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/octavia/+/896383