Raft Leases spins in a tight loop if Leases cache is out of sync with Primary
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | John A Meinel |
2.5 | Fix Released | High | John A Meinel |
Bug Description
We noticed in Grafana that a small increase in API ClaimLease calls would sometimes result in a backend having massive numbers of Claim failures.
After digging through the code, it seems that manager.handleClaim ends up in a tight loop whenever it gets an Invalid claim. However, it only makes an invalid claim when its own view of Leases differs from what the current Primary holds.
For example, the current node thinks that app/1 is the lease holder, and app/1 has just asked to extend its lease. However, that lease has actually timed out and app/0 has now claimed it. If Controller/1 is not the current raft leader and has not recently replicated the data from the leader, it will make an extend attempt based on stale data, and until that stale data is refreshed it will keep making Claim requests in a tight loop.
Changed in juju: | |
milestone: | 2.6-beta1 → 2.6-beta2 |
Changed in juju: | |
status: | In Progress → Fix Committed |
Changed in juju: | |
status: | Fix Committed → Fix Released |
https://github.com/juju/juju/pull/9706 is a patch against 2.5.1