Raft Leases spins in a tight loop if Leases cache is out of sync with Primary

Bug #1814424 reported by John A Meinel
Affects          Status         Importance   Assigned to      Milestone
Canonical Juju   Fix Released   High         John A Meinel
2.5              Fix Released   High         John A Meinel

Bug Description

We noticed in Grafana that a small increase in API ClaimLease calls would sometimes result in a backend having massive numbers of Claim failures.

After digging through the code, it seems that manager.handleClaim ends up in a tight loop whenever a claim comes back as Invalid. However, a claim only comes back as Invalid when the node's own view of the Leases differs from that of the current Primary.

For example, the current node thinks that app/1 is the lease holder, and app/1 has just asked to extend its lease. However, that lease has actually timed out and app/0 has now claimed it. If Controller/1 is not the current raft leader and has not recently replicated the data from the leader, it will make an extend attempt based on stale data, and until that stale data is refreshed it will keep making Claim requests in a tight loop.
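
To make the scenario concrete, here is a minimal Go sketch of a follower retrying a claim against a stale cache. It is not the Juju code and not necessarily the fix that was released: the leader and follower types, the errInvalid and errDenied values, and the handleClaim signature are all hypothetical stand-ins. It only illustrates one possible mitigation, which is to bound the retries and wait with a growing delay between them rather than spinning.

```go
package main

import "errors"

// Hypothetical stand-ins for the results a claim can produce.
var (
	errInvalid = errors.New("claim invalid: leader disagrees with cached state")
	errDenied  = errors.New("claim denied: lease held by another unit")
)

// leader holds the authoritative map of lease name to holder.
type leader struct{ holders map[string]string }

// claim grants or extends the lease unless another holder already has it.
func (l *leader) claim(lease, holder string) error {
	if cur, ok := l.holders[lease]; ok && cur != holder {
		return errInvalid
	}
	l.holders[lease] = holder
	return nil
}

// follower handles claims using a local cache that may lag the leader.
type follower struct {
	cache  map[string]string
	leader *leader
}

// refresh models raft replication catching up with the leader.
func (f *follower) refresh() {
	for lease, holder := range f.leader.holders {
		f.cache[lease] = holder
	}
}

// handleClaim retries an invalid claim at most maxAttempts times. After each
// invalid result it calls wait with a delay in milliseconds that doubles
// every time, instead of retrying immediately. It returns the number of
// attempts made and the final result.
func (f *follower) handleClaim(lease, holder string, maxAttempts int, wait func(delayMillis int)) (int, error) {
	delay := 10
	err := errInvalid
	attempts := 0
	for attempts < maxAttempts {
		attempts++
		// The (possibly stale) cache decides whether the request is sent at all.
		if cur, ok := f.cache[lease]; ok && cur != holder {
			return attempts, errDenied
		}
		err = f.leader.claim(lease, holder)
		if !errors.Is(err, errInvalid) {
			return attempts, err
		}
		wait(delay)
		delay *= 2
	}
	return attempts, err
}
```

With the leader recording app/0 as holder and the follower cache still recording app/1, a request from app/1 passes the cache check and is rejected by the leader on every attempt, so without the attempt limit and the wait the loop would never yield. If the cache is refreshed during a wait, the next attempt sees app/0 as holder and the request is denied locally without contacting the leader. In real code, wait would presumably be backed by a clock or timer; it is injected here only to keep the sketch deterministic.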

John A Meinel (jameinel) wrote :
Changed in juju:
milestone: 2.6-beta1 → 2.6-beta2
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released