ANARCHY!!!!!!! Entirely leaderless application spotted in the wild
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
We noticed a Mojo rollout getting stuck/timing out waiting for steady state, with this from Juju:
19:00:47 O: 2020-01-07 18:58:20 [INFO] canonical-livepatch does not have a leader
19:00:47 O: 2020-01-07 18:58:45 [INFO] All units idle since 2020-01-07 18:58:26.540644Z (external-
19:00:47 O: 2020-01-07 18:58:45 [INFO] canonical-livepatch does not have a leader
.... (AD NAUSEAM)
Indeed, both juju status and is-leader show that every unit believes the application has no leader:
juju run --application canonical-livepatch "is-leader"
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
- Stdout: |
False
UnitId: canonical-
$ juju status |grep canonical-livepatch
canonical-livepatch active 18 canonical-livepatch local 0 ubuntu
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
canonical-
(note no leadership asterisk)
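For anyone who wants to script the is-leader check above rather than eyeball 18 results, here is a minimal Python sketch. It assumes the output of juju run is the YAML-style list shown above (a "- Stdout: |" line, the value on the next line, then a "UnitId:" line). The function name and the unit names in the sample are hypothetical, not taken from this environment.

```python
def leaders_from_is_leader_output(text):
    """Return the unit IDs whose is-leader Stdout was 'True'.

    Assumes the layout shown in this report: '- Stdout: |', then the
    value on the following line, then 'UnitId: <unit>'.
    """
    leaders = []
    stdout_value = None
    expecting_value = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- Stdout:"):
            # The next non-empty line carries the is-leader result.
            expecting_value = True
            stdout_value = None
        elif expecting_value:
            stdout_value = line
            expecting_value = False
        elif line.startswith("UnitId:"):
            unit = line.split(":", 1)[1].strip()
            if stdout_value == "True":
                leaders.append(unit)
            stdout_value = None
    return leaders


# Hypothetical sample; unit names are made up for illustration.
SAMPLE = """\
- Stdout: |
    False
  UnitId: myapp/0
- Stdout: |
    False
  UnitId: myapp/1
"""

if __name__ == "__main__":
    found = leaders_from_is_leader_output(SAMPLE)
    print("leaders:", found if found else "none (application is leaderless)")
```

An empty result corresponds to the situation described in this bug: no unit claims leadership.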
Logs show this in all units, when the agent is restarted or when a new unit is added:
2020-01-07 16:17:25 DEBUG juju.worker.
2020-01-07 16:17:25 INFO juju.worker.
This is the PS4.5 shared controller with Juju 2.6.10:
$ juju status canonical-livepatch
Model Controller Cloud/Region Version SLA Timestamp
stg-ols-scasnap3 prodstack-is prodstack-
Things we've tried to get a new leader elected:
- Restart all Juju agents for that application
- Add a new unit to see if given a new candidate, a leader is elected
Things we haven't tried:
- Restart *all* juju agents
- Remove, then re-add the application
This seems very similar to https:/
Also, IS confirmed this controller is using raft leases, not legacy leases.
Let us know if any more diagnostics/logs are needed.
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Hi, can you please check the controller logs for errors related to leases or from juju.worker.dependency?
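A rough way to do that filtering, sketched in Python. It assumes the controller log lines use the same "date time LEVEL module message" layout as the unit log lines quoted earlier in this report; the function name is hypothetical and the sample messages are made up, not real Juju output.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS LEVEL module message..."
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\S+) ?(.*)$"
)


def lease_related_problems(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return log lines at the given levels that mention leases or that
    come from the juju.worker.dependency logger."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip continuation lines or other formats
        _timestamp, level, module, message = match.groups()
        if level not in levels:
            continue
        if (
            module.startswith("juju.worker.dependency")
            or "lease" in module.lower()
            or "lease" in message.lower()
        ):
            hits.append(line)
    return hits


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    "2020-01-07 16:17:25 DEBUG juju.worker.dependency hypothetical debug message",
    "2020-01-07 16:17:26 ERROR juju.worker.dependency hypothetical error message",
    "2020-01-07 16:17:27 ERROR juju.worker.example hypothetical lease failure",
    "2020-01-07 16:17:28 ERROR juju.worker.example hypothetical unrelated error",
]

if __name__ == "__main__":
    for hit in lease_related_problems(SAMPLE_LOG):
        print(hit)
```

In practice the lines would be read from the controller's log file, e.g. by passing an open file object as the first argument.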