unit leadership gets confused
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Tim Penhey |
Bug Description
I've been testing the logging output of unit leadership, but it appears we can get into a state where no unit thinks it is the leader.
Specifically, suppose I have 2 units, where unit/1 is the leader and unit/2 is not. If I then bounce all of the agents (for example, by running juju upgrade-juju), I see the following when the agents come back up:
unit-ul-2: 14:44:57 DEBUG juju.worker.
unit-ul-2: 14:44:57 INFO juju.worker.
unit-ul-2: 14:44:57 DEBUG juju.worker.
unit-ul-1: 14:44:57 DEBUG juju.worker.
Note that unit 2 clearly got the "I'm not the leader" message, but there is *no* entry for unit-1 saying "I'm the leader". Unit 1 just goes to make a claim, and that claim never seems to return.
If I kill the existing unit/1, I can see in status that unit/2 becomes the leader. However, there is also *no* log entry (from juju debug-log) showing that the current ul/2 agent becomes aware of that fact. If I add another unit to the application, it doesn't seem to notice. But I also see:
unit-ul-4: 14:51:12 DEBUG juju.worker.
Maybe it's a different bug. I've certainly never touched anything related to migration for this model.
Changed in juju:
status: Fix Committed → Fix Released
I *think* this behaviour is caused by the same underlying bug as https://bugs.launchpad.net/juju/+bug/1815397
The claim hangs because the lease manager is trying to shut down, but it cannot: the claim handler is blocked trying to send on the errors channel, and the main loop is no longer receiving from that channel.
It should be fixed by https://github.com/juju/juju/pull/9730