leadership worker wakes up too often
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju |
Fix Released
|
Undecided
|
Unassigned |
Bug Description
On a local model that only has the controller and default model, and 1 application deployed (with 3 units) in the default model, we seem to be waking up *way* too often to deal with lease issues. If you "juju bootstrap --debug lxd" and then
"juju debug-log -m controller --include-module juju.worker.lease --replay"
I see:
15:47:22.674 DEBUG juju.worker.lease waking to check leases at 2017-10-31 11:47:52.680141764 +0000 UTC
15:47:22.764 DEBUG juju.worker.lease waking to check leases at 2017-10-31 11:48:22.725722344 +0000 UTC m=+96834.321437694
15:47:22.805 DEBUG juju.worker.lease waking to check leases at 2017-10-31 11:48:22.767452117 +0000 UTC m=+96834.363167338
15:47:25.140 DEBUG ... at 2017-10-31 11:48:25.092790999 m=...
15:47:45.744 DEBUG ... at 2017-10-31 11:48:45.744120828 m=...
15:47:52.716 DEBUG ... at 2017-10-31 11:48:22.725722344
15:47:52.815 DEBUG ... at 2017-10-31 11:48:52.767118576 m=...
15:47:52.850 DEBUG ... at 2017-10-31 11:48:52.807188433 m=...
15:47:55.139 DEBUG ... at 2017-10-31 11:48:55.092921836 m=...
15:48:00.183 DEBUG ... at 2017-10-31 11:49:00.183642493 m=...
15:48:22.761 DEBUG ... at 2017-10-31 11:48:52.767118576
15:48:22.869 DEBUG ... at 2017-10-31 11:49:22.81749278 m=+...
15:48:22.895 DEBUG ... at 2017-10-31 11:49:22.852232168 m=...
15:48:25.138 DEBUG ... at 2017-10-31 11:49:25.093142689 m=...
15:48:45.746 DEBUG ... at 2017-10-31 11:49:45.746100232 m=...
15:48:52.803 DEBUG ... at 2017-10-31 11:49:22.81749278
15:48:52.921 DEBUG ... at 2017-10-31 11:49:52.871913825 m=...
15:48:52.950 DEBUG ... at 2017-10-31 11:49:52.896919271 m=...
15:48:55.138 DEBUG ... at 2017-10-31 11:49:55.093818589 m=...
15:49:00.185 DEBUG ... at 2017-10-31 11:50:00.185937368 m=...
15:49:22.854 DEBUG ... at 2017-10-31 11:49:52.871913825
15:49:22.967 DEBUG ... at 2017-10-31 11:50:22.923342048 m=...
15:49:22.999 DEBUG ... at 2017-10-31 11:50:22.952223102 m=...
15:49:25.135 DEBUG ... at 2017-10-31 11:50:25.093480281 m=...
15:49:45.748 DEBUG ... at 2017-10-31 11:50:45.748549245 m=...
15:49:52.909 DEBUG ... at 2017-10-31 11:50:22.923342048
15:49:53.022 DEBUG ... at 2017-10-31 11:50:52.969434594 m=...
15:49:53.045 DEBUG ... at 2017-10-31 11:50:53.00087437 m=+...
15:49:55.137 DEBUG ... at 2017-10-31 11:50:55.094103739 m=...
15:50:00.187 DEBUG ... at 2017-10-31 11:51:00.187840411 m=...
15:50:22.959 DEBUG ... at 2017-10-31 11:50:52.969434594
15:50:23.077 DEBUG ... at 2017-10-31 11:51:23.024836933 m=...
15:50:23.093 DEBUG ... at 2017-10-31 11:51:23.046794216 m=...
15:50:25.136 DEBUG ... at 2017-10-31 11:51:25.094046756 m=...
15:50:45.751 DEBUG ... at 2017-10-31 11:51:45.751657032 m=...
15:50:53.004 DEBUG ... at 2017-10-31 11:51:23.024836933
15:50:53.138 DEBUG ... at 2017-10-31 11:51:53.079786742 m=...
15:50:53.144 DEBUG ... at 2017-10-31 11:51:53.094451491 m=...
15:50:55.142 DEBUG ... at 2017-10-31 11:51:55.095020156 m=...
15:51:00.222 DEBUG ... at 2017-10-31 11:52:00.222685919 m=...
15:51:23.057 DEBUG ... at 2017-10-31 11:51:53.079786742
15:51:23.201 DEBUG ... at 2017-10-31 11:52:23.140325923 m=...
15:51:23.203 DEBUG ... at 2017-10-31 11:52:23.14629786 m=+...
15:51:25.138 DEBUG ... at 2017-10-31 11:52:25.096069006 m=...
15:51:45.786 DEBUG ... at 2017-10-31 11:52:45.786542186 m=...
15:51:53.109 DEBUG ... at 2017-10-31 11:52:23.140325923
15:51:53.262 DEBUG ... at 2017-10-31 11:52:53.203701951 m=...
15:51:53.267 DEBUG ... at 2017-10-31 11:52:53.206208631 m=...
15:51:55.141 DEBUG ... at 2017-10-31 11:52:55.095308182 m=...
15:52:00.258 DEBUG ... at 2017-10-31 11:53:00.258229456 m=...
So in the span of 5 minutes, we have approximately 50 log lines issued.
Now it is possible that the fact that some of those lines have an "m=" and some don't, means that there are multiple code paths we're running into. Some of them also are surprisingly close in times (11:33:21.27 and 11:33.21.28, but then a gap until 11:33:25.08).
However, 50 in 5 minutes is 10 per minute or 1 every 6 seconds. Which isn't terrible in itself, its more about the times when we say "waking" and it has only been 10ms since the last time we "woke".
tags: | added: leadership |
There is only one place where we log "waking to check", and it is based on a time.Time() so it seems that the difference in m=+ or not is just based on the Time object. Maybe sometimes we're creating the time object in 2 different ways? config. Clock.Now( ) manager. config. MaxSleep) config. Client. Leases( ) { After(nextTick) {
now := manager.
nextTick := now.Add(
for _, info := range manager.
if info.Expiry.
continue
}
nextTick = info.Expiry
}
I wonder if it is a case of
nextTick := now.Add(MaxSleep)
has the monotonic timer, but
nextTick = info.Expiry
does not.
Looking at the code, it seems: r(lease. ManagerConfig{
manager, err := lease.NewManage
Secretary: secretary,
Client: client,
Clock: st.clock(),
MaxSleep: time.Minute,
})
MaxSleep is defaulting to 1 minute. So it looks like we should be set up "wake up before the next lease that is about to expire, or 1 minute has passed".
I'll be updating this bug with debug-log at ms precision