a lot of workers restarting at cron.daily time - presumably raft sync()
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Triaged | Low | Unassigned | |
Bug Description
Hi,
cron.daily is a cron feature that allows one to easily run scripts once per day. Every day, on every single Ubuntu machine, it starts at 06:25 by default and runs the scripts present in /etc/cron.daily (see "grep daily /etc/crontab").
Among these scripts, "logrotate" is generally present. logrotate will rotate logs (duh !), which generally means compressing them, so IO and CPU usage tend to spike. On compute nodes hosting lots of VMs, this means every VM suddenly hits high IO and CPU usage at the same moment, so workloads tend to run slower than usual around this time.
And workloads running on VMs include our juju controllers. We're monitoring API request time, and also "juju deploy cs:ubuntu" duration, and they tend to alert us every day around that time (API request time is > 30s for 20 min).
While investigating this, I also noticed high churn on the controllers during that time (spike in API requests, mostly "next", "life", "stop" and "relation"), which shouldn't happen since there's nothing generating more calls than usual at these times. This is probably caused by manifold worker restarts. Indeed, out of 6231 workers, 2400 restarted during the last event (so this morning). To be precise, I counted restarts between 06:20 and 06:39.
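For anyone wanting to reproduce the count, here is a minimal sketch of tallying restarts in a time window. It is not the script used for the numbers above. It assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, and the `marker` substring is a hypothetical placeholder: replace it with whatever text your controller logs actually emit when a worker restarts.

```python
from datetime import datetime, time


def count_restarts(lines, start=time(6, 20), end=time(6, 40),
                   marker="manifold worker"):
    """Count lines containing `marker` whose time of day is in [start, end).

    The timestamp layout and the default `marker` are assumptions, not
    taken from real juju logs. The default window covers 06:20:00 up to
    and including 06:39:59.
    """
    count = 0
    for line in lines:
        if marker not in line:
            continue
        try:
            # The first 19 characters are assumed to hold the timestamp.
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            # Skip lines that do not start with a parseable timestamp.
            continue
        if start <= stamp.time() < end:
            count += 1
    return count
```

It accepts any iterable of strings, so an open log file can be passed directly. The window ignores the date, so feed it a single day's log.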
I'm filing this bug to understand why so many workers are restarting, and how to prevent it.
Additional datapoint : we're seeing a lot of "juju.core.
Changed in juju:
status: New → Triaged
importance: Undecided → High
summary:
- a lot of workers restarting at cron.daily time
+ a lot of workers restarting at cron.daily time - presumably raft sync()
Today's restarts :
machine 0 : 1 out of 4891
machine 1 : 6224 out of 10817 (mongodb primary)
machine 2 : 42 out of 4932
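To make the imbalance easier to see, the counts above can be turned into percentages. This is only a quick sketch; it assumes "N out of M" means N restarts against M workers on that machine.

```python
# Restart counts copied from the list above: (restarted, total).
restarts = {
    "machine 0": (1, 4891),
    "machine 1": (6224, 10817),  # mongodb primary
    "machine 2": (42, 4932),
}


def restart_share(restarted, total):
    """Return the percentage of workers that restarted."""
    return 100.0 * restarted / total


shares = {name: restart_share(*counts) for name, counts in restarts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

On these numbers, more than half of the workers on machine 1 restarted, against under 1% on each of the other two machines.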