ubuntu-repository-cache_rsync cron job missing on leader

Bug #1885653 reported by Stephen Muss
This bug affects 2 people
Affects                         Status      Importance  Assigned to  Milestone
Canonical Juju                  Incomplete  Undecided   Unassigned
Ubuntu Repository Cache Charm   New         Undecided   Unassigned

Bug Description

On two separate occasions I have seen the elected leader missing the ubuntu-repository-cache_rsync cron job.

This appears to happen after "leadership failure: lease operation timed out" as per below.

As a workaround, it is possible to trigger a re-render of the cron job file by changing a charm config option, for example "juju config ubuntu-repository-cache apache2_mpm_maxrequestworkers=<value>". However, this should be fixed properly so that the file always exists on the leader.

2020-06-29 00:50:31 WARNING juju.worker.uniter.operation leader.go:116 we should run a leader-deposed hook here, but we can't yet
2020-06-29 00:50:31 ERROR juju.worker.dependency engine.go:671 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease operation timed out
2020-06-29 00:50:31 ERROR juju.worker.uniter agent.go:31 resolver loop error: could not acquire lock: cancelled acquiring mutex
2020-06-29 00:50:31 INFO juju.worker.uniter uniter.go:457 unit "ubuntu-repository-cache/1" shutting down: could not acquire lock: cancelled acquiring mutex
2020-06-29 00:50:34 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-ubuntu-repository-cache-1
2020-06-29 00:50:34 INFO juju.agent.tools symlinks.go:40 was a symlink, now looking at /var/lib/juju/tools/2.7.6-xenial-amd64
2020-06-29 00:50:36 INFO juju.worker.uniter.relation relations.go:553 joining relation "livepatch:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:50:36 INFO juju.worker.uniter.relation relations.go:589 joined relation "livepatch:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:51:24 ERROR juju.worker.dependency engine.go:671 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease operation timed out
2020-06-29 00:51:24 INFO juju.worker.uniter.relation relations.go:553 joining relation "ubuntu-repository-cache:cluster"
2020-06-29 00:51:27 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-ubuntu-repository-cache-1
2020-06-29 00:51:27 INFO juju.agent.tools symlinks.go:40 was a symlink, now looking at /var/lib/juju/tools/2.7.6-xenial-amd64
2020-06-29 00:51:29 INFO juju.worker.uniter.relation relations.go:553 joining relation "telegraf-u-r-c:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:51:30 INFO juju.worker.uniter.relation relations.go:589 joined relation "telegraf-u-r-c:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:52:17 ERROR juju.worker.dependency engine.go:671 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease operation timed out
2020-06-29 00:52:17 INFO juju.worker.uniter.relation relations.go:553 joining relation "ntp:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:52:22 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-ubuntu-repository-cache-1
2020-06-29 00:52:22 INFO juju.agent.tools symlinks.go:40 was a symlink, now looking at /var/lib/juju/tools/2.7.6-xenial-amd64
2020-06-29 00:52:24 INFO juju.worker.uniter.relation relations.go:553 joining relation "landscape-client:container ubuntu-repository-cache:juju-info"
2020-06-29 00:52:24 INFO juju.worker.uniter.relation relations.go:589 joined relation "landscape-client:container ubuntu-repository-cache:juju-info"
2020-06-29 00:53:12 ERROR juju.worker.dependency engine.go:671 "leadership-tracker" manifold worker returned unexpected error: leadership failure: lease operation timed out
2020-06-29 00:53:12 INFO juju.worker.uniter.relation relations.go:553 joining relation "container-log-archive:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:17 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-ubuntu-repository-cache-1
2020-06-29 00:53:17 INFO juju.agent.tools symlinks.go:40 was a symlink, now looking at /var/lib/juju/tools/2.7.6-xenial-amd64
2020-06-29 00:53:19 INFO juju.worker.uniter.relation relations.go:553 joining relation "nrpe:nrpe-external-master ubuntu-repository-cache:nrpe-external-master"
2020-06-29 00:53:19 INFO juju.worker.uniter.relation relations.go:589 joined relation "nrpe:nrpe-external-master ubuntu-repository-cache:nrpe-external-master"
2020-06-29 00:53:23 INFO juju.worker.leadership tracker.go:194 ubuntu-repository-cache/1 promoted to leadership of ubuntu-repository-cache
2020-06-29 00:53:24 INFO juju.worker.uniter.relation relations.go:553 joining relation "telegraf-u-r-c:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:26 INFO juju.worker.uniter.relation relations.go:589 joined relation "telegraf-u-r-c:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:27 INFO juju.worker.uniter.relation relations.go:553 joining relation "ntp:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:28 INFO juju.worker.uniter.relation relations.go:589 joined relation "ntp:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:29 INFO juju.worker.uniter.relation relations.go:553 joining relation "livepatch:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:30 INFO juju.worker.uniter.relation relations.go:589 joined relation "livepatch:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:31 INFO juju.worker.uniter.relation relations.go:553 joining relation "ubuntu-repository-cache:cluster"
2020-06-29 00:53:31 INFO juju.worker.uniter.relation relations.go:589 joined relation "ubuntu-repository-cache:cluster"
2020-06-29 00:53:32 INFO juju.worker.uniter.relation relations.go:553 joining relation "landscape-client:container ubuntu-repository-cache:juju-info"
2020-06-29 00:53:32 INFO juju.worker.uniter.relation relations.go:589 joined relation "landscape-client:container ubuntu-repository-cache:juju-info"
2020-06-29 00:53:33 INFO juju.worker.uniter.relation relations.go:553 joining relation "container-log-archive:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:33 INFO juju.worker.uniter.relation relations.go:589 joined relation "container-log-archive:juju-info ubuntu-repository-cache:juju-info"
2020-06-29 00:53:35 INFO juju.worker.uniter uniter.go:246 unit "ubuntu-repository-cache/1" started
2020-06-29 00:53:36 INFO juju.worker.uniter uniter.go:285 hooks are retried true
2020-06-29 00:53:43 INFO juju.worker.uniter resolver.go:130 found queued "leader-elected" hook
2020-06-29 00:53:45 INFO juju-log leader-elected fired. This unit is the new leader: ubuntu-repository-cache/1

Haw Loeung (hloeung) wrote :

The charm relies on leadership election to ensure that only one unit, the master, has the cron job and runs it. Adding the Juju project here. We have only started seeing these failures recently, and they are likely related to the upgrade to 2.7.6 across the board.

Ian Booth (wallyworld) wrote :

Can you expand a bit more? Are you saying the leader-elected hook is not running?

We see a few "lease operation timed out" messages which can occur if the disk is busy and raft cannot fsync to boltdb in a timely manner.

After that, things settle down, the disk load must have decreased, and we see

ubuntu-repository-cache/1 promoted to leadership of ubuntu-repository-cache

and then

found queued "leader-elected" hook
leader-elected fired. This unit is the new leader: ubuntu-repository-cache/1

Can you provide more info about the deployment and confirm what the problem is?
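
For anyone comparing unit logs across occurrences, the sequence described above (lease timeouts, then promotion, then the leader-elected hook) can be pulled out of a log with a short script. This is only a rough sketch: the function name summarize_leadership and the returned keys are made up for illustration, and the matching relies solely on the message strings that appear in the log excerpt in this report.

```python
import re

# Matches lines such as:
# 2020-06-29 00:53:23 INFO juju.worker.leadership tracker.go:194 ...
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def summarize_leadership(log_text):
    """Collect timestamps of leadership-related events from a unit log.

    Hypothetical helper: it only looks for the three message fragments
    seen in this bug report and ignores every other line.
    """
    timeouts = []
    promoted = None
    hook_fired = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, _level, message = match.groups()
        if "lease operation timed out" in message:
            timeouts.append(timestamp)
        elif "promoted to leadership" in message:
            promoted = timestamp
        elif "leader-elected fired" in message:
            hook_fired = timestamp
    return {
        "lease_timeouts": timeouts,
        "promoted_at": promoted,
        "leader_elected_at": hook_fired,
    }
```

If promoted_at and leader_elected_at are both set but the cron file is still missing, that would point at the charm's hook handling rather than at Juju failing to run the hook.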

Pen Gale (pengale)
Changed in juju:
status: New → Incomplete
Haw Loeung (hloeung) wrote :

The problem is that leadership seems flaky in 2.7.6, whereas we did not see this in 2.6.10. Sure, the charm could probably handle this better and ensure that the cron job is shipped when leadership fails and a new leader is elected.

Anyways, will keep an eye on things to see if this continues to occur.
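
One way the charm side could be made more robust is to reconcile the cron file against the current leadership state on every hook run, rather than only when leadership changes. The following is a minimal sketch of that idea, not the charm's actual code: the function name ensure_cron_job, its parameters, and its return values are all hypothetical, and the caller is assumed to obtain the leadership state and the rendered cron content elsewhere.

```python
import os
import tempfile


def ensure_cron_job(is_leader, cron_path, cron_content):
    """Make the cron file match the unit's leadership state.

    Hypothetical helper, safe to call repeatedly (idempotent).
    Returns "written", "removed", or "unchanged".
    """
    if is_leader:
        current = None
        if os.path.exists(cron_path):
            with open(cron_path) as existing:
                current = existing.read()
        if current == cron_content:
            return "unchanged"
        # Write to a temporary file in the same directory, then rename,
        # so a partially written cron file is never visible at cron_path.
        directory = os.path.dirname(cron_path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(cron_content)
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, cron_path)
        return "written"

    # Non-leaders must not carry the cron job.
    if os.path.exists(cron_path):
        os.remove(cron_path)
        return "removed"
    return "unchanged"
```

Called from every hook (for example config-changed and update-status, assuming the charm handles them), this would restore a missing file on the leader without needing a manual config change.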

Haw Loeung (hloeung)
summary: - ubuntu-repository-cache_rsync missing on leader
+ ubuntu-repository-cache_rsync cron job missing on leader
Haw Loeung (hloeung) wrote :

This is LP:1797297
