partial sync when leader_id is empty, yielding inconsistent mirror

Bug #1721159 reported by Paul Collins
This bug affects 2 people
Affects                                           Status     Importance  Assigned to  Milestone
Canonical Juju                                    Expired    Undecided   Unassigned
juju-core                                         Won't Fix  Undecided   Unassigned
ubuntu-repository-cache (Juju Charms Collection)  Won't Fix  Undecided   Unassigned

Bug Description

We have sometimes observed that ubuntu-repository-cache leader settings are lost. When this happens and a sync is triggered, the leader unit updates the ubuntu_active symlink but the other units do not, yielding an inconsistent mirror and, eventually, Nagios alerts about stale metadata. This may be related to LP:1673325, but unlike that bug, the charm does not enter an error state.

Workaround:

# Note which unit is the leader:
juju run --service ubuntu-repository-cache is-leader
# Update the leader settings, and confirm the change is rejected on the expected non-leader units:
juju run --service ubuntu-repository-cache 'leader-set leader_id=ubuntu-repository-cache/X'
# ^-- if the unit that accepted the change is not ubuntu-repository-cache/X, leadership has moved; repeat from the start
# Trigger a sync, or wait for one to occur, then confirm that all units report the same symlink target:
juju run --service ubuntu-repository-cache 'readlink /srv/ubuntu-repository-cache/apache/data/ubuntu_active'
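The final check in the workaround can be scripted. Below is a minimal POSIX sh sketch, assuming you have already collected one `readlink` result per unit (for example by running the `readlink` command once per unit with `juju run --unit`). The helper name `all_match` and the unit names in the comments are hypothetical placeholders, not part of Juju or the charm.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju or the charm): succeeds only when
# every argument is identical, e.g. the ubuntu_active symlink targets
# collected from each ubuntu-repository-cache unit.
all_match() {
    # With no targets there is nothing to compare, so treat it as a failure.
    [ "$#" -gt 0 ] || return 1
    first=$1
    for target in "$@"; do
        # Any target that differs from the first means the mirror is inconsistent.
        [ "$target" = "$first" ] || return 1
    done
    return 0
}

# Example usage (unit names are placeholders):
# link=/srv/ubuntu-repository-cache/apache/data/ubuntu_active
# a=$(juju run --unit ubuntu-repository-cache/0 "readlink $link")
# b=$(juju run --unit ubuntu-repository-cache/1 "readlink $link")
# all_match "$a" "$b" || echo "ubuntu_active differs between units" >&2
```

A non-zero exit status means at least one unit still points at an older snapshot, which is the condition that eventually triggers the stale-metadata alerts.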

Stuart Bishop (stub) wrote :

I've looked over the charm, and this really seems to be a Juju bug. There is a single leadership setting; it is set in the leader-elected hook and never unset. The deployment would never complete unless this setting were actually set.

One of two things seems to be happening. Either Juju is resetting the leadership settings, causing leader-get to return an empty dict, or leader-get is not handling a failure and returns an empty dict when it should be failing.

Losing leadership settings could potentially cause major data loss, with database clusters reinitializing themselves as if they were a fresh install (no Cassandra seeds, no PostgreSQL master, etc.).
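Whichever explanation holds, a charm hook can avoid acting on missing leader settings by treating an empty value as an error rather than as valid data. Below is a minimal sketch, assuming a shell hook; `require_leader_id` is a hypothetical helper, not something the ubuntu-repository-cache charm is known to contain.

```shell
#!/bin/sh
# Hypothetical guard: refuse to continue when the leader_id leader setting
# comes back empty, instead of treating an empty result as "no leader".
require_leader_id() {
    if [ -z "$1" ]; then
        echo "leader_id leader setting is empty; refusing to sync" >&2
        return 1
    fi
    return 0
}

# Assumed usage inside a hook; leader-get prints the value of a single key:
# leader_id=$(leader-get leader_id)
# require_leader_id "$leader_id" || exit 1
```

Exiting non-zero from a hook puts the unit into an error state, so the problem becomes visible instead of leaving the mirror silently inconsistent.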

Paul Collins (pjdc) wrote :

This environment is currently running 1.25.13. Here's a potted history of its Juju versions, based on the tools directories on machine-0:

drwxr-xr-x 2 root root 4096 Sep 20 23:49 /var/lib/juju/tools/1.25.13-xenial-amd64/
drwxr-xr-x 2 root root 4096 Jul 4 07:38 /var/lib/juju/tools/1.25.12-xenial-amd64/
drwxr-xr-x 2 root root 4096 Jan 13 2017 /var/lib/juju/tools/1.25.9-xenial-amd64/
drwxr-xr-x 2 root root 4096 Sep 16 2016 /var/lib/juju/tools/1.25.6-xenial-amd64/

Anastasia (anastasia-macmood) wrote :

@Paul Collins (pjdc), Stuart Bishop (stub),

Is there an easily reproducible scenario?

Since this is an intermittent failure, an easy reproduction would be very helpful. We are not likely to do much work on 1.25, but it would be great to know whether this also occurs in 2.x.

Changed in juju-core:
status: New → Won't Fix
Changed in juju:
status: New → Incomplete
Paul Collins (pjdc) wrote :

We are in the process of replacing all of these Juju 1.x environments with 2.x. If it crops up on 2.x, we'll let you know.

Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Incomplete → Expired
tags: added: expirebugs-bot
Haw Loeung (hloeung) wrote :

Constant leadership changes under Juju 2.x have been reported in a new bug, LP:1977798.

Haw Loeung (hloeung) wrote :

The ubuntu-repository-cache (u-r-c) charm no longer relies on Juju for metadata sync.

Changed in ubuntu-repository-cache (Juju Charms Collection):
status: New → Fix Released
status: Fix Released → Won't Fix