leadership changes can break sync, charm in error state

Bug #1673325 reported by Stuart Bishop
This bug affects 5 people
Affects                                           Status        Importance  Assigned to  Milestone
Ubuntu Repository Cache Charm                     Fix Released  High        Unassigned
ubuntu-repository-cache (Juju Charms Collection)  Won't Fix     High        Unassigned

Bug Description

There is code that relies on the leader_id leadership setting being set, and apparently there are cases where it is not. In this situation, it's probably best to do nothing: a leader-elected hook will soon be invoked, which sets the setting and triggers hooks on the non-leaders.

2017-03-14 23:14:03 INFO juju-log cluster:2: Cluster relation changed for ubuntu-repository-cache
2017-03-14 23:14:03 INFO juju-log cluster:2: SSH key already exists at /home/www-sync/.ssh/id_rsa.
2017-03-14 23:14:03 INFO juju-log cluster:2: Syncing authorized_keys @ /home/www-sync/.ssh/authorized_keys.
2017-03-14 23:14:03 INFO cluster-relation-changed # 10.173.131.231 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.0.ISPATCHED.14.04.8
2017-03-14 23:14:03 INFO cluster-relation-changed # 10.173.131.231 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.0.ISPATCHED.14.04.8
2017-03-14 23:14:03 INFO cluster-relation-changed # 10.184.151.173 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.0.ISPATCHED.14.04.8
2017-03-14 23:14:03 INFO cluster-relation-changed # 10.184.151.173 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.0.ISPATCHED.14.04.8
2017-03-14 23:14:03 INFO juju-log cluster:2: Syncing known_hosts @ /home/www-sync/.ssh/known_hosts.
2017-03-14 23:14:04 INFO juju-log cluster:2: Updating metadata on the leader
2017-03-14 23:14:04 WARNING juju-log cluster:2: Leader changed between peer_update_metadata and _leader_update_metadata
2017-03-14 23:14:04 INFO cluster-relation-changed Traceback (most recent call last):
2017-03-14 23:14:04 INFO cluster-relation-changed File "/var/lib/juju/agents/unit-ubuntu-repository-cache-2/charm/hooks/cluster-relation-changed", line 245, in <module>
2017-03-14 23:14:04 INFO cluster-relation-changed HOOKS.execute(sys.argv)
2017-03-14 23:14:04 INFO cluster-relation-changed File "/var/lib/juju/agents/unit-ubuntu-repository-cache-2/charm/lib/charmhelpers/core/hookenv.py", line 715, in execute
2017-03-14 23:14:04 INFO cluster-relation-changed self._hooks[hook_name]()
2017-03-14 23:14:04 INFO cluster-relation-changed File "/var/lib/juju/agents/unit-ubuntu-repository-cache-2/charm/hooks/cluster-relation-changed", line 186, in cluster_relation_changed
2017-03-14 23:14:04 INFO cluster-relation-changed mirror.peer_update_metadata()
2017-03-14 23:14:04 INFO cluster-relation-changed File "/var/lib/juju/agents/unit-ubuntu-repository-cache-2/charm/lib/ubuntu_repository_cache/mirror.py", line 301, in peer_update_metadata
2017-03-14 23:14:04 INFO cluster-relation-changed _leader_update_metadata()
2017-03-14 23:14:04 INFO cluster-relation-changed File "/var/lib/juju/agents/unit-ubuntu-repository-cache-2/charm/lib/ubuntu_repository_cache/mirror.py", line 142, in _leader_update_metadata
2017-03-14 23:14:04 INFO cluster-relation-changed leader_rel = rel[leader_id]
2017-03-14 23:14:04 INFO cluster-relation-changed KeyError: None
2017-03-14 23:14:04 ERROR juju.worker.uniter.operation runhook.go:107 hook "cluster-relation-changed" failed: exit status 1
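
The traceback ends in `rel[leader_id]` raising `KeyError: None`, i.e. the lookup ran while leader_id was unset. A minimal sketch of the "do nothing" approach suggested above follows. It is not the charm's actual code: the function name `resolve_leader_relation` and the shape of `rel` (a dict keyed by unit name) are assumptions made for illustration.

```python
import logging

log = logging.getLogger(__name__)


def resolve_leader_relation(rel, leader_id):
    """Return the leader's relation data, or None if it cannot be resolved yet.

    rel: dict mapping unit name -> relation data (assumed shape).
    leader_id: value of the leader_id leadership setting; None when unset.
    """
    if leader_id is None:
        # Setting not published yet; a later hook run is expected to retry.
        log.warning("leader_id is not set yet; skipping metadata update")
        return None
    if leader_id not in rel:
        # Leader is known but its relation data has not arrived yet.
        log.warning("no relation data for leader %s yet; skipping", leader_id)
        return None
    return rel[leader_id]


# Hypothetical relation data, for illustration only.
rel = {"ubuntu-repository-cache/3": {"private-address": "192.168.0.8"}}
print(resolve_leader_relation(rel, None))
print(resolve_leader_relation(rel, "ubuntu-repository-cache/3"))
```

A caller would return early when the result is None instead of letting the hook fail, on the assumption that the leader-elected hook sets leader_id shortly afterwards.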

Tags: canonical-is

Paul Gear (paulgear) wrote :

https://pastebin.canonical.com/182519/ has non-wrapped version of above

description: updated
Stuart Bishop (stub)
tags: added: canonical-is
Chris Glass (tribaal)
Changed in ubuntu-repository-cache (Juju Charms Collection):
importance: Undecided → High
Chris Glass (tribaal)
Changed in ubuntu-repository-cache:
importance: Undecided → High
Haw Loeung (hloeung)
Changed in ubuntu-repository-cache:
status: New → Confirmed
Changed in ubuntu-repository-cache (Juju Charms Collection):
status: New → Confirmed
Haw Loeung (hloeung) wrote :

Still seeing this:

| ubuntu-repository-cache/3* active idle 4 13.83.243.16 80/tcp Ready (source version/commit 20210211-qqret694)

So u-r-c/3 is the current leader. Its IP:

| $ juju run --unit ubuntu-repository-cache/3 "ip a show dev eth0| grep 'inet 192'"
| inet 192.168.0.8/20 brd 192.168.15.255 scope global eth0

On the newly provisioned unit, ~www-sync/.ssh/known_hosts doesn't have an entry for this leader:

ubuntu@machine-6:~$ cat /home/www-sync/.ssh/known_hosts | cut -b1-25
| 192.168.0.4 ecdsa-sha2-ni
| 192.168.0.4 ssh-rsa AAAAB
| 192.168.0.4 ssh-ed25519 A
| 192.168.0.5 ssh-rsa AAAAB
| 192.168.0.5 ecdsa-sha2-ni
| 192.168.0.5 ssh-ed25519 A
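
The manual check above (is the leader's address present in known_hosts?) can be sketched as a small script. This is an illustration only: the function names are hypothetical, the key material in the sample is a placeholder, and hashed known_hosts entries are skipped rather than matched.

```python
def known_hosts_names(text):
    """Collect the plain (unhashed) host names listed in known_hosts content."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Skip optional markers such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # Hashed entries (|1|salt|hash) cannot be matched by name here.
        if fields[0].startswith("|"):
            continue
        # The first field may hold several comma-separated host names.
        names.update(fields[0].split(","))
    return names


def missing_peers(text, peer_addresses):
    """Return the addresses that have no entry in the known_hosts content."""
    present = known_hosts_names(text)
    return [addr for addr in peer_addresses if addr not in present]


# Placeholder keys; only the host column matters for this check.
sample = (
    "192.168.0.4 ssh-rsa PLACEHOLDERKEY\n"
    "192.168.0.5 ssh-ed25519 PLACEHOLDERKEY\n"
)
print(missing_peers(sample, ["192.168.0.4", "192.168.0.5", "192.168.0.8"]))
```

On a unit, the text would be read from /home/www-sync/.ssh/known_hosts and compared against the peer addresses from the cluster relation.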

Haw Loeung (hloeung) wrote :

This looks to be unison.ssh_authorized_peers() not updating the known_hosts file in use. The charm definitely calls unison.ssh_authorized_peers() on cluster-relation-{joined,changed,departed}.

Haw Loeung (hloeung) wrote :

Oh, maybe a different problem.

Haw Loeung (hloeung) wrote :

https://code.launchpad.net/~hloeung/ubuntu-repository-cache/allow-specifying-leader/+merge/427251 is a first step to rely less on Juju due to LP:1977798.

There's more work to be done, in particular removing the reliance on juju-run for triggering and pushing the latest metadata snapshot to peers.

Changed in ubuntu-repository-cache (Juju Charms Collection):
status: Confirmed → Won't Fix
Changed in ubuntu-repository-cache:
status: Confirmed → Triaged
Haw Loeung (hloeung) wrote :

Closing this off: we haven't seen an instance of a missing or unset "leader_id" since the recent set of changes.

Changed in ubuntu-repository-cache:
status: Triaged → Fix Released