Charms should call "is-leader" before attempting to call "leader-set"

Bug #1833089 reported by Chris Gregan on 2019-06-17
This bug affects 1 person
Affects                  Status        Importance  Assigned to
Kubernetes Master Charm  Fix Released  Critical    Kevin W Monroe
juju                     Invalid       Critical    Unassigned

Bug Description

Juju: 2.6.4+2.6-6c4f52c
This appears to be a regression of https://bugs.launchpad.net/charm-ceph-mon/+bug/1760138

2019-06-16 16:08:04 DEBUG kube-control-relation-joined ERROR cannot write leadership settings: cannot write settings: not the leader
2019-06-16 16:08:04 DEBUG kube-control-relation-joined Traceback (most recent call last):
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/hooks/kube-control-relation-joined", line 22, in <module>
2019-06-16 16:08:04 DEBUG kube-control-relation-joined main()
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 83, in main
2019-06-16 16:08:04 DEBUG kube-control-relation-joined hookenv._run_atexit()
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1220, in _run_atexit
2019-06-16 16:08:04 DEBUG kube-control-relation-joined callback(*args, **kwargs)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/reactive/kubernetes_master.py", line 517, in set_final_status
2019-06-16 16:08:04 DEBUG kube-control-relation-joined get_dns_provider()
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/reactive/kubernetes_master.py", line 2369, in get_dns_provider
2019-06-16 16:08:04 DEBUG kube-control-relation-joined leader_set(auto_dns_provider=dns_provider)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charms/reactive/decorators.py", line 219, in _wrapped
2019-06-16 16:08:04 DEBUG kube-control-relation-joined return func(*args, **kwargs)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "lib/charms/leadership.py", line 62, in leader_set
2019-06-16 16:08:04 DEBUG kube-control-relation-joined hookenv.leader_set(settings)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1043, in inner_translate_exc2
2019-06-16 16:08:04 DEBUG kube-control-relation-joined return f(*args, **kwargs)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/var/lib/juju/agents/unit-kubernetes-master-0/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1104, in leader_set
2019-06-16 16:08:04 DEBUG kube-control-relation-joined subprocess.check_call(cmd)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
2019-06-16 16:08:04 DEBUG kube-control-relation-joined raise CalledProcessError(retcode, cmd)
2019-06-16 16:08:04 DEBUG kube-control-relation-joined subprocess.CalledProcessError: Command '['leader-set', 'auto_dns_provider=core-dns']' returned non-zero exit status 1.
2019-06-16 16:08:04 ERROR juju.worker.uniter.operation runhook.go:132 hook "kube-control-relation-joined" failed: exit status 1
2019-06-16 16:08:04 DEBUG juju.machinelock machinelock.go:180 machine lock released for uniter (run relation-joined (5; kubernetes-worker/0) hook)

Chris Gregan (cgregan) wrote :
Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.6.4
Joseph Phillips (manadart) wrote :

This looks like it results from this patch:
https://github.com/juju/juju/pull/10301

After the original patch to fix https://bugs.launchpad.net/juju-core/+bug/1723184, calling leader-set would be a no-op if the unit was not the leader.

This was deemed to be incorrect behaviour.

Charms should call "is-leader" before attempting to call "leader-set", or use the "leader-elected" hook to change leader settings.
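A minimal Python sketch of the "is-leader" guard described above. It assumes the charm shells out to the Juju hook tools directly instead of going through charmhelpers or the leadership layer; charmhelpers.core.hookenv wraps the same commands as is_leader() and leader_set(). The helper name leader_set_if_leader and the injectable run parameter are hypothetical, added only so the logic can be exercised outside a hook context. `is-leader --format=json` prints a JSON boolean, and leader-set takes key=value arguments.

```python
import json
import subprocess
from typing import Callable, Dict, List

# A runner executes a hook-tool command and returns its stdout.
Runner = Callable[[List[str]], str]


def _run(cmd: List[str]) -> str:
    """Run a Juju hook tool; raises CalledProcessError on non-zero exit."""
    return subprocess.check_output(cmd, universal_newlines=True)


def is_leader(run: Runner = _run) -> bool:
    # `is-leader --format=json` prints a JSON boolean: true or false.
    return bool(json.loads(run(["is-leader", "--format=json"])))


def leader_set_if_leader(settings: Dict[str, str], run: Runner = _run) -> bool:
    """Hypothetical helper: write leader settings only on the leader.

    Returns True if leader-set was called, False if this unit is a follower.
    """
    if not is_leader(run):
        # Followers skip the write instead of failing the hook with
        # "cannot write leadership settings: not the leader".
        return False
    args = ["{}={}".format(key, value) for key, value in settings.items()]
    run(["leader-set"] + args)
    return True
```

On a follower, leader_set_if_leader({"auto_dns_provider": "core-dns"}) returns False without running leader-set, so the hook no longer fails. Followers would then read the value the leader published (for example with leader-get) rather than computing and writing it themselves.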

Changed in charm-kubernetes-master:
assignee: nobody → Kevin W Monroe (kwmonroe)
importance: Undecided → Critical
status: New → In Progress
Kevin W Monroe (kwmonroe) wrote :

Fix for kubernetes-master is up for review here:

https://github.com/charmed-kubernetes/charm-kubernetes-master/pull/29

Kevin W Monroe (kwmonroe) wrote :

Fix for k8s-master in edge at cs:~containers/kubernetes-master-694

Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
summary: - ERROR cannot write leadership settings: cannot write settings: not the leader
+ Charms should call "is-leader" before attempting to call "leader-set"
Changed in juju:
status: Triaged → Invalid
Jason Hobbs (jason-hobbs) wrote :

With the new version of the kubernetes-master charm, the kubernetes master units stayed in waiting state, with "Waiting for 7 kube-system pods to start".

status:
http://paste.ubuntu.com/p/6CkWpJ3Bgz/

bundle:
http://paste.ubuntu.com/p/CqT2TnjZ9Y/

I've attached a new crashdump.

Changed in charm-kubernetes-master:
status: Fix Committed → New
George Kraft (cynerva) wrote :

From the crashdump, pods are failing to pull container images:

Error response from daemon: Get https://image-registry.canonical.com:5000/v2/: Forbidden

This is because in edge, we've introduced image-registry.canonical.com as the new default image registry: https://github.com/charmed-kubernetes/charm-kubernetes-master/pull/24

Most likely Docker is trying to reach image-registry.canonical.com through a configured proxy, and the proxy is returning 403 Forbidden. You might be able to fix it by adding image-registry.canonical.com to the no_proxy config:

juju config kubernetes-worker no_proxy=image-registry.canonical.com

However, if your environment is anything like ScapeStack, you'll find that requests to image-registry.canonical.com simply time out instead. If that happens, you will presumably need to get IS involved to update a firewall rule or similar.

In the meantime, we will soon have a new revision of kubernetes-master on the candidate channel, which will include the leader-set hotfix but not the new default registry. It would be good to get a test run from candidate once the new revision is up.

Kevin W Monroe (kwmonroe) wrote :

Thanks for the triage, George. Hotfix for k8s-master in candidate at:

cs:~containers/kubernetes-master-695

@jhobbs, yesterday, we had planned to coordinate a new stable release of all k8s charms to match the imminent release of k8s 1.15. That would have allowed us to fix this issue in the brand new release without having to do a hotfix. The charm you tested yesterday (694) was built for that new release, and as George noted, has changed the way we fetch container images.

The good news is that your test got past the original bug description (no leader hook failures on followers). We'll need to coordinate so your test env can fetch the needed images when testing the new stable release, but not today because...

As it turns out, upstream k8s-1.15 slipped yesterday, so we won't be doing a major CDK release until that gets sorted. Instead, we've done a hotfix release for the current stable charms. This means your existing test infra should work as it always has, and k8s-master-695 should fix this bug.

Sorry for any confusion and the request to re-run with 695. Hopefully nothing but smooth sailing now!

Changed in charm-kubernetes-master:
status: New → Fix Committed
Jason Hobbs (jason-hobbs) wrote :

@kwmonroe, @cynerva - thanks for the quick work and responses. Our test is passing now with -695.

Kevin W Monroe (kwmonroe) wrote :

@jhobbs, that's great news. Thanks for the re-test. Charms and bundles including this fix have been released to stable:

https://jaas.ai/u/containers/kubernetes-master/695
https://jaas.ai/charmed-kubernetes/bundle/124

Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released