Master services constantly restarting due to cert change

Bug #1826625 reported by Andrey Grebennikov
This bug affects 1 person
Affects         Status         Importance   Assigned to   Milestone
EasyRSA Charm   Fix Released   High         Mike Wilson

Bug Description

In a CDK-like deployment with the latest versions of the charms and EasyRSA as the certificate authority, I can see the master components restarting every minute:

2019-04-27 03:15:24 INFO juju-log certificates:7: Certificate information changed, restarting api server
2019-04-27 03:15:25 INFO juju-log certificates:7: status-set: maintenance: Restarting snap.kube-apiserver.daemon service
--
2019-04-27 03:16:24 INFO juju-log certificates:7: Certificate information changed, restarting api server
2019-04-27 03:16:24 INFO juju-log certificates:7: status-set: maintenance: Restarting snap.kube-apiserver.daemon service
--
2019-04-27 03:17:17 INFO juju-log certificates:7: Certificate information changed, restarting api server
2019-04-27 03:17:17 INFO juju-log certificates:7: status-set: maintenance: Restarting snap.kube-apiserver.daemon service
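The "every minute" cadence can be checked by diffing the restart timestamps in the excerpt above. A minimal Python sketch, assuming the log format shown in this report; the helper name `restart_intervals` is illustrative and not part of any Juju tooling:

```python
from datetime import datetime

# The three "Certificate information changed" lines from the excerpt above.
LOG = """\
2019-04-27 03:15:24 INFO juju-log certificates:7: Certificate information changed, restarting api server
2019-04-27 03:16:24 INFO juju-log certificates:7: Certificate information changed, restarting api server
2019-04-27 03:17:17 INFO juju-log certificates:7: Certificate information changed, restarting api server
"""


def restart_intervals(log_text, marker="Certificate information changed"):
    """Return the seconds between consecutive log lines containing `marker`."""
    stamps = [
        # The first 19 characters hold the "YYYY-MM-DD HH:MM:SS" timestamp.
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in log_text.splitlines()
        if marker in line
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(restart_intervals(LOG))  # [60.0, 53.0]
```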

On the EasyRSA charm side, it looks like the "client-relation-changed" handler is called every 30 seconds or even more often:
https://pastebin.ubuntu.com/p/ZBsDqG37xf/

affects: charm-aws-integrator → charm-easyrsa
Tim Van Steenburgh (tvansteenburgh) wrote :

From the bug template:

"When reporting a Charmed Kubernetes problem, please run CDK Field Agent (https://github.com/juju-solutions/cdk-field-agent) and include the resulting tarball as an attachment to your bug."

Changed in charm-easyrsa:
status: New → Triaged
Andrey Grebennikov (agrebennikov) wrote :

Tim,
Can you please elaborate on this script a little bit?
I ran it and it got stuck at the "juju debug-log --replay" step. It seems this option lets the command run until it is interrupted, but if I interrupt it, the remaining commands in the script obviously don't run.

I have nevertheless uploaded the archive with the logs collected by this and the preceding commands.

Changed in charm-easyrsa:
assignee: nobody → Mike Wilson (knobby)
importance: Undecided → High
Mike Wilson (knobby) wrote :

Andrey,

The script takes some time to complete, and this tarball isn't the complete output. Can you let it run to completion and also provide your bundle? I don't see an integrator application. Is this deployed on OpenStack?

Mike Wilson (knobby) wrote :

I would also note that on a newly deployed cluster on AWS I am not seeing this thrashing of the cert. More information for reproduction would be very useful here so we can help.

Changed in charm-easyrsa:
status: Triaged → Incomplete
George Kraft (cynerva) wrote :
Changed in charm-easyrsa:
status: Incomplete → Triaged
George Kraft (cynerva) wrote :

We don't see this in typical deployments because the kubernetes-master:loadbalancer interface only has a single unit related to it. Andrey's bundle has 4 units attached to it: 3 units of kubeapi-load-balancer, 1 unit of keepalived.
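The report does not spell out why several related units cause the churn. One plausible mechanism, offered here as an assumption rather than the confirmed root cause, is that the certificate request is derived from whichever related unit triggered the hook, so the request differs from hook to hook and a new certificate is issued each time. The Python sketch below illustrates that idea with hypothetical names (`build_sans_first_unit`, `build_sans_stable`, `needs_new_cert`) and made-up addresses; it is not the charm's code or the actual fix.

```python
def build_sans_first_unit(unit_addresses, changed_unit):
    """Hypothetical unstable variant: uses only the unit that fired the hook."""
    return [unit_addresses[changed_unit]]


def build_sans_stable(unit_addresses):
    """Hypothetical stable variant: every unit's address, deduplicated and sorted."""
    return sorted(set(unit_addresses.values()))


def needs_new_cert(previous_sans, current_sans):
    """A certificate is re-requested only when the SAN list differs."""
    return previous_sans != current_sans


# Four related units, as in the bundle described above (addresses are made up).
units = {
    "kubeapi-load-balancer/0": "10.0.0.10",
    "kubeapi-load-balancer/1": "10.0.0.11",
    "kubeapi-load-balancer/2": "10.0.0.12",
    "keepalived/0": "10.0.0.100",
}

# Assumed order in which relation-changed hooks fire for the related units.
hooks = ["kubeapi-load-balancer/0", "keepalived/0", "kubeapi-load-balancer/1"]

unstable = [build_sans_first_unit(units, u) for u in hooks]
stable = [build_sans_stable(units) for _ in hooks]

# Count how many consecutive hooks would trigger a new certificate request.
unstable_changes = sum(needs_new_cert(a, b) for a, b in zip(unstable, unstable[1:]))
stable_changes = sum(needs_new_cert(a, b) for a, b in zip(stable, stable[1:]))
print(unstable_changes, stable_changes)  # 2 0
```

With a single related unit, both variants yield the same list on every hook, which would be consistent with typical deployments not showing the problem.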

George Kraft (cynerva) wrote :
Mike Wilson (knobby)
Changed in charm-easyrsa:
status: Triaged → In Progress
Mike Wilson (knobby) wrote :

Fixes are available in kubernetes-master revision 665 and later and in kubernetes-worker revision 526 and later.

Changed in charm-easyrsa:
status: In Progress → Fix Committed
George Kraft (cynerva) wrote :
George Kraft (cynerva)
Changed in charm-easyrsa:
status: Fix Committed → Fix Released