charms do not inform user when calico-node is in a restart loop
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Calico Charm | Incomplete | Undecided | Unassigned |
Canal Charm | Incomplete | Undecided | Unassigned |
Bug Description
While troubleshooting an issue, I found that Canal (deployed with the Calico charm) uses a systemd service that is configured to always restart, with no limit on the number of restarts.
As a result, the service restarts rapidly and indefinitely. In this case there was an error in the service configuration, and checking the service status would typically show it as running when in fact it was starting, failing, and restarting in a loop.
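One way a charm could notice this condition is to look at the restart counter systemd keeps for the unit rather than at the active state alone. The following is only a rough sketch: it assumes a systemd new enough to expose the `NRestarts` property, the unit name `calico-node` is taken from the bug title, and the function names and the threshold of 3 are hypothetical choices, not anything the charms currently do.

```python
import subprocess


def parse_systemctl_show(output):
    """Turn `systemctl show` KEY=VALUE lines into a dict of strings."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def looks_like_restart_loop(props, max_restarts=3):
    """Heuristic: many automatic restarts suggests the service is looping.

    max_restarts is an arbitrary, hypothetical threshold.
    """
    try:
        restarts = int(props.get("NRestarts", "0"))
    except ValueError:
        restarts = 0
    return restarts > max_restarts


def check_unit(unit="calico-node"):
    """Query systemd for the unit and apply the heuristic above."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=NRestarts,ActiveState,SubState"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return looks_like_restart_loop(parse_systemctl_show(result.stdout))
```

A charm could call something like `check_unit()` from its status handler and set a blocked or waiting status when it returns `True`, so the operator sees the loop without having to inspect the unit by hand.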
Ideally, the systemd unit should be configured with a start-limit interval and burst so that continued failures do not result in a persistent restart loop.
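To illustrate the kind of limit meant here, the sketch below renders a systemd drop-in that uses `StartLimitIntervalSec=` and `StartLimitBurst=` in the `[Unit]` section together with `RestartSec=` in `[Service]`. The function name and the default values are hypothetical and are not taken from either charm. The sanity check is a rough heuristic: if the restart delay is long enough that the burst cannot be reached within the interval, the limit would never trigger.

```python
def render_start_limit_dropin(interval_sec=300, burst=5, restart_sec=10):
    """Return the text of a systemd drop-in that bounds restart attempts.

    All default values are illustrative assumptions.
    """
    if restart_sec * burst >= interval_sec:
        # With these values the unit could not start `burst` times
        # inside the interval, so the limit would never be hit.
        raise ValueError("start limit can never be reached with these values")
    return (
        "[Unit]\n"
        "StartLimitIntervalSec={}\n"
        "StartLimitBurst={}\n"
        "\n"
        "[Service]\n"
        "Restart=always\n"
        "RestartSec={}\n"
    ).format(interval_sec, burst, restart_sec)
```

The rendered text could be written to a drop-in file for the unit (for example under a hypothetical `/etc/systemd/system/calico-node.service.d/` directory) followed by `systemctl daemon-reload`. Once the limit is hit, the unit ends up in the failed state, which is much easier to spot than a unit that keeps cycling.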
The configuration error in this case was having rp_filter=2 without enabling the charm option that allows this setting. Applying the same configuration is therefore a good way to reproduce the continual restarting of the service.
The following versions are in use at this site:
Kubernetes 1.18.8
Kubernetes-worker charm: 696
canal: 0.10.0/3.10.1
canal charm: 733
I believe these are all of the relevant charm versions. Since Canal is used with Calico, it is not clear to me whether both charms are affected, so I have subscribed both for triage.