Flannel network lost, requiring service restart

Bug #2004150 reported by Alex Pearce
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canal Charm | Fix Released | Medium | Kevin W Monroe | |
| Flannel Charm | Fix Released | Medium | Adam Dyess | |

Bug Description

Kubernetes pods regularly (several times per week) lose their ability to communicate over the Flannel network. I have noticed these patterns:

1. The Ubuntu 20.04 nodes have never exhibited this problem, only Ubuntu 22.04 nodes.
2. Restarting the Flannel service usually restores Flannel network connectivity (`systemctl restart flannel`).

My best guess is that this is due to https://github.com/flannel-io/flannel/issues/1474, but I have not been able to pin down exactly why it occurs. In particular, I do not understand why my 20.04 nodes do not exhibit the problem.

All nodes are running revision 52 of the Canal charm, version 0.11.0+ck1/3.10.1.

The issue linked above was addressed in Flannel v0.15.1. The Flannel charm pins v0.11.0:

https://github.com/charmed-kubernetes/charm-flannel/blob/523336db25001088a7336dffb2616c5be23a46fb/build-flannel-resources.sh#L4
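
As a rough illustration of the version gap described above, the following sketch compares the pinned version against the release that carries the fix. The helper name `version_ge` is hypothetical and not part of the charm; the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort).
# A leading "v" is stripped from both arguments before comparing.
version_ge() {
    local have="${1#v}" want="${2#v}"
    # If the wanted version sorts first (or equal), then have >= want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

PINNED="v0.11.0"     # version pinned by the Flannel charm, per this report
FIXED_IN="v0.15.1"   # release that addressed the upstream issue, per this report

if version_ge "$PINNED" "$FIXED_IN"; then
    echo "pinned Flannel $PINNED includes the fix"
else
    echo "pinned Flannel $PINNED predates the fix in $FIXED_IN"
fi
```

With the values from this report, the script takes the second branch, since 0.11.0 sorts before 0.15.1.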

I am happy to submit a PR that bumps the version. Ideally I would test the change first, but I am unsure how.

Alex Pearce (alexpearce) wrote :
tags: added: review-needed
Adam Dyess (addyess) wrote :

Updated the PR:
* includes changes to flannel version
* includes changes to etcd version
* updates etcdctl commands to use etcd v3
* support for building charm resources with github actions

https://github.com/charmed-kubernetes/charm-flannel/pull/86

I confirmed that an upgrade from the previous flannel updated to etcd v3 without issue.
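
For readers unfamiliar with the difference between the two etcdctl APIs, here is a minimal dry-run sketch. It only prints the write command for each API version and does not contact etcd. The function name `etcd_write_cmd` and the example value are hypothetical; the key shown is Flannel's default prefix and is assumed here, as the exact commands the charm runs are in the PR linked above, not reproduced here.

```shell
#!/usr/bin/env bash
# Print (do not run) the etcdctl command that writes a key under each API.
# etcdctl's v2 API writes with "set"; the v3 API writes with "put".
# Hypothetical helper for illustration only.
etcd_write_cmd() {
    local api="$1" key="$2" value="$3"
    case "$api" in
        2) printf "ETCDCTL_API=2 etcdctl set %s '%s'\n" "$key" "$value" ;;
        3) printf "ETCDCTL_API=3 etcdctl put %s '%s'\n" "$key" "$value" ;;
        *) echo "unsupported API version: $api" >&2; return 1 ;;
    esac
}

# Assumed key (Flannel's default prefix) and an illustrative network config.
KEY="/coreos.com/network/config"
VALUE='{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}'

etcd_write_cmd 2 "$KEY" "$VALUE"
etcd_write_cmd 3 "$KEY" "$VALUE"
```

Data written through the v2 API is not visible through the v3 API and vice versa, which is why the switch is worth verifying during an upgrade.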

Changed in charm-flannel:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Adam Dyess (addyess)
milestone: none → 1.27
Kevin W Monroe (kwmonroe) wrote :

We'll need a new pin so canal picks up the flannel build changes -- similar to what we did here:

https://github.com/charmed-kubernetes/layer-canal/pull/73/files

Changed in charm-canal:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 1.27
Alex Pearce (alexpearce) wrote :

This is super, thanks very much for the help.

I've opened a layer-canal PR to follow the Flannel change:

https://github.com/charmed-kubernetes/layer-canal/pull/76

Changed in charm-canal:
assignee: nobody → Kevin W Monroe (kwmonroe)
status: Triaged → In Progress
Kevin W Monroe (kwmonroe) wrote :

Follow-up PR to sync the charm-flannel etcd3 and resource-building changes to canal:

https://github.com/charmed-kubernetes/layer-canal/pull/77

Changed in charm-canal:
status: In Progress → Fix Committed
Changed in charm-flannel:
status: In Progress → Fix Committed
tags: removed: review-needed
Changed in charm-canal:
status: Fix Committed → Fix Released
Changed in charm-flannel:
status: Fix Committed → Fix Released