Report mismatch between rp_filter sysconfig and ignore-loose-rpf charm config

Bug #1895547 reported by Cory Johns
Affects        Status        Importance  Assigned to     Milestone
Calico Charm   Fix Released  Medium      Stone Preston   1.24
Canal Charm    Triaged      Medium      Unassigned      none

Bug Description

During the debugging of https://bugs.launchpad.net/charm-kubernetes-worker/+bug/1892067, George ran into an issue where the Calico portion of Canal was malfunctioning because the rp_filter sysctl was set to loose mode (which makes the cluster vulnerable to spoofing attacks), but the failure was masked by the fact that the Flannel portion was still working. We have the ignore-loose-rpf charm config for cases where that flag must be set, but the charm should check for and report the combination of a loose rp_filter with ignore-loose-rpf=False so that networking doesn't silently fail or misbehave.
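
For illustration, a minimal sketch of the kind of check being asked for, assuming a Python charm (the function name is hypothetical; config() and status_set() are the standard charmhelpers hook tools):

from charmhelpers.core.hookenv import config, status_set

def check_loose_rpf():
    # rp_filter: 0 = disabled, 1 = strict, 2 = loose (see ip-sysctl.txt).
    with open("/proc/sys/net/ipv4/conf/all/rp_filter") as f:
        rp_filter = int(f.read().strip())
    if rp_filter == 2 and not config("ignore-loose-rpf"):
        # Surface the mismatch instead of letting Calico fail silently.
        status_set("blocked",
                   "net.ipv4.conf.all.rp_filter is loose; "
                   "set it to 1 or set ignore-loose-rpf=true")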

Cory Johns (johnsca)
Changed in charm-canal:
status: New → Triaged
importance: Undecided → Medium
Pedro Guimarães (pguimaraes) wrote:

Hi,

We just hit this issue on a deployment, and I have to say it is not easy to troubleshoot.

The issue we were seeing was that, on Focal, traffic leaving a pod for the outside world was not getting source-NATed by iptables, whereas on Bionic it was.

Looking at the iptables rules, we could see that Focal had no cali-* chains, while Bionic did.

The cause was indeed the rp_filter error:

2021-06-25 14:27:17.024 [FATAL][2023] int_dataplane.go 1032: Kernel's RPF check is set to 'loose'. This would allow endpoints to spoof their IP address. Calico requires net.ipv4.conf.all.rp_filter to be set t...

Calico/k8s issue: https://github.com/kubernetes-sigs/kind/issues/891

Indeed, the change in behavior between Bionic and Focal makes sense given:
https://github.com/systemd/systemd/commit/230450d4e4f1f5fc9fa4295ed9185eea5b6ea16e

We need to consider that iptables is also used by kube-proxy.

In the kubernetes-sigs issue above, it seems they moved to rp_filter=1 by default.

However, if we keep the OS value, then we surely need a clearer warning, such as:

if series >= "focal" and not config("ignore-loose-rpf") and rp_filter != 1:
    # block the charm and warn about the rp_filter mismatch with Calico config
    status_set("blocked", "rp_filter mismatch with Calico config")

Additional reference: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
(search for rp_filter)

Bayani Carbone (bcarbone) wrote (last edit):

I also experienced this on a deployment using focal machines. The workaround I applied was to set net.ipv4.conf.all.rp_filter = 1 using the sysconfig charm's sysctl config option.
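
For reference, that workaround looks something like the following (the exact YAML quoting may vary; check the sysconfig charm's documentation):

$ juju config sysconfig sysctl="{ net.ipv4.conf.all.rp_filter: 1 }"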

Revision history for this message
Bartosz Woronicz (mastier1) wrote:

I encountered an issue where traffic from the pods CIDR overlay was hitting the firewall instead of being IP-in-IP encapsulated by Calico. The firewall only had an allowance for the underlay network on the worker nodes. The environment has no L2 separation in VLANs: just 3 bare-metal nodes, with VMs on them as masters, workers, etc.

The machines had the following settings:

$ juju ssh -m kubernetes 3 sysctl net.ipv4.conf.eth0.rp_filter
net.ipv4.conf.eth0.rp_filter = 2
$ juju ssh -m kubernetes 3 sysctl net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 2
$ juju ssh -m kubernetes 3 sysctl net.ipv4.conf.default.rp_filter
net.ipv4.conf.default.rp_filter = 2

After setting ignore-loose-rpf=true, the charm poked sysctl and set default.rp_filter to 1 (strict mode) instead of 2 (loose mode).
$ juju ssh -m kubernetes 3 sysctl net.ipv4.conf.default.rp_filter
net.ipv4.conf.default.rp_filter = 1

The other settings, all.rp_filter and eth0.rp_filter, remained set to 2.

But that fixed the issue for me.

Changed in charm-calico:
milestone: none → 1.24
Changed in charm-canal:
milestone: none → 1.24
Kevin W Monroe (kwmonroe) wrote:

We took a two-pronged approach to address this:

- Block calico if rp_filter / ignore-loose-rpf are mismatched
  - https://github.com/charmed-kubernetes/layer-calico/pull/82

- Force k8s nodes to rp_filter=1 (see the sketch after this list)
  - https://github.com/charmed-kubernetes/layer-kubernetes-node-base/pull/2
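
For context, the second prong boils down to pinning the sysctl on each node. A minimal sketch of that idea (not the actual PR code; the sysctl.d file name is an assumption):

import subprocess

def force_strict_rpf():
    # Persist strict reverse-path filtering across reboots via sysctl.d.
    with open("/etc/sysctl.d/99-k8s-rp-filter.conf", "w") as f:
        f.write("net.ipv4.conf.all.rp_filter=1\n")
        f.write("net.ipv4.conf.default.rp_filter=1\n")
    # Apply immediately to the running kernel as well.
    subprocess.check_call(["sysctl", "--system"])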

Changed in charm-calico:
status: Triaged → Fix Committed
Changed in charm-calico:
assignee: nobody → Stone Preston (stonepreston)
Changed in charm-canal:
milestone: 1.24 → none
Changed in charm-calico:
status: Fix Committed → Fix Released