Comment 1 for bug 1831249

Dmitrii Shcherbakov (dmitriis) wrote : Re: Unable to run kubernetes-master with calico integration in a LXD container

I looked at https://github.com/projectcalico/calicoctl/issues/310

It appears that there are some code paths (probably only relevant for worker nodes) that change networking sysctls by writing to /proc/sys/net/ipv4/conf/<interface>/<config_key>:

https://github.com/projectcalico/felix/blob/v3.6.0/dataplane/linux/endpoint_mgr.go#L862-L879
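
To illustrate the kind of write involved, here is a minimal shell sketch, not Felix's actual code. The function name set_iface_sysctl, the interface eth0 and the key rp_filter are picked for illustration only. The sysctl root is a parameter so the function can be tried against a scratch directory instead of the real /proc/sys.

```shell
# Write one per-interface IPv4 sysctl through the proc filesystem.
# $1: sysctl root (normally /proc/sys), $2: interface, $3: key, $4: value.
# Hypothetical helper for illustration; not taken from Felix.
set_iface_sysctl() {
    path="$1/net/ipv4/conf/$2/$3"
    # The redirection fails if the path is missing or the mount is read-only.
    if ! { printf '%s\n' "$4" > "$path"; } 2>/dev/null; then
        echo "cannot write $path" >&2
        return 1
    fi
}

# Example call (commented out so nothing on the host is changed):
# set_iface_sysctl /proc/sys eth0 rp_filter 1
```

When /proc/sys is mounted read-only, a write like this is expected to fail with "Read-only file system" regardless of the capabilities the process holds.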

And Docker without --privileged mounts /proc/sys read-only, even when the container has additional capabilities such as cap_net_admin and cap_sys_admin:

$ docker exec -it calico-node /bin/sh

/ # mount | grep /proc/sys
proc on /proc/sys type proc (ro,relatime)

/ # capsh --print | grep net_admin
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

/ # capsh --print | grep sys_admin
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap

# LXD container, for comparison: /proc is mounted rw and there is no separate read-only /proc/sys mount
ubuntu@juju-fa887c-11-lxd-2:~$ mount | grep /proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
lxcfs on /proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
proc on /dev/.lxc/proc type proc (rw,relatime)
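
The check done by eye on the mount listings above can be scripted. This is a rough sketch, assuming the usual "<src> on <target> type <fstype> (<options>)" output format of mount; the function name mount_mode is made up for this example.

```shell
# Read `mount`-style lines on stdin and print "ro" or "rw" for the given
# mount point, or "none" if there is no mount at exactly that path.
# Hypothetical helper for illustration.
mount_mode() {
    awk -v target="$1" '
        $2 == "on" && $3 == target {
            opts = $NF                 # last field, e.g. "(ro,relatime)"
            gsub(/[()]/, "", opts)
            n = split(opts, parts, ",")
            mode = "rw"                # a later mount on the same path wins
            for (i = 1; i <= n; i++)
                if (parts[i] == "ro") mode = "ro"
        }
        END { print (mode == "" ? "none" : mode) }
    '
}

# Usage inside a container:
#   mount | mount_mode /proc/sys
```

With the listings above this would report "ro" for /proc/sys in the Docker container. In the LXD container it would report "none" for /proc/sys (no dedicated mount there) and "rw" for /proc.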

There has been some work in Docker (moby) to address this:

https://github.com/moby/moby/issues/21649
https://github.com/moby/moby/pull/21751
https://github.com/moby/moby/issues/36597

https://github.com/moby/moby/pull/36644

https://docs.docker.com/engine/release-notes/#18060-ce
"RawAccess allows a set of paths to be not set as masked or readonly. moby/moby#36644"

CLI integration:
https://github.com/docker/cli/pull/1347 (per-path config support)
https://github.com/docker/cli/pull/1808 (--security-opt systempaths=unconfined)
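
A sketch of how the second option might be used, assuming a Docker CLI and engine new enough to include docker/cli#1808. The container name, image name and the --cap-add choice are placeholders, not the actual calico-node invocation, and I have not verified that this alone is enough for Calico. The helper only prints the command so it can be reviewed before running.

```shell
# Print a `docker run` command line that leaves system paths such as
# /proc/sys unconfined (not masked, not read-only) without --privileged.
# $1: image to run (placeholder; substitute the image and tag in use).
unconfined_run_cmd() {
    printf 'docker run --detach --name calico-node --cap-add NET_ADMIN --security-opt systempaths=unconfined %s\n' "$1"
}

unconfined_run_cmd calico/node
```

After starting a container this way, "mount | grep /proc/sys" inside it should no longer show a read-only proc mount on /proc/sys.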