[docker] Unable to run kubernetes-master with calico integration in a LXD container
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Calico Charm | New | Undecided | Unassigned |
Bug Description
The Calico charm sets up a service called calico-node to run a docker container as follows:
systemctl cat calico-node.service
ExecStart=
-e ETCD_ENDPOINTS=https:/
-e ETCD_CA_
-e ETCD_CERT_
-e ETCD_KEY_
# ...
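The ExecStart above is truncated in this report; for reference, an upstream calico/node service typically takes roughly this shape (the endpoint, file paths, and image name below are illustrative placeholders, not the charm's exact values):

```
ExecStart=/usr/bin/docker run --privileged --net=host --name=calico-node \
    -e ETCD_ENDPOINTS=https://<etcd-host>:2379 \
    -e ETCD_CA_CERT_FILE=<path-to-etcd-ca-cert> \
    -e ETCD_CERT_FILE=<path-to-etcd-client-cert> \
    -e ETCD_KEY_FILE=<path-to-etcd-client-key> \
    <calico-node-image>
```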
The problem is that if kubernetes-master is placed into a LXD container created by Juju, calico-node is unable to start: Docker refuses to launch the container because --privileged is used in the run command.
journalctl -u calico-node.service
May 31 07:39:55 juju-fa887c-
May 31 07:40:05 juju-fa887c-
May 31 07:40:06 juju-fa887c-
I enabled nesting on the LXD container by hand (this could also be done via a charm LXD profile, which Juju now supports):
sudo lxc config set juju-fa887c-
sudo lxc restart juju-fa887c-
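For reference, the truncated commands above amount to flipping LXD's standard nesting switch (the container name is taken from the shell prompt later in this report; security.nesting is the stock LXD config key):

```shell
sudo lxc config set juju-fa887c-11-lxd-2 security.nesting true
sudo lxc restart juju-fa887c-11-lxd-2
```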
And changed ExecStart to run without "--privileged", which resulted in the following (even without additional capabilities added to the container):
ExecStart=
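The modified ExecStart is likewise truncated here; the change amounts to dropping --privileged from the run command, i.e. roughly (placeholders, not the charm's exact values):

```
ExecStart=/usr/bin/docker run --net=host --name=calico-node \
    -e ETCD_ENDPOINTS=https://<etcd-host>:2379 \
    <calico-node-image>
```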
docker logs calico-node
2019-05-31 08:02:10.258 [INFO][9] startup.go 173: Early log level set to info
2019-05-31 08:02:10.258 [INFO][9] client.go 202: Loading config from environment
2019-05-31 08:02:10.258 [INFO][9] startup.go 83: Skipping datastore connection test
2019-05-31 08:02:10.271 [INFO][9] startup.go 259: Building new node resource Name="juju-
2019-05-31 08:02:10.271 [INFO][9] startup.go 273: Initialise BGP data
2019-05-31 08:02:10.271 [INFO][9] startup.go 362: Using IPv4 address from environment: IP=172.16.7.64
2019-05-31 08:02:10.271 [INFO][9] startup.go 392: IPv4 address 172.16.7.64 discovered on interface eth0
2019-05-31 08:02:10.271 [INFO][9] startup.go 338: Node IPv4 changed, will check for conflicts
2019-05-31 08:02:10.283 [INFO][9] startup.go 530: No AS number configured on node resource, using global value
2019-05-31 08:02:10.283 [INFO][9] etcd.go 111: Ready flag is already set
2019-05-31 08:02:10.284 [INFO][9] client.go 139: Using previously configured cluster GUID
2019-05-31 08:02:10.291 [INFO][9] compat.go 796: Returning configured node to node mesh
2019-05-31 08:02:10.303 [INFO][9] startup.go 131: Using node name: juju-fa887c-
2019-05-31 08:02:10.412 [INFO][30] client.go 202: Loading config from environment
2019-05-31 08:02:10.429 [INFO][30] ipam.go 120: Auto-assign 1 ipv4, 0 ipv6 addrs for host 'juju-fa887c-
2019-05-31 08:02:10.430 [INFO][30] ipam.go 172: Ran out of existing affine blocks for host 'juju-fa887c-
2019-05-31 08:02:10.431 [INFO][30] ipam.go 195: Need to allocate 1 more addresses - allocate another block
2019-05-31 08:02:10.431 [INFO][30] ipam_block_
2019-05-31 08:02:10.432 [INFO][30] ipam_block_
2019-05-31 08:02:10.433 [INFO][30] ipam.go 208: Claimed new block 192.168.154.192/26 - assigning 1 addresses
2019-05-31 08:02:10.434 [INFO][30] ipam_block.go 343: New allocation attribute: {AttrPrimary:<nil> AttrSecondary:
2019-05-31 08:02:10.435 [INFO][30] ipam.go 285: Auto-assigned 1 out of 1 IPv4s: [192.168.154.192]
2019-05-31 08:02:10.443 [INFO][30] allocate_
Starting libnetwork service
Calico node started successfully
I can see that other projects that run Calico add CAP_NET_ADMIN and CAP_SYS_ADMIN to the Docker container and do away with --privileged:
https:/
https:/
So we could modify the per-application LXD profile for kubernetes-master to include nesting (without setting "privileged") and also pass namespaced capabilities to the docker run command in the Calico charm:
ExecStart=
Example:
23079 /usr/bin/docker run --cap-add=NET_ADMIN --cap-add=SYS_ADMIN --net=host --name=calico-node -e ETCD_ENDPOINTS=https:/
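The nesting half of this proposal can ship with the charm itself, since Juju now supports per-application LXD profiles (as noted above). A minimal sketch of such a profile, assuming the standard lxd-profile.yaml mechanism and LXD's stock nesting key:

```yaml
# lxd-profile.yaml in the kubernetes-master charm (hypothetical sketch)
config:
  security.nesting: "true"
```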
summary:
- Unable to run kubernetes-master with calico integration in a LXD container
+ [docker] Unable to run kubernetes-master with calico integration in a LXD container
Looked at this: https://github.com/projectcalico/calicoctl/issues/310
It appears that there are some code paths (probably only relevant for worker nodes) that change networking sysctls via /proc/sys/net/ipv4/conf/<interface>/<config_key>:
https://github.com/projectcalico/felix/blob/v3.6.0/dataplane/linux/endpoint_mgr.go#L862-L879
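Since felix applies these settings by writing files under /proc/sys directly, a read-only /proc/sys makes every such write fail no matter which capabilities the container holds. A hypothetical reproduction from inside the calico-node container would fail along these lines:

```shell
/ # echo 1 > /proc/sys/net/ipv4/conf/eth0/rp_filter
sh: can't create /proc/sys/net/ipv4/conf/eth0/rp_filter: Read-only file system
```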
And Docker without --privileged sets up /proc/sys as read-only even with additional capabilities:
$ docker exec -it calico-node /bin/sh
/ # mount | grep /proc/sys
proc on /proc/sys type proc (ro,relatime)
/ # capsh --print | grep net_admin
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap
/ # capsh --print | grep sys_admin
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap+eip
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_admin,cap_net_raw,cap_sys_chroot,cap_sys_admin,cap_mknod,cap_audit_write,cap_setfcap
# LXD
ubuntu@juju-fa887c-11-lxd-2:~$ mount | grep /proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
lxcfs on /proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
proc on /dev/.lxc/proc type proc (rw,relatime)
There was some work to address this:
https://github.com/moby/moby/issues/21649
https://github.com/moby/moby/pull/21751
https://github.com/moby/moby/issues/36597
https://github.com/moby/moby/pull/36644
https://docs.docker.com/engine/release-notes/#18060-ce
"RawAccess allows a set of paths to be not set as masked or readonly. moby/moby#36644"
CLI integration:
https://github.com/docker/cli/pull/1347 (per-path config support)
https://github.com/docker/cli/pull/1808 (--security-opt systempaths=unconfined)
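With an engine and CLI new enough to carry those changes (engine 18.06+ plus the docker/cli work above), the /proc/sys masking could be lifted without --privileged by combining systempaths=unconfined with the capability-based approach. A hedged sketch (etcd settings and image name are placeholders):

```shell
docker run --security-opt systempaths=unconfined \
    --cap-add=NET_ADMIN --cap-add=SYS_ADMIN \
    --net=host --name=calico-node \
    -e ETCD_ENDPOINTS=https://<etcd-host>:2379 \
    <calico-node-image>
```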