Xenial containers fail to apply sysctl (procps)

Bug #1685677 reported by Logan V on 2017-04-24
Affects: openstack-ansible
Importance: Critical
Assigned to: Jean-Philippe Evrard

Bug Description

procps, which applies values from /etc/sysctl.d/*, fails to start in our Xenial LXC containers because the /proc/sys directory is not read-write inside the container:

● systemd-sysctl.service - Apply Kernel Variables
   Loaded: loaded (/lib/systemd/system/systemd-sysctl.service; static; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2017-03-07 17:14:49 CST; 1 months 16 days ago
     Docs: man:systemd-sysctl.service(8)
           man:sysctl.d(5)

# grep 'Condition' /lib/systemd/system/systemd-sysctl.service
ConditionPathIsReadWrite=/proc/sys/

# ls -lha /proc/sys/
total 0
dr-xr-xr-x 1 root root 0 Mar 7 17:14 .
dr-xr-xr-x 1169 root root 0 Mar 7 17:14 ..
dr-xr-xr-x 1 root root 0 Apr 23 19:01 abi
dr-xr-xr-x 1 root root 0 Apr 23 19:01 debug
dr-xr-xr-x 1 root root 0 Apr 23 19:01 dev
dr-xr-xr-x 1 root root 0 Mar 7 18:50 fs
dr-xr-xr-x 1 root root 0 Mar 7 17:14 kernel
dr-xr-xr-x 1 root root 0 Mar 7 17:14 net
dr-xr-xr-x 1 root root 0 Apr 23 19:01 vm

Not sure yet if this is an OSA container configuration bug or an upstream issue.

Logan V (loganv) wrote :

Attempting to start it manually does not work:

# /lib/systemd/systemd-sysctl
Couldn't write '1' to 'kernel/kptr_restrict', ignoring: Read-only file system
Couldn't write '4 4 1 7' to 'kernel/printk', ignoring: Read-only file system
Couldn't write '1' to 'net/ipv4/tcp_syncookies', ignoring: No such file or directory
Couldn't write '1' to 'fs/protected_hardlinks', ignoring: Read-only file system
Couldn't write '176' to 'kernel/sysrq', ignoring: Read-only file system
Couldn't write '1' to 'fs/protected_symlinks', ignoring: Read-only file system
Couldn't write '65536' to 'vm/mmap_min_addr', ignoring: Read-only file system
Couldn't write '1' to 'kernel/yama/ptrace_scope', ignoring: Read-only file system

So it seems the condition is correct. I guess this is something to be aware of if we containerize services that apply sysctls using the Ansible modules, because the sysctl settings will not persist across container reboots.

One example where this has a major impact is for people attempting to containerize haproxy: the nonlocal_bind sysctl will not persist across boots, which can cause unpredictable behavior with keepalived/haproxy.

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Jean-Philippe Evrard (jean-philippe-evrard)

Bringing this back to my attention. I still have not had the chance to work on ensuring keepalived can run in containers. It shouldn't be that hard; we can change the profile if need be.

Logan V (loganv) wrote :

Yeah, my workaround is a "sysctl-container.service" template:
http://paste.openstack.org/show/609686/

Tom Cameron (drdabbles) wrote :

My initial "gut feeling" on this is that this is how LXC containers are intended to work. Without digging into each value set here, a container doesn't have its own kernel, so some of these values need to be set on the parent host to take effect.

The condition "ConditionPathIsReadWrite" is failing because in fact that mountpoint is mounted read-only. I've tested this on a brand new OSA install and get the same result. In fact, the file "/proc/sys/net/ipv4/tcp_syncookies" doesn't exist at all in the container. This makes sense because there is no kernel inside the container to adjust syncookie behavior for.
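A minimal sketch of how one might confirm the read-only mount from inside a container, assuming a /proc/mounts-style table. The function name mount_mode is hypothetical (it is not part of systemd or LXC), and this only approximates what ConditionPathIsReadWrite checks; it also ignores escaped characters in mount points.

```shell
#!/bin/sh
# mount_mode PATH [MOUNTS_FILE]   (hypothetical helper, for illustration only)
# Prints "ro" or "rw" for the mount that covers PATH, read from a
# /proc/mounts-style table. Among matching entries the longest mount point
# wins, and a later entry of equal length shadows an earlier one.
mount_mode() {
    path=${1%/}                 # drop a trailing slash so "/proc/sys/" matches
    [ -n "$path" ] || path=/
    awk -v p="$path" '
        {
            mp = $2
            # A mount covers the path if it is the path itself or a parent directory.
            if (mp == p || mp == "/" || index(p, mp "/") == 1) {
                if (length(mp) >= best) {
                    best = length(mp)
                    n = split($4, opts, ",")
                    for (i = 1; i <= n; i++)
                        if (opts[i] == "ro" || opts[i] == "rw")
                            mode = opts[i]
                }
            }
        }
        END { if (mode != "") print mode; else exit 1 }
    ' "${2:-/proc/mounts}"
}
```

Running `mount_mode /proc/sys/` in an affected container would be expected to print ro, while a writable subtree mounted beneath it would report rw.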

I'm tempted to suggest that we close this bug, because it's working as intended.

Tom Cameron (drdabbles) wrote :

OK, after discussing this a bit further with Evan C, I realize I was not clearly or accurately describing what's potentially going on. In general, though, certain sysctl items only make sense on the host and not inside containers. The syncookies example stands, because only the kernel runs the TCP stack and any change to that value will impact everything running on the host.

But the reason some of these values do not appear in the /proc/sys tree is most likely the templates used to create these containers. It is possible to hand-craft a container that displays the entire contents of the host's /proc/sys structure. However, there may well be code within the kernel components for namespaces and cgroups that still prevents a container from writing to specific values.

So my initial "gut feeling" was right, but for the wrong reason. :)

Logan V (loganv) wrote :

Yep, there are a lot of host-specific sysctls that cannot be set within the container. Because they affect the host and not only the container, they error out as shown above.

There are also sysctls that can be validly set within the container, such as net.ipv4.ip_nonlocal_bind. nonlocal_bind affects only the container it is set in, does not throw an error, and works exactly as intended.

It is understandable that systemd and/or the container template is cautious about applying sysctls in a container. But completely opting out of any sysctl application, even when valid sysctls are set using standard sysctl persistence methods, seems like a valid bug to me.

The problem isn't that I can't set the nonlocal_bind sysctl. My problem is that the OS does not allow me to do so using normal sysctl persistence methods.
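To illustrate the distinction, here is a rough sketch of applying only the entries from a sysctl.d-style file that are actually writable in the container, and skipping the rest. The function name apply_writable_sysctls and its ROOT parameter are hypothetical, this is not the workaround from the paste above, and the dot-to-slash key conversion is a simplification that does not handle keys containing dotted interface names.

```shell
#!/bin/sh
# apply_writable_sysctls CONF [ROOT]   (hypothetical helper, for illustration only)
# Reads "key = value" lines from CONF and writes each value under ROOT
# (default /proc/sys) only when the target file exists and is writable.
apply_writable_sysctls() {
    conf=$1
    root=${2:-/proc/sys}
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines, comments, and anything that is not key=value.
        case $line in
            ''|'#'*|';'*) continue ;;
        esac
        case $line in
            *=*) ;;
            *) continue ;;
        esac
        key=$(printf '%s' "${line%%=*}" | tr -d '[:space:]')
        value=$(printf '%s' "${line#*=}" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Simplified mapping: net.ipv4.ip_nonlocal_bind -> net/ipv4/ip_nonlocal_bind
        target=$root/$(printf '%s' "$key" | tr . /)
        if [ -f "$target" ] && [ -w "$target" ]; then
            if printf '%s\n' "$value" > "$target"; then
                echo "applied $key"
            else
                echo "failed $key"
            fi
        else
            echo "skipped $key"
        fi
    done < "$conf"
}
```

Pointing ROOT at a scratch directory makes it possible to try the logic without touching the real /proc/sys.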

Fix proposed to branch: master
Review: https://review.openstack.org/494419

Changed in openstack-ansible:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/494419
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/commit/?id=bb76ea23f5682242df66648c5afe0c34c7c49988
Submitter: Jenkins
Branch: master

commit bb76ea23f5682242df66648c5afe0c34c7c49988
Author: Jean-Philippe Evrard <email address hidden>
Date: Thu Aug 17 10:29:59 2017 +0000

    Ensure that sysctl can be applied on containers

    Some sysctl can be applied to containers, so we add a test
    to prove our containers can do it.

    Change-Id: I40e2f0af00d6d763efcbb07306791d3cd3feff0d
    Fixes-Bug: #1685677

Changed in openstack-ansible:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/496335
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_container_create/commit/?id=91aa3046f86ca5fe51cb335c3251053eae54ff94
Submitter: Jenkins
Branch: stable/pike

commit 91aa3046f86ca5fe51cb335c3251053eae54ff94
Author: Jean-Philippe Evrard <email address hidden>
Date: Thu Aug 17 10:29:59 2017 +0000

    Ensure that sysctl can be applied on containers

    Some sysctl can be applied to containers, so we add a test
    to prove our containers can do it.

    Change-Id: I40e2f0af00d6d763efcbb07306791d3cd3feff0d
    Fixes-Bug: #1685677
    (cherry picked from commit bb76ea23f5682242df66648c5afe0c34c7c49988)

tags: added: in-stable-pike

Change abandoned by Jean-Philippe Evrard (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/576830
