Confined processes inside container cannot fully access host pty device passed in by lxc exec

Bug #1641236 reported by Tyler Hicks
This bug affects 30 people
Affects              Importance   Assigned to
apparmor (Ubuntu)    Undecided    Unassigned
lxd (Ubuntu)         Undecided    Unassigned

Bug Description

Now that AppArmor policy namespaces and profile stacking are in place, I noticed odd stdout buffering behavior when running confined processes via lxc exec. Much more stdout data is buffered before being flushed when the program is confined by an AppArmor profile inside the container.

I see that lxd is calling openpty(3) in the host environment, using the returned fd as stdout, and then executing the command inside of the container. This results in an AppArmor denial because the file descriptor returned by openpty(3) originates outside of the namespace used by the container.

The denial is likely from glibc calling fstat(), from inside the container, on the file descriptor associated with stdout to decide how much buffering to use. The fstat() is denied by AppArmor, and glibc ends up handling the buffering differently than it would have if the fstat() had succeeded.
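The buffering decision described above can be sketched roughly as follows. This is a simplified Python model, not glibc's code: the function name `choose_buffering` and its return strings are invented for illustration, and it assumes that a failed fstat() leaves the stream fully buffered, which is consistent with the behavior seen in this report.

```python
import os
import stat


def choose_buffering(fd):
    """Return "line" or "full", loosely modelling how stdio picks a mode."""
    try:
        st = os.fstat(fd)
    except OSError:
        # fstat() failed (for example, denied by AppArmor), so the stream
        # cannot be identified as a terminal: assume full buffering.
        return "full"
    # Only a character device that is also a tty gets line buffering.
    if stat.S_ISCHR(st.st_mode) and os.isatty(fd):
        return "line"
    return "full"
```

Under this model, a pty that the process is allowed to inspect would be line buffered, while the same pty behind a denied fstat() would be treated like a pipe or a regular file.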

Steps to reproduce (using an up-to-date 16.04 amd64 VM):

Create a 16.04 container
$ lxc launch ubuntu-daily:16.04 x

Run tcpdump in one terminal and generate traffic in another terminal (wget google.com)
$ lxc exec x -- tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
<Packet dump>
47 packets captured
48 packets received by filter
1 packet dropped by kernel
<ctrl-c>

Note that everything above <Packet dump> was printed immediately because it was printed to stderr. <Packet dump>, which is printed to stdout, was not printed until you pressed ctrl-c and the buffers were flushed thanks to the program terminating. Also, this AppArmor denial shows up in the logs:

audit: type=1400 audit(1478902710.025:440): apparmor="DENIED" operation="getattr" info="Failed name lookup - disconnected path" error=-13 namespace="root//lxd-x_<var-lib-lxd>" profile="/usr/sbin/tcpdump" name="dev/pts/12" pid=15530 comm="tcpdump" requested_mask="r" denied_mask="r" fsuid=165536 ouid=165536

Now run tcpdump unconfined and take note that <Packet dump> is printed immediately, before you terminate tcpdump. Also, there are no AppArmor denials.
$ lxc exec x -- aa-exec -p unconfined -- tcpdump -i eth0
...

Now run tcpdump confined but in lxc exec's non-interactive mode and note that <Packet dump> is printed immediately and no AppArmor denials are present. (Looking at the lxd code in lxd/container_exec.go, openpty(3) is only called in interactive mode.)
$ lxc exec x --mode=non-interactive -- tcpdump -i eth0
...

Applications that manually call fflush(stdout) are not affected by this as manually flushing stdout works fine. The problem seems to be caused by glibc not being able to fstat() the /dev/pts/12 fd from the host's namespace.
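A small sketch of why an explicit flush sidesteps the problem, using a pipe as a stand-in for the fully buffered stdout. The helper name `flush_demo` and the 8192-byte buffer size are illustrative assumptions, and Python's `flush()` plays the role of fflush(stdout) here.

```python
import os


def flush_demo():
    """Show that buffered data reaches the reader only after flush()."""
    r, w = os.pipe()
    os.set_blocking(r, False)  # an empty pipe raises instead of hanging
    out = os.fdopen(w, "w", buffering=8192)  # fully buffered text stream
    out.write("packet 1\n")
    try:
        before = os.read(r, 1024)
    except BlockingIOError:
        before = b""  # nothing has left the writer's buffer yet
    out.flush()  # counterpart of fflush(stdout)
    after = os.read(r, 1024)
    out.close()
    os.close(r)
    return before, after
```

The first read sees nothing because the line is still sitting in the writer's buffer; the second read, after the flush, sees the whole line.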

Tyler Hicks (tyhicks) wrote :

There's currently no way in the AppArmor policy language to allow the getattr operation on the passed in /dev/pts/12 file. The typical workaround of adding the attach_disconnected flag to the profile does not work here because *every* AppArmor profile inside of the container would need that flag.

John Johansen has thought out an AppArmor feature that would let the policy language allow this fd passing between namespaces, but it is a sizeable feature and is not on the immediate roadmap.

I haven't had a chance to think it through very much but I'm curious if the LXD developers have any ideas on how this can be solved in LXD. Maybe it is possible to call openpty() from inside the container's namespace? I'm not sure if that would work or if it is safe to do but maybe it is worth investigating.

Stéphane Graber (stgraber) wrote :

Getting openpty called in the container would solve a lot of problems for us but it's not possible to do in a safe way as it'd effectively rely on the container's filesystem which the container user can change or fake at will, allowing for attacks on the host's C library and LXD itself.

Stéphane Graber (stgraber) wrote :

Marking the LXD side of this as Invalid since there's unfortunately nothing we can really do about this.

Changed in lxd (Ubuntu):
status: New → Invalid

Christian Brauner (cbrauner) wrote :

I've reproduced this on a fresh standard xenial instance with LXD
2.0.8 and also on a xenial instance with a patched glibc that reports
ENODEV on ttyname{_r}() on a pty fd that does not exist:

root@x:~# ./enodev_on_pty_in_different_namespace
ttyname(): The pty device might exist in a different namespace: No such device
ttyname_r(): The pty device might exist in a different namespace: No such device

Christian Brauner (cbrauner) wrote :

On Tue, Jan 31, 2017 at 11:34:43AM +0100, Christian Brauner wrote:
> I've reproduced this on a fresh standard xenial instance with LXD
> 2.0.8 and also on a xenial instance with a patched glibc that reports
> ENODEV on ttyname{_r}() on a pty fd that does not exist:
>
> root@x:~# ./enodev_on_pty_in_different_namespace
> ttyname(): The pty device might exist in a different namespace: No such device
> ttyname_r(): The pty device might exist in a different namespace: No such device

So to make this a little more elaborate:
- I managed to reproduce this with an unpatched glibc inside and outside the
  container just like @Tyler outlined.
- I managed to reproduce this with a patched glibc inside the container and an
  unpatched glibc outside the container.
- I managed to reproduce this with a patched glibc inside and outside the
  container.

So a patched glibc which returns ENODEV in case a symlink like /proc/self/fd/0
points to a pts device that lives in another namespace does not improve the
situation. The problem that @Tyler outlined still exists.
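For reference, the kind of probe described above might look like the sketch below. It is not the test program from this comment: `describe_tty` and its return strings are hypothetical, and the ENODEV branch assumes a C library that reports that error for a pty living in another namespace, as the patched glibc here does.

```python
import errno
import os


def describe_tty(fd):
    """Return the tty path for fd, or a short reason why it is unavailable."""
    try:
        return os.ttyname(fd)
    except OSError as exc:
        if exc.errno == errno.ENODEV:
            # Assumption: reported when the pty exists but its device node
            # is not reachable from the caller's namespace.
            return "tty not visible in this namespace"
        if exc.errno == errno.ENOTTY:
            return "not a tty"
        raise
```

Even when the lookup reports ENODEV cleanly, that only affects the tty name; it does not change the denied fstat() that drives the buffering choice.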

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apparmor (Ubuntu):
status: New → Confirmed

Thomas Parrott (tomparrott) wrote :

I've been able to re-create this using a fresh install of Ubuntu 18.04 without LXC or LXD, using only network namespaces.

Set up two namespaces with IPVLAN:

ip netns add ns1
ip link add name ipv1 link enp0s3 type ipvlan mode l3s
ip link set dev ipv1 netns ns1
ip netns exec ns1 ip addr add 10.1.20.252/32 dev ipv1
ip netns exec ns1 ip link set ipv1 up
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip -4 r add default dev ipv1

ip netns add ns2
ip link add name ipv2 link enp0s3 type ipvlan mode l3s
ip link set dev ipv2 netns ns2
ip netns exec ns2 ip addr add 10.1.20.253/32 dev ipv2
ip netns exec ns2 ip link set ipv2 up
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip -4 r add default dev ipv2

Enter namespace 1 and start a ping to other namespace:

sudo ip netns exec ns1 ping 10.1.20.253

Then run tcpdump in namespace 2 listening for all packets without DNS resolution:

sudo ip netns exec ns2 tcpdump -i any -nn

This doesn't output any captured packets.

However, running tcpdump with -l (make stdout line buffered) does help:

sudo ip netns exec ns2 tcpdump -i any -nn -l
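The effect of line buffering can be illustrated with a pipe standing in for stdout. This is only an analogy to what -l does inside tcpdump; the helper name `line_buffered_demo` is made up for the example.

```python
import os


def line_buffered_demo():
    """Write one line to a line-buffered stream and read it back at once."""
    r, w = os.pipe()
    os.set_blocking(r, False)  # an empty pipe raises instead of hanging
    out = os.fdopen(w, "w", buffering=1)  # buffering=1 selects line buffering
    out.write("packet 1\n")  # the newline triggers an immediate flush
    try:
        data = os.read(r, 1024)
    except BlockingIOError:
        data = b""
    out.close()
    os.close(r)
    return data
```

No explicit flush is needed: each completed line is pushed out as soon as it is written.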

poobalan.arumugam aka murphy (poobalan-arumugam) wrote :

This affects Ubuntu 18.04 LXD containers as well.
Following up on the earlier comments about tcpdump:
a) using script does not change anything
b) connecting via ssh instead of lxc exec has no effect
c) disabling AppArmor for tcpdump does work, i.e.:

/bin/ln -s /etc/apparmor.d/usr.sbin.tcpdump /etc/apparmor.d/disable/
/sbin/apparmor_parser -R /etc/apparmor.d
