Re/starting an lxc container corrupts all network namespaces on the same physical host

Bug #1401148 reported by James Page
This bug affects 3 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Confirmed   Undecided    Unassigned
lxc (Ubuntu)     Confirmed   Undecided    Unassigned

Bug Description

Context: Neutron gateway north/south routing server which manages a large number of network namespaces; also hosts a few LXC containers for misc lightweight control plane services.

Problem: If I restart one of the LXC containers, all of the namespaces get corrupted in some way; attempting to exec anything in any namespace fails with:

seting the network namespace "qrouter-4b575c81-39bb-439f-81e1-e59e3759a287" failed: Invalid argument
seting the network namespace "qrouter-1f5e26df-f8c5-4246-9485-3f9df8e39c40" failed: Invalid argument
seting the network namespace "qrouter-c3bf179e-9532-43f9-88af-752b66592cd6" failed: Invalid argument
seting the network namespace "qrouter-3d4550ca-4de6-44e3-90b5-1b60c3d58ed1" failed: Invalid argument
seting the network namespace "qrouter-4fc4c3c2-68bf-4954-8b32-d47d8d84086e" failed: Invalid argument
seting the network namespace "qrouter-0890d9ea-f0c8-4e69-bf1a-4896213a82a0" failed: Invalid argument
seting the network namespace "qrouter-0f7e0655-f84b-4aaa-82aa-75f01a59411e" failed: Invalid argument

I also see:

Dec 10 15:16:00 cofgod kernel: [ 4604.274359] type=1400 audit(1418224560.675:132): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-0ba77ab2-b3ee-4752-88af-b19313c10f9d/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274405] type=1400 audit(1418224560.675:134): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-25006453-2caa-4aa4-bdeb-e4822dc700d6/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274436] type=1400 audit(1418224560.675:136): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-2fec74e8-d507-4650-beb4-8da459ea0039/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274451] type=1400 audit(1418224560.675:137): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-33d8fa40-c158-4377-bc8f-d252e38d4943/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274466] type=1400 audit(1418224560.675:138): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-394517c0-e48a-43e7-8778-96c601607733/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274482] type=1400 audit(1418224560.675:139): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qdhcp-41e21850-decf-49f8-97fb-cbb3aa5932e3/" pid=8790 comm="lxc-start" flags="rw, slave"
Dec 10 15:16:00 cofgod kernel: [ 4604.274497] type=1400 audit(1418224560.675:140): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="/usr/bin/lxc-start" name="/run/netns/qrouter-e9837293-c017-4d85-a601-cae5e83719a2/" pid=8790 comm="lxc-start" flags="rw, slave"

In the kern.log
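
For triage, the denied mount targets can be pulled out of kern.log with standard text tools. The sketch below is an illustration only: the function name denied_mount_targets is hypothetical, and it assumes the audit lines keep the key="value" layout shown above.

```shell
# Hypothetical helper: list the distinct mount targets that AppArmor
# denied for the /usr/bin/lxc-start profile.
# Reads kern.log-style lines on stdin, prints one target per line.
denied_mount_targets() {
    grep 'apparmor="DENIED"' \
        | grep 'operation="mount"' \
        | grep 'profile="/usr/bin/lxc-start"' \
        | sed -n 's/.* name="\([^"]*\)".*/\1/p' \
        | sort -u
}

# Usage (the log path is an assumption; adjust for the host):
#   denied_mount_targets < /var/log/kern.log
```

Piping the result through `wc -l` gives a quick count of how many namespace mounts were affected by a single container start.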

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: lxc 1.0.6-0ubuntu0.1
ProcVersionSignature: Ubuntu 3.13.0-35.62-generic 3.13.11.6
Uname: Linux 3.13.0-35-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.6
Architecture: amd64
Date: Wed Dec 10 15:24:45 2014
SourcePackage: lxc
UpgradeStatus: No upgrade log present (probably fresh install)
defaults.conf:
 lxc.network.type = veth
 lxc.network.link = lxcbr0
 lxc.network.flags = up
 lxc.network.hwaddr = 00:16:3e:xx:xx:xx

James Page (james-page) wrote :

To reproduce:

sudo lxc-create --name test -t ubuntu-cloud
sudo ip netns add test
sudo ip netns exec test ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
sudo lxc-start -d --name test
sudo ip netns exec test-tests ip addr
seting the network namespace "test-tests" failed: Invalid argument
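
With many namespaces on a host, it helps to test them all in one pass after a container (re)start. A minimal sketch, assuming iproute2's `ip netns list` and `ip netns exec` behave as in the steps above; the function name check_netns is hypothetical, and it needs to run as root to give meaningful results.

```shell
# Hypothetical helper: try to enter every network namespace known to
# iproute2 and report which ones can no longer be entered.
# Returns non-zero if at least one namespace is broken.
check_netns() {
    rc=0
    # Newer iproute2 prints "name (id: N)", so keep only the first field.
    for ns in $(ip netns list | awk '{print $1}'); do
        if ip netns exec "$ns" true 2>/dev/null; then
            echo "ok $ns"
        else
            echo "BROKEN $ns"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (as root), before and after starting the container:
#   check_netns
```

Running it once before `lxc-start` and once after makes it easy to confirm that the breakage coincides with the container start.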

James Page (james-page) wrote :

Confirmed on utopic as well.

James Page (james-page) wrote :

 sudo ip netns exec test ip addr

James Page (james-page) wrote :

Confirmed on vivid as well.

tags: added: landscape
Stefan Bader (smb) wrote :

I had assumed that "test-tests" was a typo, and I saw the same result using "test" after starting the container. So starting an lxc container does seem to have an impact on netns. I am not sure whether the apparmor message is related; it seems to be triggered when lxc-start tries to mount /run/netns.

Stefan Bader (smb) wrote :

Hm, as a data point: for testing, one can set /usr/bin/lxc-start to complain mode:

aa-complain /usr/bin/lxc-start

and when I did that the test netns is still usable after lxc-start.

John Johansen (jjohansen) wrote :

Can you please attach the output of

  apparmor_parser -p /etc/apparmor.d/usr.bin.lxc-start

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1401148

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Stefan Bader (smb) wrote :

So for now I have also added a task for the kernel, though the truth (if such a thing exists) could be somewhere in between. Serge, Stephane, what we probably need to figure out is what exactly lxc-start is trying to do when it slave-mounts /run/netns. It may also need to handle the case where that mount is denied or fails. From the outside, it looks as if lxc-start carries on assuming it has its own space while actually continuing to use the host's.
The other thing is that this sounds like lxc-start would need a rule that actually allows it to do that mount of /run/netns.

Stefan Bader (smb) wrote :

Stop the bot.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stéphane Graber (stgraber) wrote :

So I think it is some systemd handling that does that. LXC unshares the mnt namespace, which gives it a copy of the host's; it then does some magic (rprivate, I believe) to get things working under systemd, mounts what it needs, unmounts everything else, and calls pivot_root.

lxc itself has no code to deal with /run/netns, so it's not special casing it.
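
One way to see what the mount-namespace handling did to /run/netns is to compare the propagation tags in /proc/self/mountinfo before and after starting a container. A sketch, assuming the field layout documented in proc(5), where optional fields such as shared:N or master:N sit between the mount options and the "-" separator; netns_propagation is a hypothetical helper name.

```shell
# Hypothetical helper: print the propagation state of /run/netns and of
# every mount below it. Reads mountinfo-formatted lines on stdin.
# A mount with no optional fields is reported as "private".
netns_propagation() {
    awk '$5 ~ /^\/run\/netns(\/|$)/ {
        tags = ""
        # Optional fields start at column 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++)
            tags = tags (tags ? "," : "") $i
        print $5, (tags ? tags : "private")
    }'
}

# Usage:
#   netns_propagation < /proc/self/mountinfo
```

A change from shared:N to master:N or to "private" on the host side after `lxc-start` would be a hint that the propagation change was applied somewhere it was not intended.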

Stefan Bader (smb) wrote :

When stracing lxc-start one of the sub-processes is doing the access. This is the strace of that sub-process.

Stefan Bader (smb) wrote :

This is the output of "apparmor_parser -p /etc/apparmor.d/usr.bin.lxc-start" on Vivid with 3.16 kernel.

Stefan Bader (smb) wrote :

lxc-start.strace.3093:clone(child_stack=0x7fff7fbc0290, flags=CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET|SIGCHLD) = 3131
lxc-start.strace.3093:open("/proc/3131/ns/net", O_RDONLY) = 16
lxc-start.strace.3093:waitid(P_PID, 3131, {}, WNOHANG|WEXITED|WNOWAIT, NULL) =

Serge Hallyn (serge-hallyn) wrote :

Is this only happening when systemd is in the container, or when systemd is on the host?

Stefan Bader (smb) wrote :

I would have assumed systemd is on neither, since the behaviour seems to be the same all the way back to Trusty (at least).

Serge Hallyn (serge-hallyn) wrote :

The only way I can get this to work is to add

"mount,"

to /etc/apparmor.d/abstractions/lxc/start-container

If I add something like

"mount options=slave"
"remount options=slave"

that does not suffice.
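
To check whether a host already carries that workaround, one can look for a bare "mount," rule in the abstraction file. A rough sketch; has_blanket_mount_rule is a hypothetical helper, and the pattern only recognises the unqualified rule, not variants with options.

```shell
# Hypothetical helper: succeed if the AppArmor policy text on stdin
# contains an unqualified "mount," rule (optionally followed by a comment).
has_blanket_mount_rule() {
    grep -Eq '^[[:space:]]*mount[[:space:]]*,[[:space:]]*(#.*)?$'
}

# Usage:
#   has_blanket_mount_rule < /etc/apparmor.d/abstractions/lxc/start-container \
#       && echo "blanket mount rule present"
#
# After editing the abstraction, the profile has to be reloaded, e.g.:
#   apparmor_parser -r /etc/apparmor.d/usr.bin.lxc-start
```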

Serge Hallyn (serge-hallyn) wrote :

It appears that as tyhicks pointed out this is a dup of bug 1350947.

Serge Hallyn (serge-hallyn) wrote :

Hah, as pointed out in comment #4 of that bug. Marking this as a dup.

Serge Hallyn (serge-hallyn) wrote :

James, if you'd like to increase the priority of bug 1350947, please do so.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lxc (Ubuntu):
status: New → Confirmed