lxc-start enters uninterruptible sleep

Bug #1196295 reported by Pavel Bennett
This bug affects 1 person
Affects          Status      Importance  Assigned to  Milestone
Linux            Confirmed   High        Unassigned
linux (Ubuntu)   Incomplete  High        Unassigned
lxc (Ubuntu)     Invalid     High        Unassigned

Bug Description

After running and terminating around 6000 containers overnight, something happened on my box that is affecting every new LXC container I try to start. The DEBUG log file looks like:

      lxc-start 1372615570.399 WARN lxc_start - inherited fd 9
      lxc-start 1372615570.399 INFO lxc_apparmor - aa_enabled set to 1

      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/302' (5/6)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/303' (7/8)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/304' (10/11)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/305' (12/13)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/306' (14/15)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/307' (16/17)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/308' (18/19)
      lxc-start 1372615570.399 DEBUG lxc_conf - allocated pty '/dev/pts/309' (20/21)
      lxc-start 1372615570.399 INFO lxc_conf - tty's configured
      lxc-start 1372615570.399 DEBUG lxc_start - sigchild handler set
      lxc-start 1372615570.399 INFO lxc_start - 'vm-59' is initialized
      lxc-start 1372615570.404 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp

      lxc-start 1372615570.404 INFO lxc_start - stored saved_nic #0 idx 12392 name vethP59

      lxc-start 1372615570.404 INFO lxc_conf - opened /home/x/vm/vm-59.hold as fd 25

It stops there. In 'ps faux', it looks like:

root 31621 0.0 0.0 25572 1272 ? D 14:06 0:00 \_ lxc-start -n vm-59 -f /tmp/tmp.fG6T6ERZpS -l DEBUG -o /home/x/lxcdebug/vm-59.txt -- /usr/sbin/dropbear -F -E -m
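For reference, a process stuck in 'D' (uninterruptible sleep) can be inspected from the host to see where it is blocked. A minimal sketch, using the PID from the ps line above (requires root; /proc/<pid>/stack needs stack tracing support, and SysRq must be enabled for the second approach):

      # kernel stack of the hung lxc-start process
      sudo cat /proc/31621/stack
      # or dump all blocked (D-state) tasks to the kernel log
      echo w | sudo tee /proc/sysrq-trigger
      dmesg | tail -n 50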

For comparison, on a successful LXC run (prior to the server getting into this state), the log continues with the cgroup checks below, so the hang happens just before:

      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (rootfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys' (sysfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/proc' (proc)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev' (devtmpfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/dev/pts' (devpts)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/run' (tmpfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/' (btrfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup' (tmpfs)
      lxc-start 1372394092.208 DEBUG lxc_cgroup - checking '/sys/fs/cgroup/cpuset' (cgroup)
      lxc-start 1372394092.208 INFO lxc_cgroup - [1] found cgroup mounted at '/sys/fs/cgroup/cpuset',opts='rw,relatime,cpuset,clone_children'
      lxc-start 1372394092.208 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /

It looks like a resource leak, but I'm not yet sure what resource would be leaking.

If it matters, I SIGKILL my lxc-start processes instead of using lxc-stop. Could that have any negative implications?

Oh, and cgroups had almost 6000 entries for VMs that are long dead (I'm guessing it's due to my SIGKILL). I've run cgclear and my /sys/fs/cgroup/*/ dirs are now totally empty, but the new containers still hang.
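For reference, a quick way to count leftover per-container cgroup directories (a sketch; the 'vm-*' pattern and mount point match the naming used in this report):

      find /sys/fs/cgroup -maxdepth 3 -type d -name 'vm-*' | wc -l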
---
Architecture: amd64
DistroRelease: Ubuntu 13.04
MarkForUpload: True
Package: lxc 0.9.0-0ubuntu3.3
PackageArchitecture: amd64
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Uname: Linux 3.8.0-25-generic x86_64
UserGroups:

Revision history for this message
Pavel Bennett (pavben) wrote :

Also, in dmesg:

[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54556.103535] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54566.333609] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54576.563664] unregister_netdevice: waiting for lo to become free. Usage count = 1
[54586.793749] unregister_netdevice: waiting for lo to become free. Usage count = 1

I've modified my code to use lxc-stop as the cgroups do indeed leak otherwise. What's strange is that it kept happening after clearing cgroups, so perhaps it's something else.

Revision history for this message
Pavel Bennett (pavben) wrote :

Some basic environment details. I can post more if requested.

Ubuntu Server 13.04 64-bit

$ uname -r
3.8.0-25-generic

$ dpkg -l | grep lxc
ii liblxc0 0.9.0-0ubuntu3.3 amd64 Linux Containers userspace tools (library)
ii lxc 0.9.0-0ubuntu3.3 amd64 Linux Containers userspace tools

The container's network configuration:

lxc.network.type = phys
lxc.network.flags = up
lxc.network.link = vethP0
lxc.network.ipv4 = 10.1.0.1
lxc.network.ipv4.gateway = 10.1.0.0
lxc.network.name = eth0

Prior to running lxc, I set up the interface pair as follows:

ip link add name vethH0 type veth peer name vethP0
ifconfig vethH0 10.1.0.0/31 up
route add -host 10.1.0.1 dev vethH0

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1196295] [NEW] lxc-start enters uninterruptible sleep

Quoting Pavel Bennett (<email address hidden>):
> If it matters, I SIGKILL my lxc-start processes instead of using lxc-
> stop. Could that have any negative implications?

Yes, the lxc-start process itself is not the container init, so killing
it will prevent lxc from doing cleanup. Please try to reproduce with
lxc-stop.
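For reference, the clean shutdown path looks roughly like this (a sketch; the container name is the one from the report). lxc-stop goes through lxc's own teardown instead of killing lxc-start before it can clean up:

      lxc-stop -n vm-59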

 status: incomplete
 importance: high

Changed in lxc (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Pavel Bennett (pavben) wrote :

Reproduced even with lxc-stop.

dmesg:

[178420.689704] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178430.919783] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178441.149854] unregister_netdevice: waiting for lo to become free. Usage count = 1
[178451.379920] unregister_netdevice: waiting for lo to become free. Usage count = 1

ps:

root 31536 0.0 0.0 25572 1276 ? D 06:35 0:00 lxc-start -n vm-106 -f /tmp/tmp.IsBlHPIMWw -l DEBUG -o /home/x/lxcdebug/vm-106.txt -- /usr/sbin/dropbear -F -E -m

lxcdebug/vm-106.txt:

      lxc-start 1372761341.653 WARN lxc_start - inherited fd 9
      lxc-start 1372761341.673 INFO lxc_apparmor - aa_enabled set to 1

      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/845' (5/6)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/846' (7/8)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/847' (10/11)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/848' (12/13)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/849' (14/15)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/850' (16/17)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/851' (18/19)
      lxc-start 1372761341.674 DEBUG lxc_conf - allocated pty '/dev/pts/852' (20/21)
      lxc-start 1372761341.674 INFO lxc_conf - tty's configured
      lxc-start 1372761341.674 DEBUG lxc_start - sigchild handler set
      lxc-start 1372761341.674 INFO lxc_start - 'vm-106' is initialized
      lxc-start 1372761341.675 DEBUG lxc_start - Not dropping cap_sys_boot or watching utmp

      lxc-start 1372761341.676 DEBUG lxc_conf - mac address of host interface 'vethfS0zzk' changed to private fe:d6:f6:ea:af:ba
      lxc-start 1372761341.676 DEBUG lxc_conf - instanciated veth 'vethfS0zzk/vethTtYxwJ', index is '23962'
      lxc-start 1372761341.676 INFO lxc_conf - opened /home/x/vm/vm-106.hold as fd 25
-----------------------------------------

cgroups are clean -- absolutely no vm-X folders under /sys/fs/cgroup/*/lxc/

lxc.network.type is now 'veth' as I've moved off of using 'phys', but the result is the same.

This happened after running lxc-start / lxc-stop somewhere between 7k and 15k times -- a similar count to the one reported the last time this happened, which really suggests a leak of some sort.

I have the box in this state right now.

What other details could be helpful?

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1196295] Re: lxc-start enters uninterruptible sleep

Thanks for the info

 affects: linux

Revision history for this message
Pavel Bennett (pavben) wrote :

This should help.

kern.log:

Jul 2 05:41:32 server1 kernel: [136565.201601] device vethbJ4JsM left promiscuous mode
Jul 2 05:41:32 server1 kernel: [136565.201603] vmbr: port 5(vethbJ4JsM) entered disabled state
Jul 2 05:41:38 server1 kernel: [136570.551496] vmbr: port 2(veth49SiBX) entered forwarding state
Jul 2 05:41:38 server1 kernel: [136570.971787] device vethgEUinJ entered promiscuous mode
Jul 2 05:41:38 server1 kernel: [136570.971858] IPv6: ADDRCONF(NETDEV_UP): vethgEUinJ: link is not ready
Jul 2 05:41:39 server1 kernel: [136571.574489] vmbr: port 3(vethdl0Frj) entered forwarding state
Jul 2 05:41:42 server1 kernel: [136575.242996] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:41:42 server1 kernel: [136575.282886] vmbr: port 4(vethwdXg5p) entered forwarding state
Jul 2 05:41:44 server1 kernel: [136576.945295] vmbr: port 6(vethxUZwwG) entered forwarding state
Jul 2 05:41:47 server1 kernel: [136580.142190] vmbr: port 7(vethpuofJs) entered forwarding state
Jul 2 05:41:53 server1 kernel: [136585.473065] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:03 server1 kernel: [136595.703141] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:13 server1 kernel: [136605.933214] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:23 server1 kernel: [136616.163283] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:33 server1 kernel: [136626.237470] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:34 server1 kernel: [136627.221675] device vethkKcWdk entered promiscuous mode
Jul 2 05:42:34 server1 kernel: [136627.221727] IPv6: ADDRCONF(NETDEV_UP): vethkKcWdk: link is not ready
Jul 2 05:42:38 server1 kernel: [136630.800271] device veth5guMb3 entered promiscuous mode
Jul 2 05:42:38 server1 kernel: [136630.800341] IPv6: ADDRCONF(NETDEV_UP): veth5guMb3: link is not ready
Jul 2 05:42:44 server1 kernel: [136636.471570] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:42:54 server1 kernel: [136646.701652] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:04 server1 kernel: [136656.931720] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:14 server1 kernel: [136667.161755] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:25 server1 kernel: [136677.391864] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:34 server1 kernel: [136686.721504] device vethOk6Soj entered promiscuous mode
Jul 2 05:43:34 server1 kernel: [136686.721584] IPv6: ADDRCONF(NETDEV_UP): vethOk6Soj: link is not ready
Jul 2 05:43:35 server1 kernel: [136687.621938] unregister_netdevice: waiting for lo to become free. Usage count = 1
Jul 2 05:43:38 server1 kernel: [136690.857458] device vethjxu2iI entered promiscuous mode
Jul 2 05:43:38 server1 kernel: [136690.857565] IPv6: ADDRCONF(NETDEV_UP): vethjxu2iI: link is not ready
Jul 2 05:43:45 server1 kernel: [136697.852009] unregister_netdevice: waiting for lo to become free. Usage count...


Changed in linux:
importance: Undecided → High
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key raring
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1196295

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux:
status: New → Incomplete
Pavel Bennett (pavben)
tags: added: apport-collected
description: updated
Pavel Bennett (pavben)
Changed in lxc (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Pavel Bennett (pavben) wrote :

Attaching a better apport file after installing the missing dependency. I will hide the ones from earlier as this will contain the same data and more.

Revision history for this message
Pavel Bennett (pavben) wrote :

Changing to Confirmed as per instructions in comment #7

Changed in linux:
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds. Please test the latest v3.10 kernel [0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example if it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-saucy/
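For reference, installing a mainline build generally amounts to downloading the amd64 .deb packages published at [0] and installing them (a sketch; the glob patterns stand in for whatever file names that page publishes):

      # after downloading the headers and image .debs from [0] into the current directory:
      sudo dpkg -i linux-headers-*_all.deb linux-headers-*-generic_amd64.deb linux-image-*-generic_amd64.deb
      sudo reboot
      # after the reboot, confirm which kernel is running:
      uname -r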

Pavel Bennett (pavben)
tags: added: kernel-bug-exists-upstream
Revision history for this message
Pavel Bennett (pavben) wrote :

Managed to repro with v3.10-saucy last night.

What do you guys suspect it could be?

I'm keeping the server in this state for now if you'd like me to gather some data.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for testing the upstream kernel. One additional question: was there a previous kernel version that did not exhibit this issue? It would be good to know whether this is a regression. If it is, we can perform a bisect to identify the commit that introduced it.

Revision history for this message
Pavel Bennett (pavben) wrote :

The last one I'm aware of that did not exhibit this issue was 3.5.0-27. I wish I had a simpler repro though, since on our system it takes 10-15 hours of heavy processing to hit the uninterruptible sleeps.

Could it be tracked down by looking at the state of the OS? Once this happens, every new lxc-start ends up hanging.

Revision history for this message
Stefan Bader (smb) wrote :

This looks to be the third case we have hit with the same symptom (and likely for yet another reason). The complicating issue is that lxc containers make use of net namespaces, and when a namespace is released, references to (I think) the netdevice structures are temporarily moved over to the loopback device. So anything going wrong with reference counting will show up like what you see (as a note for reference: bug 1021471 and bug 1065434).

Since you say it takes 10-15 hours to hit, it feels like this could again be a case of something that rarely goes wrong when the container is shut down and causes a reference not to be dropped. Right now the range between 3.5.0-27 (maybe?) and 3.8.0-25 is quite vast, and at least up to 3.10 we can assume it has not been detected or fixed, so it is unlikely to be something easy to spot. I know it is a lot of effort, but it would be really important to narrow down the version delta. If possible, I would suggest using the mainline kernels to start a rough manual bisection.

http://kernel.ubuntu.com/~kernel-ppa/mainline/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Chris J Arges (arges) wrote :

I can achieve a similar hung task by using the following script:

#!/bin/bash -x
# Create a large number of network namespaces, delete them all, then repeat.
# Deleting a namespace unregisters its devices (including lo), which is the
# path that emits the "unregister_netdevice: waiting for lo" messages.

NUM=7999

# first pass: create
for (( i=0; i<${NUM}; i++ ))
do
  ip netns add foobar$i
done

# first pass: delete
for (( i=0; i<${NUM}; i++ ))
do
  ip netns delete foobar$i
done

# second pass: create
for (( i=0; i<${NUM}; i++ ))
do
  ip netns add foobar$i
done

# second pass: delete
for (( i=0; i<${NUM}; i++ ))
do
  ip netns delete foobar$i
done
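While a script like this runs (or right after the deletes), the symptom shows up in the kernel log; a simple way to watch for it (sketch):

      # count occurrences of the tell-tale message every few seconds
      watch -n 5 'dmesg | grep -c "unregister_netdevice: waiting for lo"'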

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@arges

does the kernel patch set you sent to kernel-team@ yesterday help the testcase you put in comment #22?

Someone just pointed out he got this

[54545.873460] unregister_netdevice: waiting for lo to become free. Usage count = 1

in a 3.8 kernel when using macvlan in a container.

It seems to me that the testcase in comment #22, plus one or two more (starting veth and macvlan containers and checking dmesg), belong in a kernel regression testsuite at this point, because this seems to keep coming back. I don't know which testsuite would be best for this.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Marking as invalid for LXC since this is a kernel bug.

Changed in lxc (Ubuntu):
status: Confirmed → Invalid