clone() hang when creating new network namespace (dmesg show unregister_netdevice: waiting for lo to become free. Usage count = 2)

Bug #1021471 reported by Iain Lane on 2012-07-05
This bug affects 12 people
Affects (Status / Importance / Assigned to):
 - Linux: Confirmed / High
 - linux (Ubuntu): High / Stefan Bader
 - linux (Ubuntu Precise): Medium / Chris J Arges
 - linux (Ubuntu Quantal): High / Stefan Bader

Bug Description

SRU Justification:

Impact:
    When creating a new network namespace, dmesg can show the following:
    unregister_netdevice: waiting for lo to become free. Usage count = 1

Fix:
    Stefan Bader's SAUCE patch has fixed this for Quantal:
    UBUNTU: SAUCE: net/ipv4: Always flush route cache on unregister batch call

Testcase:
    The source code found here:
    https://lists.debian.org/debian-kernel/2012/05/msg00494.html
    can be compiled and run as follows:

    sudo ./reproducer
    #ctrl+c
    sudo ./reproducer
    #wait for a while
    dmesg | grep unregister

--

I'm not sure how I triggered this. I've been moving around between networks and suspending/resuming all day.

Earlier in this boot I successfully used a container (start, networking and stop). I came to start the same one later and noticed that it didn't come up. Trying to attach to the console with lxc-console informed me that it wasn't running. I then saw suspicious content in dmesg:

[25800.412234] INFO: task lxc-start:25817 blocked for more than 120 seconds.
[25800.412243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25800.412254] lxc-start D ffff88013fd13980 0 25817 1 0x00000000
[25800.412266] ffff880007b43cc0 0000000000000086 ffff88003ba4c500 ffff880007b43fd8
[25800.412275] ffff880007b43fd8 ffff880007b43fd8 ffff880134c65c00 ffff88003ba4c500
[25800.412284] 000080d0ffffffff ffffffff81ca7c00 ffff88003ba4c500 ffffffff81ca7c04
[25800.412288] Call Trace:
[25800.412306] [<ffffffff81673759>] schedule+0x29/0x70
[25800.412313] [<ffffffff81673a1e>] schedule_preempt_disabled+0xe/0x10
[25800.412323] [<ffffffff81672537>] __mutex_lock_slowpath+0xd7/0x150
[25800.412331] [<ffffffff8167200a>] mutex_lock+0x2a/0x50
[25800.412340] [<ffffffff8155ede1>] copy_net_ns+0x71/0x100
[25800.412350] [<ffffffff8107adfb>] create_new_namespaces+0xdb/0x190
[25800.412357] [<ffffffff8107afec>] copy_namespaces+0x8c/0xd0
[25800.412367] [<ffffffff81050142>] copy_process.part.22+0x902/0x1520
[25800.412375] [<ffffffff81050ee5>] do_fork+0x135/0x390
[25800.412385] [<ffffffff8116db40>] ? kmem_cache_free+0x20/0x100
[25800.412395] [<ffffffff8118c6b3>] ? putname+0x33/0x50
[25800.412402] [<ffffffff811811cc>] ? do_sys_open+0x16c/0x200
[25800.412410] [<ffffffff8101c238>] sys_clone+0x28/0x30
[25800.412418] [<ffffffff8167cbf3>] stub_clone+0x13/0x20
[25800.412424] [<ffffffff8167c8e9>] ? system_call_fastpath+0x16/0x1b
[25806.312385] unregister_netdevice: waiting for lo to become free. Usage count = 1

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-generic 3.5.0.3.3
ProcVersionSignature: Ubuntu 3.5.0-2.2-generic 3.5.0-rc4
Uname: Linux 3.5.0-2-generic x86_64
NonfreeKernelModules: nvidia wl
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.25.
ApportVersion: 2.2.5-0ubuntu2
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: NVidia [HDA NVidia], device 0: Cirrus Analog [Cirrus Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: laney 2787 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xd3480000 irq 22'
   Mixer name : 'Nvidia MCP89 HDMI'
   Components : 'HDA:10134206,106b0d00,00100301 HDA:10de000c,10de0101,00100200'
   Controls : 37
   Simple ctrls : 13
Date: Thu Jul 5 21:26:08 2012
HibernationDevice: RESUME=UUID=1c5b3f2c-2c89-4fa1-9ed8-0e238de8fe47
InstallationMedia: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100729)
MachineType: Apple Inc. MacBookPro7,1
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-2-generic root=UUID=2228fdfe-3834-40b2-b7b4-efea7463e3c1 ro quiet splash reboot=pci vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-2-generic N/A
 linux-backports-modules-3.5.0-2-generic N/A
 linux-firmware 1.82
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-01-13 (173 days ago)
dmi.bios.date: 03/25/10
dmi.bios.vendor: Apple Inc.
dmi.bios.version: MBP71.88Z.0039.B05.1003251322
dmi.board.name: Mac-F222BEC8
dmi.board.vendor: Apple Inc.
dmi.chassis.type: 10
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-F222BEC8
dmi.modalias: dmi:bvnAppleInc.:bvrMBP71.88Z.0039.B05.1003251322:bd03/25/10:svnAppleInc.:pnMacBookPro7,1:pvr1.0:rvnAppleInc.:rnMac-F222BEC8:rvr:cvnAppleInc.:ct10:cvrMac-F222BEC8:
dmi.product.name: MacBookPro7,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.

Iain Lane (laney) wrote :
summary: - lxc-start no longer starts containers
+ lxc-start sometimes stops starting containers
Iain Lane (laney) on 2012-07-05
summary: - lxc-start sometimes stops starting containers
+ 'stuck on mutex_lock creating a new network namespace when starting a
+ container
summary: - 'stuck on mutex_lock creating a new network namespace when starting a
+ stuck on mutex_lock creating a new network namespace when starting a
container
Brad Figg (brad-figg) on 2012-07-05
Changed in linux (Ubuntu):
status: New → Confirmed

Can you tell us how to reproduce this issue?

From the dmesg kernel warning oops, I think this is not an lxc/cgroups-specific issue. It looks like lxc-start was blocked by something for a long time. Is there any heavy workload on your system?

Thanks,
-Bryan

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Bryan Wu (cooloney)
Iain Lane (laney) wrote :

Sorry, I don't yet have a recipe for reproducing it.

I did manage to get the system into the broken state again after I filed this bug report by suspending/resuming and starting/stopping/using containers as usual, but I can't trigger it on demand.

As for heavy load: not at the point it breaks. The main container I use is one I'm working on some Launchpad changes in. I often run the test suite inside it, and I guess that is rather heavy.

Bryan Wu (cooloney) wrote :

Hmm, that's very hard for us to analyze. We did meet a similar oops before because of heavy workload and the CFQ block I/O scheduler. Could you run a test for us? Change your default block I/O scheduler from CFQ to deadline and run LXC as usual to verify whether this issue is gone. I'm just guessing, but I hope this helps.

-Bryan

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Stéphane Graber (stgraber) wrote :

Not much luck reproducing at the moment with an up to date quantal, though running using the deadline scheduler with two containers rebooting in a loop, I eventually hit that:

Jul 19 07:22:34 lantea kernel: [46965.795778] ---[ end trace c212400a9b13d700 ]---
Jul 19 07:22:35 lantea kernel: [46965.809353] general protection fault: 0000 [#2] SMP
Jul 19 07:22:35 lantea kernel: [46965.812019] Modules linked in: veth ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables 8021q garp bridge stp llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm coretemp microcode snd_seq_midi snd_rawmidi psmouse serio_raw snd_seq_midi_event lpc_ich snd_seq snd_timer snd_seq_device i915 bonding rfcomm bnep bluetooth parport_pc ppdev mac_hid snd drm_kms_helper drm i2c_algo_bit soundcore snd_page_alloc video lp parport hid_generic usbhid hid r8169 floppy
Jul 19 07:22:35 lantea kernel: [46965.812019]
Jul 19 07:22:35 lantea kernel: [46965.812019] Pid: 11839, comm: initctl Tainted: G D 3.5.0-5-generic #5-Ubuntu /945GSE
Jul 19 07:22:35 lantea kernel: [46965.812019] EIP: 0060:[<c154bdf3>] EFLAGS: 00010286 CPU: 0
Jul 19 07:22:35 lantea kernel: [46965.812019] EIP is at unix_destruct_scm+0x53/0x90
Jul 19 07:22:35 lantea kernel: [46965.812019] EAX: 00000000 EBX: f71740c0 ECX: ffffffff EDX: 00000000
Jul 19 07:22:35 lantea kernel: [46965.812019] ESI: e0a828c8 EDI: f71740c0 EBP: e0a89adc ESP: e0a89abc
Jul 19 07:22:35 lantea kernel: [46965.812019] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 19 07:22:35 lantea kernel: [46965.812019] CR0: 80050033 CR2: b7606fb8 CR3: 01968000 CR4: 000007e0
Jul 19 07:22:35 lantea kernel: [46965.812019] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 19 07:22:35 lantea kernel: [46965.812019] DR6: ffff0ff0 DR7: 00000400
Jul 19 07:22:35 lantea kernel: [46965.812019] Process initctl (pid: 11839, ti=e0a88000 task=f34e6580 task.ti=e0a88000)
Jul 19 07:22:35 lantea kernel: [46965.812019] Stack:
Jul 19 07:22:35 lantea kernel: [46965.812019] 00000000 ffffffff 00000000 00000000 00000000 00000000 00000000 f71740c0
Jul 19 07:22:35 lantea kernel: [46965.812019] e0a89ae8 c14c45d3 f71740c0 e0a89af4 c14c43d0 00000001 e0a89b0c c14c4486
Jul 19 07:22:35 lantea kernel: [46965.812019] c154bc6f 00000001 e0a828c8 f71740c0 e0a89b38 c154bc6f 00000000 e0a80ae0
Jul 19 07:22:35 lantea kernel: [46965.812019] Call Trace:
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c45d3>] skb_release_head_state+0x43/0xc0
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c43d0>] __kfree_skb+0x10/0x90
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14c4486>] kfree_skb+0x36/0x80
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bc6f>] ? unix_release_sock+0x13f/0x240
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bc6f>] unix_release_sock+0x13f/0x240
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c154bd8f>] unix_release+0x1f/0x30
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bc4e0>] sock_release+0x20/0x70
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c14bc547>] sock_close+0x17/0x30
Jul 19 07:22:35 lantea kernel: [46965.812019] [<c114ff76>] fput+0xe6/0x210
Jul 19 07:22:35 lan...


Stéphane Graber (stgraber) wrote :

Restarted the same test with the default I/O scheduler and after a few hours, got the same crash again:

Jul 19 16:58:20 lantea kernel: [14707.004394] general protection fault: 0000 [#1] SMP
Jul 19 16:58:20 lantea kernel: [14707.008026]
Jul 19 16:58:20 lantea kernel: [14707.008026] Pid: 20505, comm: dbus-daemon Not tainted 3.5.0-5-generic #5-Ubuntu /945GSE
Jul 19 16:58:20 lantea kernel: [14707.008026] EIP: 0060:[<c154d2fb>] EFLAGS: 00010286 CPU: 0
Jul 19 16:58:20 lantea kernel: [14707.008026] EIP is at unix_stream_recvmsg+0x4eb/0x680
Jul 19 16:58:20 lantea kernel: [14707.008026] EAX: 00000000 EBX: f28cc180 ECX: f18f1d40 EDX: ffffffff
Jul 19 16:58:20 lantea kernel: [14707.008026] ESI: 00000000 EDI: edd6c240 EBP: f18f1d68 ESP: f18f1ccc
Jul 19 16:58:20 lantea kernel: [14707.008026] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Jul 19 16:58:20 lantea kernel: [14707.008026] CR0: 80050033 CR2: b7702130 CR3: 318d6000 CR4: 000007e0
Jul 19 16:58:20 lantea kernel: [14707.008026] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 19 16:58:20 lantea kernel: [14707.008026] DR6: ffff0ff0 DR7: 00000400
Jul 19 16:58:20 lantea kernel: [14707.008026] Process dbus-daemon (pid: 20505, ti=f18f0000 task=f48dd8d0 task.ti=f18f0000)
Jul 19 16:58:20 lantea kernel: [14707.008026] Stack:
Jul 19 16:58:20 lantea kernel: [14707.008026] c15bfa3d f18f1cdc ffffffff edd6e688 f18f1d40 c14c419c f48dd8d0 f48dd8d0
Jul 19 16:58:20 lantea kernel: [14707.008026] 00000000 00000000 edd6c3f8 00000000 00000001 00000000 f18f1d7c edd6c288
Jul 19 16:58:20 lantea kernel: [14707.008026] 00000000 00000000 ec7ff500 f18f1f4c 00000000 edd6c420 00000000 f4acd400
Jul 19 16:58:20 lantea kernel: [14707.008026] Call Trace:
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c15bfa3d>] ? _raw_spin_lock_irqsave+0x2d/0x40
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14c419c>] ? skb_queue_tail+0x3c/0x50
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c1289713>] ? aa_revalidate_sk+0x83/0x90
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14bd54c>] sock_recvmsg+0xcc/0x100
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c115f7c0>] ? __pollwait+0xd0/0xd0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c12c97b1>] ? _copy_from_user+0x41/0x60
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14c79cf>] ? verify_iovec+0x3f/0xb0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14bd480>] ? sock_sendmsg_nosec+0xf0/0xf0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14bcc70>] __sys_recvmsg+0x110/0x1d0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14bd480>] ? sock_sendmsg_nosec+0xf0/0xf0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c107d7cf>] ? trigger_load_balance+0x4f/0x1c0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c1074d7a>] ? scheduler_tick+0xda/0x100
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c12c5038>] ? timerqueue_add+0x58/0xb0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c1090255>] ? ktime_get+0x65/0xf0
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c102cbab>] ? lapic_next_event+0x1b/0x20
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14be93b>] sys_recvmsg+0x3b/0x60
Jul 19 16:58:20 lantea kernel: [14707.008026] [<c14bee3b>] sys_socketcall+0x28b/0x2d0
Ju...


Stéphane Graber (stgraber) wrote :

I'm quite surprised that with all of these tests I haven't hit the mutex_lock bug again, though; it was definitely happening on that machine... Maybe some other fix resolved it, or I'm just not exercising the exact code path that triggers it.

Clint Byrum (clint-fewbar) wrote :

On my somewhat lagged quantal, I have been seeing similar issues:

Linux clint-MacBookPro 3.5.0-8-generic #8-Ubuntu SMP Sat Aug 4 04:42:28 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

[194038.144050] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194040.576173] INFO: task lxc-start:23872 blocked for more than 120 seconds.
[194040.576178] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194040.576180] lxc-start D ffff88014fd13980 0 23872 1 0x00000000
[194040.576186] ffff880116909cc0 0000000000000086 ffff880090ad2e00 ffff880116909fd8
[194040.576192] ffff880116909fd8 ffff880116909fd8 ffff880144830000 ffff880090ad2e00
[194040.576197] ffff880116909cc0 ffffffff81ca91a0 ffff880090ad2e00 ffffffff81ca91a4
[194040.576202] Call Trace:
[194040.576212] [<ffffffff8167f519>] schedule+0x29/0x70
[194040.576217] [<ffffffff8167f7de>] schedule_preempt_disabled+0xe/0x10
[194040.576221] [<ffffffff8167e2f7>] __mutex_lock_slowpath+0xd7/0x150
[194040.576225] [<ffffffff8167ddca>] mutex_lock+0x2a/0x50
[194040.576230] [<ffffffff8156ab01>] copy_net_ns+0x71/0x100
[194040.576236] [<ffffffff8107b39b>] create_new_namespaces+0xdb/0x190
[194040.576239] [<ffffffff8107b58c>] copy_namespaces+0x8c/0xd0
[194040.576245] [<ffffffff81050112>] copy_process.part.22+0x902/0x1520
[194040.576249] [<ffffffff81050eb5>] do_fork+0x135/0x390
[194040.576254] [<ffffffff811820d5>] ? vfs_write+0x105/0x180
[194040.576258] [<ffffffff8101c2e8>] sys_clone+0x28/0x30
[194040.576263] [<ffffffff816889b3>] stub_clone+0x13/0x20
[194040.576267] [<ffffffff816886a9>] ? system_call_fastpath+0x16/0x1b
[194048.384149] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194058.624071] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194068.864079] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194079.104158] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194089.344152] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194099.584105] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194109.824044] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194120.064158] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194130.304148] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194140.544146] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194150.784065] unregister_netdevice: waiting for lo to become free. Usage count = 1
[194160.576246] INFO: task lxc-start:23872 blocked for more than 120 seconds.
[194160.576251] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194160.576253] lxc-start D ffff88014fd13980 0 23872 1 0x00000000
[194160.576259] ffff880116909cc0 0000000000000086 ffff880090ad2e00 ffff880116909fd8
[194160.576265] ffff880116909fd8 ffff880116909fd8 ffff880144830000 ffff880090ad2e00
[194160.576270] ffff880116909cc0 ffffffff81ca91a0 ffff880090ad2e00 ffffffff81ca91a4
[194160.576275] Call Trace:
[194160.576286] [<ffffffff8167f519>] schedule+0x29/0x70
[194160.576290] [<ffffffff8167f7de>] sched...


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Jean-Baptiste Lallement (jibel) wrote :

I can reproduce it very reliably on my system after shutting down an LXC container with poweroff from inside the container.
I'm setting this to High because the container then cannot be started again without restarting the host system, and the host system won't shut down, waiting forever for lo to become free. Only SysRq helps in that case.

Changed in linux (Ubuntu):
importance: Medium → High
tags: added: rls-q-incoming
Stéphane Graber (stgraber) wrote :

Looking around for this bug, after getting it myself a few more times... I found http://lists.debian.org/debian-kernel/2012/05/msg00494.html which mentions a similar behaviour.

I extracted the C example and built it: http://paste.ubuntu.com/1182799/

Running it indeed triggered the issue here; any subsequent call to lxc-start just hangs.
When running lxc-start under strace, I'm getting:
stat("/home/stgraber/data/vm/lxc/lib/precise-gui-i386/rootfs", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/home/stgraber/data/vm/lxc/lib/precise-gui-i386/rootfs.hold", O_RDWR|O_CREAT, 0600) = 17
clone(

So it looks like, whatever the issue is, it triggers when calling clone(CLONE_NEWNET).

Hope that helps point towards the right direction.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Stéphane Graber (stgraber) wrote :

The last time I saw this happening was 5 minutes ago on a Lenovo x230 (no legacy BIOS), running:
Linux castiana 3.5.0-13-generic #14-Ubuntu SMP Wed Aug 29 16:48:44 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

As Jean-Baptiste says, this bug is extremely annoying: anyone using LXC who hits it (that part seems quite random) won't be able to work until they power cycle the system. I would appreciate it if someone could actually look at this.

For some reason I've still never seen this.

Do you have a recipe by which, after a reboot, you can 100% reproduce
this?

The following seems pretty reliable to me:
 - gcc reproducer.c -o reproducer (using the paste.ubuntu.com code above)
 - sudo ./reproducer
 - ctrl+c
 - lxc-start -n <some-container>
 - dmesg | grep unregister

It appears that reproducing it that way is very reliable here, though the result is slightly different. Using this reproducer, the container will usually hang at startup for a few minutes, then eventually succeed in booting.
When hitting the bug without that reproducer, it would usually hang indefinitely (where indefinitely > 10 minutes).

tags: added: kernel-da-key kernel-key
Serge Hallyn (serge-hallyn) wrote :

I have finally been able to reproduce this, but it takes me much longer than it does Stephane.

Dan Kegel (dank) wrote :

I reproduced this on the first run of my lxc-ized buildbot setup script on a quantal host, so it's likely to hit real users.

Bryan Wu (cooloney) wrote :

I can reproduce this as Stéphane mentioned, but I only get a message like "unregister_netdevice: waiting for lo to become free. Usage count = 2". There are no other oops messages like the mutex_lock() one, and I think that oops only appeared because lxc-start was blocked for too long.

So probably the subject of this bug should be changed.

Bryan Wu (cooloney) wrote :

After some testing, I think this is not an LXC-specific issue. It's probably related to the kernel CLONE_NEWNET code, since if we run a test like this:
 - sudo ./reproducer
 - ctrl+c
 - sudo ./reproducer
 - wait for a while
 - dmesg | grep unregister

we still get the same error message.

It looks like the first run of the reproducer didn't release the loopback device.

Stéphane Graber (stgraber) wrote :

I updated the bug title to better match what we're seeing.
The oops from the description is indeed just a timeout from some user space task that's stuck on clone().

So it looks like there's something wrong either in the cleanup code when flushing a network namespace (when the last process in the namespace dies) or something wrong with the refcount.

summary: - stuck on mutex_lock creating a new network namespace when starting a
- container
+ clone() hang when creating new network namespace (dmesg show
+ unregister_netdevice: waiting for lo to become free. Usage count = 2)
Bryan Wu (cooloney) wrote :
Changed in linux (Ubuntu Quantal):
milestone: none → ubuntu-12.10-beta-2
tags: removed: rls-q-incoming
Bryan Wu (cooloney) wrote :

As Eric W. Biederman said in the bugzilla, the 3.6-rc1 mainline version works. I've been testing our Ubuntu mainline builds, such as 3.6-rc1 and 3.6-rc5, which all work fine. But with the 3.5.3 mainline build this test fails. Our latest Quantal kernel is based on the 3.5.3 kernel.

So this issue was evidently fixed during the 3.6-rc1 cycle; I'm going to investigate and backport the relevant patches.

-Bryan

Bryan Wu (cooloney) wrote :

From Eric's reply, I found it is a little complex to backport the patches from 3.6-rc1 for this issue, because some fundamentals changed between 3.5 and 3.6-rc1.

--
As I recall the routing cache was removed between 3.5 and 3.6-rc1 so there are
some significant changes to the fundamentals.

What to look for: failure of dev_hold and dev_put to pair.

The kernel configuration may play a role. I remember times when there was a
small bug in ipv6 multicast routing with respect to this. So a more minimal
configuration may not reproduce the problem.

I would also assume that the different reproducers exercise different code paths, so you are probably dealing with more than one bug between the ubuntu and the debian bug trackers.

I hope those hints help.
--

But we won't use the 3.6 kernel for our Quantal release.

Clint Byrum (clint-fewbar) wrote :

I run into this bug daily; it severely cripples the juju local provider on quantal.

Tim Gardner (timg-tpi) wrote :

Reassigning to Stefan.

Changed in linux (Ubuntu Quantal):
assignee: Bryan Wu (cooloney) → Stefan Bader (stefan-bader-canonical)
Stefan Bader (smb) wrote :

Repeating the comment in the upstream bug I just made:

I added debugging to dev_hold and dev_put as Eric suggested and used the reproducer attached to this bug. What I saw was that creation and destruction were balanced. However, on the connect call there were another two dev_hold() calls, which seem to be exactly the references that are never returned.

system_call_fastpath+0x1a/0x1f
  sys_connect+0xeb/0x110
    inet_stream_connect+0x11c/0x310
      tcp_v4_connect+0x13c/0x510
        ip_route_connect+???/???
          __ip_route_output_key+0x39a/0xb10
            ip_route_output_slow
              __mkroute_output
                rt_dst_alloc+0x3e/0x40
                  dst_alloc+0xc5/0x1c0
                    +1 = 8
                  rt_set_nexthop.isra.45+0x131/0x2d0 ?
                    rt_intern_hash+0x133/0x670
                      rt_bind_neighbour+0x1d/0x40
                        ipv4_neigh_lookup+0xe7/0x120
                          neigh_create+0x1bd/0x5d0
                            +9

Unfortunately the stack traces miss the details about going into ip_route_connect, but with more printks I know that ip_route_output_flow() is the one failing with -EINVAL.
Comparing functions between 3.5 and current linux-HEAD I was not very successful in spotting the important difference.

Stefan Bader (smb) wrote :

I guess I am at the limits of my knowledge. So far it looks like ip_route_connect initially calls __ip_route_output_key to fill in the source and destination addresses, and this seems to cause a routing cache entry to be created with source and destination address 0.
It could be wrong to cache that, or to create it with 0 addresses, or both... I added all that info to the upstream bug in the hope that someone there knows the details...

Stefan Bader (smb) wrote :

This is an experimental change that at least avoids the problem when running the test case. But while I saw no immediate problem, it isn't guaranteed to have no side effects, and neither can I be sure there isn't another case (only the source address or only the destination address unset) that would still suffer from the problem.
I added the same patch to the upstream bug report in the hope that this prompts someone there to help with some information.

tags: added: patch
Stefan Bader (smb) wrote :

As testing with lxc containers showed, the clever idea does not work there, because that is a case where at least one of the addresses (actually, it seems, both) is set. So it looks more and more like the real problem is that when a namespace is torn down, nothing enforces immediately evicting and releasing the route cache entries that belong to the interfaces in that namespace.

One observation I made while fiddling around with this a bit more: running the test program and then aborting it with ctrl-c starts the messages about lo having a refcount of 2. Trying to start the same test again will hang on the first listen. That would indicate that something which still holds a required lock or mutex is still running (the teardown has not finished). This ends after a longer time (I have not measured it, but the blocked-process warning is triggered at least once), and after that the test program's connect works again. That could mean two things:
1. Cleanup did finally succeed
2. Cleanup was aborted, we leak the bits in the route cache but at least new net namespaces are possible.

Stefan Bader (smb) wrote :

It seems that after about 5 minutes the references really do get cleaned up without any change (so the question would be why that takes so long...):

[ 44.099279] lo(ffff88003cc3c000)[0]+= 1
[ 44.099337] lo(ffff88003cc3c000)[0]+= 2
[ 44.099344] lo(ffff88003cc3c000)[0]+= 3
[ 44.099358] lo(ffff88003cc3c000)[0]+= 4
[ 44.099364] lo(ffff88003cc3c000)[0]+= 5
[ 44.099416] lo(ffff88003cc3c000)[0]+= 6
[ 44.099422] lo(ffff88003cc3c000)[0]+= 7
[ 44.099580] lo(ffff88003cc3c000)[0]+= 8
[ 44.099596] lo(ffff88003cc3c000)[0]+= 9
[ 46.728441] lo(ffff88003cc3c000)[1] -= -1
[ 46.728556] lo(ffff88003cc3c000)[1] -= -2
[ 46.728565] lo(ffff88003cc3c000)[1] -= -3
[ 46.729266] lo(ffff88003cc3c000)[1] -= -4
[ 46.729313] lo(ffff88003cc3c000)[1] -= -5
[ 46.729975] lo(ffff88003cc3c000)[1] -= -6
[ 46.732279] lo(ffff88003cc3c000)[1] -= -7
[ 338.896671] lo(ffff88003cc3c000)[0] -= 8
[ 338.896677] lo(ffff88003cc3c000)[0] -= 7

Stefan Bader (smb) wrote :

This second attempt takes the path of forcing the route cache to be cleaned of entries belonging to a net device that is being torn down.

Stefan Bader (smb) wrote :

When looking at the patch uploaded I realized something went wrong on the update. Re-attaching.

Stefan Bader (smb) wrote :

Another step towards solving this: after the upstream discussions established that the route cache should be flushed when unregistering, via the NETDEV_UNREGISTER_BATCH notifier call, I found that this fails because the notifier handler checks a dereferenced pointer that is not only unset but also not really necessary. Moving the handler around a bit should actually fix this.

Changed in linux:
importance: Unknown → High
status: Unknown → Confirmed
Clint Byrum (clint-fewbar) wrote :

Here is a syslog of the latest failure with the "#26+smb1" kernel, showing the refcount stuck at only 1 instead of 2.

Changed in linux (Ubuntu Quantal):
milestone: ubuntu-12.10-beta-2 → ubuntu-12.10
Stefan Bader (smb) wrote :

Clint, could you please write down exactly which steps you do to reproduce the issue that leads to the single reference remaining? Which arguments to lxc-create/-start/-stop? Any special network setup?

I think that with the last patch I made we fixed one issue, though that one (while easier to reproduce) is not the same as the case which leaves one reference and does not resolve itself over time. The open questions now are whether the other issue can be fixed in time, and whether we should add the fix we already have at the last minute even though it does not completely fix things.

Clint Byrum (clint-fewbar) wrote :

Attaching syslog for latest fail.

BTW, the way I'm reproducing this is with a branch of juju that I've been working on:

You'll need this in ~/.juju/environments.yaml:

  local:
    type: local
    control-bucket: puppies-kittens-goblins
    admin-secret: abcdefghijklmnop0987654321
    data-dir: /tmp/juju-data
    default-series: precise
    juju-origin: lp:~clint-fewbar/juju/local-cloud-img

mkdir /tmp/juju-data
bzr branch lp:~clint-fewbar/juju/local-cloud-img
cd local-cloud-img
export PYTHONPATH=$PWD
export PATH=$PWD/bin:$PATH
juju bootstrap -e local
juju deploy wordpress -e local
# wait for the wordpress service to have 'agent-state: started'
watch juju status
juju destroy-environment -e local
# unregister_netdevice messages now start spitting out on dmesg

Clint Byrum (clint-fewbar) wrote :

The full environments.yaml, btw, should be

environments:
  local:
    type: local
    control-bucket: puppies-kittens-goblins
    admin-secret: abcdefghijklmnop0987654321
    data-dir: /tmp/juju-data
    default-series: precise
    juju-origin: lp:~clint-fewbar/juju/local-cloud-img

Stefan Bader (smb) wrote :

Certainly something is very wrong with the numbers. The reported per-CPU counters are 5/13/-17/0, which sum to 1, exactly the refcount complained about. But looking at the individual numbers, the counter for CPU#0 starts at 5, not 0.
And just adding up the dev_hold and dev_put calls, I underrun down to -4. So it looks a bit like some magic suddenly warps the counter: we release more often than we appear to take references, yet still end up one too high.
It is a bit too late to think about it, but maybe loopback gets assigned elements from another interface and that count is off by one.

Clint Byrum (clint-fewbar) wrote :

OK, I can reproduce this with just this shell script. The weird thing is, it only reproduces if I watch the log and kill the container after it fully boots.

cloud-init will report that it is done booting like this:

cloud-init boot finished at Fri, 05 Oct 2012 20:15:32 +0000. Up 200.15 seconds

If you run the script and wait until you see that message, then press enter (which triggers stop/destroy), the bug will reproduce on the 3.6.0 upstream kernel [1] as well as Quantal's.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-quantal/

Clint Byrum (clint-fewbar) wrote :

OK, I think I'm ready to say that there are really just two bugs. One is about the route cache and is addressed by smb's most recent patch. The other one seems to affect only my MacBook Air; I have not been able to get the repro.sh script to reproduce the problem on any other machine with that kernel or a 3.6 kernel installed.

Clint Byrum (clint-fewbar) wrote :

Further information: this appears to be related to the proprietary wl driver.

When I tried on my MacBookPro with wired network, the problem did not surface, but upon switching to wireless, the problem did surface.

This suggests that the real problem lies somewhere in the wl driver.

Tim Gardner (timg-tpi) on 2012-10-08
Changed in linux (Ubuntu Quantal):
status: Triaged → Fix Committed
Stefan Bader (smb) wrote :

OK, it is probably obvious that we went ahead and applied at least the patch for the first half. Clint, we/you should probably open a second report for the remaining issue to keep things cleanly separated.

tags: removed: kernel-key
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-17.28

---------------
linux (3.5.0-17.28) quantal-proposed; urgency=low

  [ Andy Whitcroft ]

  * [packaging] we already have a valid src_pkg_name
  * [packaging] allow us to select which builds have uefi signed versions

  [ James M Leddy ]

  * SAUCE: input: fix weird issue of synaptics psmouse sync lost after
    resume
    - LP: #717970

  [ Paolo Pisati ]

  * SAUCE: omap3 clocks .dev_id = NULL
    - LP: #1061599
  * [Config] omap: disable USB_[EHCI|OHCI]_HCD_PLATFORM
    - LP: #1061599
  * [Config] omap: enforce USB_[EHCI|OHCI]_HCD_PLATFORM=n
    - LP: #1061599

  [ Stefan Bader ]

  * SAUCE: net/ipv4: Always flush route cache on unregister batch call
    - LP: #1021471

  [ Upstream Kernel Changes ]

  * Bluetooth: Add USB_VENDOR_AND_INTERFACE_INFO() for Broadcom/Foxconn
    - LP: #1030233

  [ Wen-chien Jesse Sung ]

  * SAUCE: Bluetooth: Remove rules for matching Broadcom vendor specific
    IDs
    - LP: #1030233
 -- Leann Ogasawara <email address hidden> Tue, 09 Oct 2012 11:23:41 -0700

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
zoolook (nbensa) wrote :

I'm running 3.5.0-19-generic (3.5.0-19.30) and I still get this bug.

Am I missing something?

zoolook@venkman:~$ apt-cache policy linux-image-extra-3.5.0-19-generic
linux-image-extra-3.5.0-19-generic:
  Installed: 3.5.0-19.30
  Candidate: 3.5.0-19.30
  Version table:
 *** 3.5.0-19.30 0
        500 http://archive.ubuntu.com/ubuntu/ quantal-proposed/main amd64 Packages
        100 /var/lib/dpkg/status

Stefan Bader (smb) wrote :

I guess so, yes: the other bug causing refcount leaks, bug #1065434.

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Chris J Arges (arges) on 2014-05-01
Changed in linux (Ubuntu Precise):
assignee: nobody → Chris J Arges (arges)
importance: Undecided → Medium
status: New → In Progress
Chris J Arges (arges) on 2014-05-02
description: updated
description: updated
Tim Gardner (timg-tpi) on 2014-05-02
Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
Chris J Arges (arges) wrote :

Verified on 3.2.0-63-virtual.
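
For reference, verification follows the testcase from the SRU justification: build the reproducer from the debian-kernel list post linked in the description and run it twice. The compile command and the file name reproducer.c are assumptions; the list post has the actual source.

```shell
#!/bin/sh
# Testcase from the SRU justification. reproducer.c comes from
# https://lists.debian.org/debian-kernel/2012/05/msg00494.html
# (file name assumed); skip gracefully if it is not present.
[ -f reproducer.c ] || { echo "fetch reproducer.c from the list post first"; exit 0; }

gcc -o reproducer reproducer.c

sudo ./reproducer     # interrupt with Ctrl+C after it starts
sudo ./reproducer     # run again; on an unfixed kernel this hangs
# wait for a while, then check the log:
dmesg | grep unregister || true
```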

tags: added: verification-done-precise
removed: verification-needed-precise
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-63.95

---------------
linux (3.2.0-63.95) precise; urgency=low

  [ Kamal Mostafa ]

  * Revert "rtlwifi: Set the link state"
    - LP: #1319735
  * Release Tracking Bug
    - re-used previous tracking bug

linux (3.2.0-63.94) precise; urgency=low

  [ Kamal Mostafa ]

  * Merged back Ubuntu-3.2.0-61.93 security release
  * Revert "n_tty: Fix n_tty_write crash when echoing in raw mode"
    - LP: #1314762
  * Release Tracking Bug
    - LP: #1316703

  [ Stefan Bader ]

  * SAUCE: net/ipv4: Always flush route cache on unregister batch call
    - LP: #1021471

  [ Upstream Kernel Changes ]

  * ipv6: don't set DST_NOCOUNT for remotely added routes
    - LP: #1293726
    - CVE-2014-2309
  * vhost: fix total length when packets are too short
    - LP: #1312984
    - CVE-2014-0077
  * n_tty: Fix n_tty_write crash when echoing in raw mode
    - LP: #1314762
    - CVE-2014-0196
  * floppy: ignore kernel-only members in FDRAWCMD ioctl input
    - LP: #1316729
    - CVE-2014-1737
  * floppy: don't write kernel-only members to FDRAWCMD ioctl output
    - LP: #1316735
    - CVE-2014-1738

linux (3.2.0-62.93) precise; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1313807

  [ Joseph Salisbury ]

  * [Config] updateconfigs after Linux v3.2.57 update

  [ Upstream Kernel Changes ]

  * rds: prevent dereference of a NULL device in rds_iw_laddr_check
    - LP: #1302222
    - CVE-2014-2678
  * rtlwifi: Set the link state
    - LP: #1310763
  * rtlwifi: rtl8192cu: Fix some code in RF handling
    - LP: #1310763
  * NFSv4: OPEN must handle the NFS4ERR_IO return code correctly
    - LP: #1310763
  * selinux: process labeled IPsec TCP SYN-ACK packets properly in
    selinux_ip_postroute()
    - LP: #1310763
  * parport: parport_pc: remove double PCI ID for NetMos
    - LP: #1310763
  * staging: vt6656: [BUG] BBvUpdatePreEDThreshold Always set sensitivity
    on bScanning
    - LP: #1310763
  * bfa: Chinook quad port 16G FC HBA claim issue
    - LP: #1310763
  * usb: option: add new zte 3g modem pids to option driver
    - LP: #1310763
  * dib8000: make 32 bits read atomic
    - LP: #1310763
  * serial: add support for 400 and 800 v3 series Titan cards
    - LP: #1310763
  * serial: add support for 200 v3 series Titan card
    - LP: #1310763
  * x86/efi: Fix off-by-one bug in EFI Boot Services reservation
    - LP: #1310763
  * rtc-cmos: Add an alarm disable quirk
    - LP: #1310763
  * slub: Fix calculation of cpu slabs
    - LP: #1310763
  * mtd: mxc_nand: remove duplicated ecc_stats counting
    - LP: #1310763
  * USB: pl2303: fix data corruption on termios updates
    - LP: #1310763
  * USB: serial: add support for iBall 3.5G connect usb modem
    - LP: #1310763
  * USB: Nokia 502 is an unusual device
    - LP: #1310763
  * USB: cypress_m8: fix ring-indicator detection and reporting
    - LP: #1310763
  * ALSA: rme9652: fix a missing comma in channel_map_9636_ds[]
    - LP: #1310763
  * sunrpc: Fix infinite loop in RPC state machine
    - LP: #1310763
  * SELinux: Fix memory leak upon loading policy
    - LP: #1310763
  * drm/radeon: warn users when hw_i2c is enabled (v2)
    - LP: #131...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released

It looks like I have this problem with Ubuntu Raring.
kernel version: 3.8.13

I attached a screenshot of dmesg taken after I tried to start my containers again.

Should I open a new bug?

Thank you in advance


Raring is EOL since January 27:

https://wiki.ubuntu.com/Releases

Excerpts from Alessandro Moscatelli's message of 2014-06-12 22:23:52 UTC:
> It looks I have this problem with Ubuntu Raring
> kernel version : 3.8.13
>
> I attached a screenshot with dmesg after I tried to start again my
> containers.
>
> Should I open a new bug ?
>
> Thank you in advance
>
>
> ** Attachment added: "screenshot"
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1021471/+attachment/4130637/+files/5.png
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1021471
>
> Title:
> clone() hang when creating new network namespace (dmesg show
> unregister_netdevice: waiting for lo to become free. Usage count = 2)
>
> Status in The Linux Kernel:
> Confirmed
> Status in “linux” package in Ubuntu:
> Fix Released
> Status in “linux” source package in Precise:
> Fix Released
> Status in “linux” source package in Quantal:
> Fix Released
>
> Bug description:
> SRU Justification:
>
> Impact:
> When creating new network namespace dmesg can show the following
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> Fix:
> Stefan Bader's SAUCE patch has fixed this for Quantal:
> UBUNTU: SAUCE: net/ipv4: Always flush route cache on unregister batch call
>
> Testcase:
> The sourcecode found here:
> https://lists.debian.org/debian-kernel/2012/05/msg00494.html
> can be compiled and run as follows:
>
> sudo ./reproducer
> #ctrl+c
> sudo ./reproducer
> #wait for a while
> dmesg | grep unregister
>
>
> --
>
> I'm not sure how I triggered this. I've been moving around between
> networks and suspending/resuming all day.
>
> Earlier in this boot I successfully used a container (start,
> networking and stop). I came to start the same one later and noticed
> that it didn't come up. Trying to attach to the console with lxc-
> console informed me that it wasn't running. I then saw suspicious
> content in dmesg:
>
> [25800.412234] INFO: task lxc-start:25817 blocked for more than 120 seconds.
> [25800.412243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [25800.412254] lxc-start D ffff88013fd13980 0 25817 1 0x00000000
> [25800.412266] ffff880007b43cc0 0000000000000086 ffff88003ba4c500 ffff880007b43fd8
> [25800.412275] ffff880007b43fd8 ffff880007b43fd8 ffff880134c65c00 ffff88003ba4c500
> [25800.412284] 000080d0ffffffff ffffffff81ca7c00 ffff88003ba4c500 ffffffff81ca7c04
> [25800.412288] Call Trace:
> [25800.412306] [<ffffffff81673759>] schedule+0x29/0x70
> [25800.412313] [<ffffffff81673a1e>] schedule_preempt_disabled+0xe/0x10
> [25800.412323] [<ffffffff81672537>] __mutex_lock_slowpath+0xd7/0x150
> [25800.412331] [<ffffffff8167200a>] mutex_lock+0x2a/0x50
> [25800.412340] [<ffffffff8155ede1>] copy_net_ns+0x71/0x100
> [25800.412350] [<ffffffff8107adfb>] create_new_namespaces+0xdb/0x190
> [25800.412357] [<ffffffff8107afec>] copy_namespaces+0x8c/0xd0
> [25800.412367] [<ffffffff81050142>] copy_pr...

Read more...
