'unregister_netdevice: waiting for lo to become free. Usage count = 2' is reported and causes a kernel hang when floodlight tests are run using utah

Bug #1181315 reported by Para Siva
Affects          Status         Importance   Assigned to        Milestone
linux (Ubuntu)   Fix Released   Medium       Joseph Salisbury
Saucy            Fix Released   Medium       Joseph Salisbury

Bug Description

When the utah client is used to run the floodlight smoke tests on saucy server images, the following error is reported:

May 17 17:41:13 ubuntu kernel: [ 618.483362] IPv6: ADDRCONF(NETDEV_UP): s1-eth1: link is not ready
May 17 17:41:23 ubuntu kernel: [ 628.220062] unregister_netdevice: waiting for lo to become free. Usage count = 2
May 17 17:41:33 ubuntu kernel: [ 638.460029] unregister_netdevice: waiting for lo to become free. Usage count = 2
....
...
...
May 17 17:44:16 ubuntu kernel: [ 801.760026] unregister_netdevice: waiting for lo to become free. Usage count = 2
May 17 17:44:27 ubuntu kernel: [ 812.000029] unregister_netdevice: waiting for lo to become free. Usage count = 2
May 17 17:44:37 ubuntu kernel: [ 822.240031] unregister_netdevice: waiting for lo to become free. Usage count = 2
May 17 17:44:47 ubuntu kernel: [ 832.480024] unregister_netdevice: waiting for lo to become free. Usage count = 2
May 17 17:44:55 ubuntu kernel: [ 840.564053] INFO: task mnexec:9343 blocked for more than 120 seconds.
May 17 17:44:55 ubuntu kernel: [ 840.564268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 17 17:44:55 ubuntu kernel: [ 840.564494] mnexec D ffff88003fc14240 0 9343 8358 0x00000000
May 17 17:44:55 ubuntu kernel: [ 840.564498] ffff880036c3fe20 0000000000000046 ffff880036c3ffd8 0000000000014240
May 17 17:44:55 ubuntu kernel: [ 840.564501] ffff880036c3ffd8 0000000000014240 ffff88003c1add40 ffffffff81ccea60
May 17 17:44:55 ubuntu kernel: [ 840.564503] ffff88003c1add40 ffffffff81ccea64 00000000ffffffff ffffffff81ccea68
May 17 17:44:55 ubuntu kernel: [ 840.564506] Call Trace:
May 17 17:44:55 ubuntu kernel: [ 840.564531] [<ffffffff816d1499>] schedule_preempt_disabled+0x29/0x70
May 17 17:44:55 ubuntu kernel: [ 840.564535] [<ffffffff816cf537>] __mutex_lock_slowpath+0xe7/0x160
May 17 17:44:55 ubuntu kernel: [ 840.564550] [<ffffffff8117cbcf>] ? __kmalloc+0x14f/0x180
May 17 17:44:55 ubuntu kernel: [ 840.564554] [<ffffffff816cefbf>] mutex_lock+0x1f/0x30
May 17 17:44:55 ubuntu kernel: [ 840.564564] [<ffffffff815c5e05>] copy_net_ns+0x65/0x100
May 17 17:44:55 ubuntu kernel: [ 840.564574] [<ffffffff810814e9>] create_new_namespaces+0xf9/0x180
May 17 17:44:55 ubuntu kernel: [ 840.564577] [<ffffffff81081731>] unshare_nsproxy_namespaces+0x61/0xa0
May 17 17:44:55 ubuntu kernel: [ 840.564582] [<ffffffff81057b41>] sys_unshare+0x181/0x2a0
May 17 17:44:55 ubuntu kernel: [ 840.564585] [<ffffffff816da99d>] system_call_fastpath+0x1a/0x1f

================================================
This is observed to occur only when utah is used to run the floodlight smoke tests. Running the floodlight tests without utah does not cause this issue and using utah to run some other tests does not cause this issue either.
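
When checking a log for this symptom, a small script can pull out the device name and usage count from each repeated message. The following is a minimal sketch, not part of utah or the floodlight tests; the function name `find_stuck_devices` and the regular expression are assumptions modelled on the syslog lines quoted above.

```python
import re

# Pattern modelled on the syslog lines quoted in this report. This is an
# assumption for illustration, not an official definition of the log format.
PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+unregister_netdevice: waiting for "
    r"(?P<dev>\S+) to become free\. Usage count = (?P<count>\d+)"
)


def find_stuck_devices(lines):
    """Return a list of (timestamp, device, usage_count) tuples."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append(
                (
                    float(match.group("ts")),
                    match.group("dev"),
                    int(match.group("count")),
                )
            )
    return hits


sample = [
    "May 17 17:41:13 ubuntu kernel: [ 618.483362] IPv6: ADDRCONF(NETDEV_UP): s1-eth1: link is not ready",
    "May 17 17:41:23 ubuntu kernel: [ 628.220062] unregister_netdevice: waiting for lo to become free. Usage count = 2",
    "May 17 17:41:33 ubuntu kernel: [ 638.460029] unregister_netdevice: waiting for lo to become free. Usage count = 2",
]

hits = find_stuck_devices(sample)
for ts, dev, count in hits:
    print(f"{ts:.6f} {dev} usage={count}")
```

Lines without a bracketed kernel timestamp are not matched by this pattern; it would need loosening for raw dmesg output in other formats.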

Noticed a similar bug 1021471, but this one is occurring in KVM using libvirt.

The issue occurs only with saucy kernel versions from 3.9.0-1 onwards and does not occur with 3.9.0-0. The tests were tried with saucy server VMs on both precise and saucy hosts, and the result is the same.

================
Steps to reproduce:
1. Install a default saucy server on a vm (either manually or preseeded default installation) with either i386 or amd64 image using KVM, libvirt
2. Do the following to install utah inside the VM
   sudo apt-add-repository -y ppa:utah/stable
   sudo apt-get update
   sudo apt-get install utah
3. Run the floodlight tests using
     sudo utah -r lp:ubuntu-test-cases/server/runlists/floodlight.run
4. Now "kernel: [ 358.452029] unregister_netdevice: waiting for lo to become free. Usage count = 2" will be continuously reported

The impacted jobs are,
https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-amd64-smoke-floodlight/15/
and
https://jenkins.qa.ubuntu.com/view/Saucy/view/Smoke%20Testing/job/saucy-server-i386-smoke-floodlight/13/

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: linux-image-3.9.0-2-generic 3.9.0-2.6
ProcVersionSignature: Ubuntu 3.9.0-2.6-generic 3.9.2
Uname: Linux 3.9.0-2-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.9.0-2-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.10.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Fri May 17 17:53:02 2013
HibernationDevice: RESUME=UUID=a9b2a509-9179-44d3-a4ea-807f7cd05d4e
InstallationDate: Installed on 2013-05-17 (0 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Alpha amd64 (20130517)
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Bochs Bochs
MarkForUpload: True
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.9.0-2-generic root=UUID=810403bd-1b5c-4a51-bc07-6ce34be38cd9 nomodeset ro
RelatedPackageVersions:
 linux-restricted-modules-3.9.0-2-generic N/A
 linux-backports-modules-3.9.0-2-generic N/A
 linux-firmware 1.108
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/01/2011
dmi.bios.vendor: Bochs
dmi.bios.version: Bochs
dmi.chassis.type: 1
dmi.chassis.vendor: Bochs
dmi.modalias: dmi:bvnBochs:bvrBochs:bd01/01/2011:svnBochs:pnBochs:pvr:cvnBochs:ct1:cvr:
dmi.product.name: Bochs
dmi.sys.vendor: Bochs

Para Siva (psivaa) wrote :
description: updated
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Para Siva (psivaa)
tags: added: iso-testing qa-daily-testing rls-r-incoming
tags: added: rls-s-incoming
removed: rls-r-incoming
Para Siva (psivaa)
description: updated
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc1-saucy/

Changed in linux (Ubuntu):
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

If the bug still exists in v3.10-rc1, I can perform a kernel bisect to identify the commit that introduced this regression in 3.9.0-1.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: regression-update
Para Siva (psivaa) wrote :

The bug still exists in the upstream version, 3.10.0-031000rc1-generic, installed from above. Please see attached for the corresponding dmesg. Thanks

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: performing-bisect
Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a bisect to figure out what commit caused this regression. We need to identify the earliest kernel where the issue started happening as well as the latest kernel that did not have this issue.

Can you test the following kernels and report back? We are looking for the first kernel version that exhibits this bug:

v3.8 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring/
v3.9-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-saucy

I assume 3.8 will not have the bug and v3.9-rc1 will.

Thanks in advance!

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Para Siva (psivaa) wrote :

The bug is not present in the kernel, v3.8 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-raring/

Now, for v3.9-rc1, http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-saucy link is not found, so I tested with
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9.1-saucy/ and the bug is present in this kernel version.

Logs from both versions are attached.

Para Siva (psivaa) wrote :

Logs from the raring kernel, where the bug is not present, are attached.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

I'll start a kernel bisect now. Can you also test v3.10-rc4 to see if this issue still exists in Mainline:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc4-saucy/

tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I'm also going to build a test kernel with the following commits reverted:

d2ed27 net: disallow drivers with buggy VLAN accel to register_netdevice()
948b33 net: init perm_addr in register_netdevice()

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Confirmed → In Progress
Para Siva (psivaa) wrote :

The issue is still present in v3.10-rc4 as well.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the following commits reverted:
d2ed27 net: disallow drivers with buggy VLAN accel to register_netdevice()
948b33 net: init perm_addr in register_netdevice()

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315/

Please install both these .debs:
 linux-image-3.9.0-3-generic_3.9.0-3.8~lp1181315v1_amd64.deb
 linux-image-extra-3.9.0-3-generic_3.9.0-3.8~lp1181315v1_amd64.deb

It would be great if you can test this kernel and report back if it exhibits the bug or not. If the bug still exists with this kernel, I'll continue a bisect.

Thanks in advance.

Para Siva (psivaa) wrote :

The bug still exists in the test kernel. (3.9.0-3-generic #8~lp1181315v1)

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.8 final and v3.9-rc1. The kernel bisect will require testing of about 7-10 test kernels.

I built the first test kernel, up to the following commit:
b274776c54c320763bc12eb035c0e244f76ccb43

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315

Can you test that kernel and report back whether it has the bug? I will build the next test kernel based on your test results.

Thanks in advance

Para Siva (psivaa) wrote :

Sorry for the late response; I was afk.
The issue is not occurring with this kernel, 3.8.0-030800-generic.
Additionally, the issue does not occur with the latest kernel version in the saucy server images either (3.9.0-5-generic).
Thanks for your time on this.

Para Siva (psivaa) wrote :

This bug is appearing again with 3.9.0-6-generic #13 on an amd64 installation of 20130616.

Para Siva (psivaa) wrote :
Joseph Salisbury (jsalisbury) wrote :

Can you confirm the issue does not exist in 3.9.0-5-generic? If that is the case, it will be faster to bisect between 3.9.0-5-generic and 3.9.0-6-generic

Also, v3.10-rc6[0] is now available. Saucy will be rebased to v3.10 so, it would be good to know if the bug still exists in mainline.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.10-rc6-saucy/

Para Siva (psivaa) wrote :

On i386 images, the bug has not occurred since 3.9.0-5, i.e. the issue does not occur with the 3.9.0-5 and 3.9.0-6 kernels.

On amd64 images, although there were a couple of instances with 3.9.0-5 when the issue did not occur during smoke tests, subsequent runs show that the bug is in fact still present with both 3.9.0-5 and 3.9.0-6. Sorry for my hasty claim in comment #15.

So, on i386 images the bug is not present from 3.9.0-5 onwards, but on amd64 images it continues to exist even with kernels 3.9.0-5 and 3.9.0-6.

With mainline v3.10.0-031000rc6-generic the bug is still present on amd64 images, but a different message is displayed in dmesg, as given below:

unregister_netdevice: waiting for h1-eth0 to become free. Usage count = 1
unregister_netdevice: waiting for h2-eth0 to become free. Usage count = 1

Joseph Salisbury (jsalisbury) wrote :

Can you test the following kernels, since this issue only happens with 3.9.0-1 and onwards:

v3.9-rc8: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc8-raring/
v3.9: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-raring/
v3.9.1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9.1-saucy/

You don't have to test every kernel, just up until the kernel that first has this bug. We need to identify the first kernel that has the bug and the last kernel that did not.

Para Siva (psivaa) wrote :

The bug is present in all the above three versions, v3.9-rc8, v3.9 and v3.9.1 from the mainline. I also tested with v3.8.13.2-raring from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8.13.2-raring/ and the bug is NOT present there. Hope this helps.

Please feel free to ask for any version you want tested. This is causing a couple of our daily smoke tests to fail, so helping you fix this would in fact help us. Thanks

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Parameswaran. I don't see in previous comments that we confirmed the bug existed in v3.9-rc1. The link was missing at one point, but it is now available:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc1-raring/

Can you test v3.9-rc1, to confirm that the bisect I started in comment #14 uses the right good and bad commits? If v3.9-rc1 does not have the bug, we will want to test some of the newer release candidates until we find the first version that exhibits the bug.

Para Siva (psivaa) wrote :

Tested the kernels from v3.9-rc1 up until v3.9-rc7. The bug is not present up until v3.9-rc5-raring and started appearing on v3.9-rc6-raring.

v3.9-rc1-raring - No bug
v3.9-rc2-raring - No bug
v3.9-rc3-raring - No bug
v3.9-rc4-raring - No bug
v3.9-rc5-raring - No bug
v3.9-rc6-raring - Bug exists
v3.9-rc7-raring - Bug exists
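
The good/bad boundary in the list above can also be picked out mechanically. The following is a minimal sketch, assuming the results are recorded as an ordered list of (version, has_bug) pairs; the function name `find_boundary` is hypothetical and not part of any tool mentioned in this report.

```python
def find_boundary(results):
    """Return (last_good, first_bad) from results ordered oldest to newest.

    Assumes every version before the first bad one is good, which is the
    same assumption a bisect relies on.
    """
    last_good = None
    for version, has_bug in results:
        if has_bug:
            return last_good, version
        last_good = version
    return last_good, None


# Results as reported in the comment above.
results = [
    ("v3.9-rc1", False),
    ("v3.9-rc2", False),
    ("v3.9-rc3", False),
    ("v3.9-rc4", False),
    ("v3.9-rc5", False),
    ("v3.9-rc6", True),
    ("v3.9-rc7", True),
]

last_good, first_bad = find_boundary(results)
print(f"last good: {last_good}, first bad: {first_bad}")
```

With the results above this reports v3.9-rc5 as the last good version and v3.9-rc6 as the first bad one, which gives the two endpoints for a bisect.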

Joseph Salisbury (jsalisbury) wrote :

This is great info. Thanks, Parameswaran. I'll start a bisect between v3.9-rc5 and v3.9-rc6 and post a test kernel shortly.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between v3.9-rc5 and v3.9-rc6. The kernel bisect will require testing of about 7 test kernels.

I built the first test kernel, up to the following commit:
17eb3d8fbe4c573426fc99946040305e79c07803

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315

Can you test that kernel and report back whether it has the bug? I will build the next test kernel based on your test results.

Thanks in advance
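
For context on the estimate of about 7 test kernels: a bisect halves the range of candidate commits with each test, so the number of builds grows roughly with the base-2 logarithm of the number of commits. The following is a simplified sketch of that search over a linear history, not how `git bisect` is implemented; the commit list and the `is_bad` callback are made-up stand-ins, and the count of 100 commits is an illustration rather than the real number of commits between v3.9-rc5 and v3.9-rc6.

```python
def bisect_first_bad(commits, is_bad):
    """Binary-search for the first bad commit in a linear history.

    commits is ordered oldest to newest and is assumed to contain at least
    one bad commit, with every commit after the first bad one also bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: first bad index is in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1  # each iteration corresponds to one test kernel
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo], tests


# Hypothetical history: integers stand in for commit IDs, and the
# regression is pretended to start at commit 63.
commits = list(range(100))
culprit, tests = bisect_first_bad(commits, lambda c: c >= 63)
print(f"first bad commit: {culprit}, kernels tested: {tests}")
```

Each test in the loop corresponds to one build-and-report round trip in this thread, which is why narrowing the range to a single release candidate first keeps the number of rounds small.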

Para Siva (psivaa) wrote :

This test kernel contains the bug. Thanks.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
fefcdbe4accadc7b9fac67066762d11f0e36d173

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315

Can you test that kernel and report back whether it has the bug? I will build the next test kernel based on your test results.

Thanks in advance

tags: removed: performing-bisect
Para Siva (psivaa) wrote :

The test kernel given in #27 does not exhibit the bug. Thanks.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
1caa590075ddef41950c46123e80cd6a64505218

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315

Can you test that kernel and report back whether it has the bug? I will build the next test kernel based on your test results.

Thanks in advance

Para Siva (psivaa) wrote :

This test kernel also does not exhibit the bug. thanks

tags: removed: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
bd709bd027a394bce911a0cd60ee9cbdde49361b

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1181315

Can you test that kernel and report back whether it has the bug? I will build the next test kernel based on your test results.

Thanks in advance

Para Siva (psivaa) wrote :

Utah bug 1194533 is blocking the testing of this bug. I have been trying to bypass this utah bug to test this kernel, without much success. I will update the result once I am able to do it. Thanks

Para Siva (psivaa) wrote :

The utah bug that was blocking the testing of this test kernel is now fixed, but I am unable to reproduce this issue with the new images containing the 3.10.0-2 kernel. I will watch for this bug over a couple more runs with new images and post an update here.

James Page (james-page) wrote :

This bug looks a lot like bug 1197078.

Para Siva (psivaa) wrote :

This has not occurred since 3.10.0-2. The jobs have not run much after this kernel due to another bug, 1197484. Still watching.

Joseph Salisbury (jsalisbury) wrote :

It's been a while since this bug was updated. I'll mark it as incomplete for now. If the bug is fixed, please change the status to "Fix Released". If the bug still exists, please change the status to "Confirmed".

Changed in linux (Ubuntu Saucy):
status: In Progress → Incomplete
Para Siva (psivaa) wrote :

This bug does not exist any more. Sorry for the delayed response. I'll mark it as Fix Released. Thanks for the time on this.

Changed in linux (Ubuntu Saucy):
status: Incomplete → Fix Released