"unregister_netdevice: waiting for lo to become free. Usage count = 1" after LXC container shutdown

Bug #1065434 reported by Jean-Baptiste Lallement
This bug affects 19 people
Affects          Status        Importance  Assigned to     Milestone
linux (Ubuntu)   Fix Released  High        Colin Ian King
  Quantal        Fix Released  Undecided   Colin Ian King
  Raring         Fix Released  High        Colin Ian King

Bug Description

Reproduced with 3.5.0-17-generic #28-Ubuntu x86_64 on 2 different systems

TEST CASE:
On a fresh installation of Quantal with all updates applied run:

On the host
1. sudo lxc-create -n test01 -t ubuntu -- -a amd64 -r quantal
2. sudo lxc-start -n test01
3. Login with ubuntu/ubuntu

Inside the container
4. Copy an amount of data from the network (scp from your local network or wget below)
  4.1 sudo apt-get update && sudo apt-get -y install wget
  4.2 wget cdimage.ubuntu.com/daily-live/current/quantal-desktop-amd64.iso
5. Download ~50MB and press CTRL+C
6. sudo poweroff

ACTUAL RESULT:
The kernel logs "unregister_netdevice: waiting for lo to become free. Usage count = 1" and the next lxc-start hangs forever.
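
A minimal sketch for spotting the message on the host, assuming the kernel log is read with dmesg. The helper name find_stuck_netdev is hypothetical; only the message text comes from this report.

```shell
#!/bin/sh
# Print "<device> <usage count>" for every matching kernel log line on stdin.
# The message format is copied from the report; the helper itself is illustrative.
find_stuck_netdev() {
    sed -n 's/.*unregister_netdevice: waiting for \([^ ]*\) to become free\. Usage count = \([0-9][0-9]*\).*/\1 \2/p'
}

# Typical use on the host (reading the kernel log may require root):
#   dmesg | find_stuck_netdev
```

The kernel repeats the message while the device stays referenced, so several identical output lines are to be expected.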

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image-3.5.0-17-generic 3.5.0-17.28
ProcVersionSignature: Ubuntu 3.5.0-17.28-generic 3.5.5
Uname: Linux 3.5.0-17-generic x86_64
ApportVersion: 2.6.1-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: j-lallement 2009 F.... pulseaudio
CurrentDmesg: [ 8.478430] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Date: Thu Oct 11 11:20:52 2012
HibernationDevice: RESUME=UUID=b065a16a-8547-4515-aa2a-fd2c95c76c9a
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Beta amd64 (20121009)
MachineType: ASUSTeK Computer Inc. U3SG
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=fr_FR.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-17-generic root=UUID=fbe53095-038a-4664-bfec-990f8a0656cd ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-17-generic N/A
 linux-backports-modules-3.5.0-17-generic N/A
 linux-firmware 1.94
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/28/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 305
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: U3SG
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: ATN12345678901234567
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr305:bd01/28/2008:svnASUSTeKComputerInc.:pnU3SG:pvr1.0:rvnASUSTeKComputerInc.:rnU3SG:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: U3SG
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.

Jean-Baptiste Lallement (jibel) wrote :

Logs with kernel 3.5.0-17-generic-#26+smb2

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Stefan Bader (smb)
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
description: updated
Jean-Baptiste Lallement (jibel) wrote :

I did the following additional tests:
1) Download 1MB at 5kbps: PASS
2) Download 50MB at 100Mbps: FAIL
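
One way to approximate the two tests above is wget's --limit-rate option, which takes bytes per second. This sketch assumes "kbps" and "Mbps" mean kilobits and megabits per second. The helper names are hypothetical, and the report does not say how the rate was actually limited.

```shell
#!/bin/sh
# Convert a rate in kilobits per second to bytes per second,
# the unit wget's --limit-rate option expects.
kbit_to_bytes_per_sec() {
    echo $(( $1 * 1000 / 8 ))
}

# Download a URL to /dev/null at a capped rate.
#   $1 = rate in kbit/s, $2 = URL
slow_fetch() {
    wget --limit-rate="$(kbit_to_bytes_per_sec "$1")" -O /dev/null "$2"
}

# Inside the container, interrupt with CTRL+C once enough data has arrived:
#   slow_fetch 5 cdimage.ubuntu.com/daily-live/current/quantal-desktop-amd64.iso
#   slow_fetch 100000 cdimage.ubuntu.com/daily-live/current/quantal-desktop-amd64.iso
```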

description: updated
Piotr Karbowski (piotr-karbowski) wrote :

Hi, I hit the exact same issue on my systems. They are not Ubuntu, but for what it's worth I tested the vanilla 3.5 and 3.6.3 kernels and the bug is present there too, so none of Ubuntu's kernel patches appear to be involved. I can reproduce it on physical hardware and in a KVM virtual machine with an e1000 card. What all my systems have in common is that LXC uses the veth driver connected to a bridge. For now I have had to downgrade all my LXC hosts to 3.4.x.

Changed in linux (Ubuntu):
importance: Medium → High
Paul Czarkowski (paulcz) wrote :

Sounds like I'm in the same boat: exact same symptoms as above, running Ubuntu 12.10 fully updated. It makes it very difficult to do anything productive with juju / lxc containers.

Changed in linux (Ubuntu):
assignee: Stefan Bader (stefan-bader-canonical) → Colin King (colin-king)
information type: Public → Public Security
information type: Public Security → Public
Colin Ian King (colin-king) wrote :

@Jean-Baptiste,

I think I've found the offending bug: upstream commit 3d861f661006606bf159fd6bd973e83dbf21d0f9 "net: fix secpath kmemleak" fixes a reference counting issue that stops lo from becoming free.

I've built some test kernels; perhaps you could try the appropriate one and let me know if it resolves the bug for you. You can download them from: http://kernel.ubuntu.com/~cking/lp-1065434/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Jean-Baptiste Lallement (jibel) wrote :

I installed kernel 3.5.0-18.29-lp1065434 and did the following tests:
- Ran the test case in the description
- Copied bigger files (up to 1.4G)
- Copied data from the host to the container and the container to the host
- Copied data from a network host to the container and the container to a network host
- Copied data over a wired and wireless link
- Shutdown/restarted the container several times from the container with poweroff and from the host lxc-stop

I couldn't reproduce the issue, whereas it is reliably reproducible with the kernel version shipped in 12.10.

It sounds like the patched kernel resolves this issue. Thanks!

Colin Ian King (colin-king) wrote :

== SRU Justification ==

Copying a fairly large amount of data over the network inside a container
and then exiting the container can trigger a missing decrement in the per
cpu reference count on a network device. The result is the kernel error
message:

unregister_netdevice: waiting for lo to become free. Usage count = 1

...and the next time lxc-start is run it just hangs.

== Fix ==

Upstream commit 3d861f661006606bf159fd6bd973e8 fixes
this issue by releasing all possible references to linked objects.

== Impact ==

Users copying large amounts of data over the network inside a
container can trigger this bug and cannot exit the container and
restart it.

== Test Case ==

To trigger the bug:

On the host:
1. sudo lxc-create -n test01 -t ubuntu -- -a amd64 -r quantal
2. sudo lxc-start -n test01
3. Login with ubuntu/ubuntu

Inside the container:

4. Copy an amount of data from the network (scp from your local network or wget below)
  4.1 sudo apt-get update && sudo apt-get -y install wget
  4.2 wget cdimage.ubuntu.com/daily-live/current/quantal-desktop-amd64.iso
5. Download more than 250MB and press CTRL+C
6. sudo poweroff

On the host:
After a few seconds one will see the kernel warning appear:

unregister_netdevice: waiting for lo to become free. Usage count = 1

7. sudo lxc-start -n test01

..and this will hang.

With the fix, the error message will not occur and step 7 will not hang. One
can run lxc-start and then exit the container multiple times without it hanging.
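
The repeated start/exit check can be sketched as a small loop. This is an illustration only: cycle_container and SETTLE_SECS are hypothetical names, the container is stopped from the host with lxc-stop rather than from inside, and the loop does not generate the network traffic from steps 4 and 5. It greps the whole kernel log, so a warning left over from an earlier run will also be reported.

```shell
#!/bin/sh
# Start and stop a container several times; stop early if the kernel log
# contains the unregister_netdevice warning. Run as root on the host.
#   $1 = container name, $2 = number of cycles
cycle_container() {
    name=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        lxc-start -n "$name" -d || return 2
        lxc-wait -n "$name" -s RUNNING
        lxc-stop -n "$name"
        lxc-wait -n "$name" -s STOPPED
        # The report says the warning shows up after a few seconds.
        sleep "${SETTLE_SECS:-10}"
        if dmesg | grep -q 'unregister_netdevice: waiting for'; then
            echo "warning seen after cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $count cycles without the warning"
}

# Example: cycle_container test01 5
```

On an affected kernel the lxc-start call itself may hang, in which case the loop never reaches the log check.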

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Quantal):
status: New → Fix Committed
Changed in linux (Ubuntu Raring):
status: Incomplete → Fix Released
Changed in linux (Ubuntu Quantal):
assignee: nobody → Colin King (colin-king)
Louis-Dominique Dubeau (ldd) wrote :

I see 3.5.0-19.30 has been released for Quantal but grepping through the changelog and the entire source tree indicates that the patch has not been applied to that kernel. Am I incorrect?
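
A check of this kind can be sketched as follows, relying on the "LP: #<number>" convention visible in the changelog excerpt further down this report. The helper name is hypothetical, and the changelog path in the usage comment is an assumption about where the installed kernel package keeps its changelog.

```shell
#!/bin/sh
# Exit 0 if the Debian-style changelog on stdin references the given
# Launchpad bug number as "#<number>", otherwise exit non-zero.
#   $1 = bug number
changelog_mentions_bug() {
    grep -Eq "#$1([^0-9]|\$)"
}

# Example (path is an assumption, adjust to the package being checked):
#   zcat "/usr/share/doc/linux-image-$(uname -r)/changelog.Debian.gz" \
#       | changelog_mentions_bug 1065434 && echo "fix listed"
```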

Colin Ian King (colin-king) wrote :

The fix has been committed to http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-quantal.git;a=commit;h=6661571b3379ddb6587eb07b65dda75da7d008f5 and will appear in the next update. The fix just missed the previous update cycle, hence the delay.

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Quantal in -proposed solves the problem (3.5.0-20.31). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-quantal' to 'verification-done-quantal'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-quantal
Norberto Bensa (nbensa)
tags: added: verification-done-quantal
removed: verification-needed-quantal
Colin Ian King (colin-king) wrote :

Also verified on Precise with the LTS quantal kernel: “linux-lts-quantal” 3.5.0-20.31~precise1

Colin Ian King (colin-king) wrote :

Verified with Linux ubuntu 3.5.0-20-generic #31-Ubuntu

Jean-Baptiste Lallement (jibel) wrote :

I verified on Quantal with 3.5.0-20-generic #31-Ubuntu
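
A rough way to confirm that a host runs a kernel at or above the verified version, assuming the Ubuntu version signature format shown in the ProcVersionSignature field of the description and that dpkg is available. Both helper names are hypothetical, and signatures with a suffix such as "~precise1" are not handled by this sketch.

```shell
#!/bin/sh
# Extract the package version (e.g. "3.5.0-20.31") from an Ubuntu
# version signature line such as "Ubuntu 3.5.0-17.28-generic 3.5.5".
sig_to_pkg_version() {
    sed -n 's/^Ubuntu \([0-9][0-9.]*-[0-9][0-9.]*\)-.*/\1/p'
}

# True if the signature on stdin is at or above the given version.
# Uses dpkg for Debian version comparison.
#   $1 = first version known to carry the fix
kernel_has_fix() {
    ver=$(sig_to_pkg_version)
    [ -n "$ver" ] && dpkg --compare-versions "$ver" ge "$1"
}

# Example on an Ubuntu host:
#   kernel_has_fix 3.5.0-20.31 < /proc/version_signature && echo "fixed kernel"
```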

Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-21.32

---------------
linux (3.5.0-21.32) quantal-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1088979
  * SAUCE: i915_hsw: move i915_hsw_enabled symbol to intel_ips
    - LP: #1087622

linux (3.5.0-20.31) quantal-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1086759

  [ Ben Widawsky ]

  * SAUCE: i915_hsw: Include #define I915_PARAM_HAS_WAIT_TIMEOUT
    - LP: #1085245
  * SAUCE: i915_hsw: Include #define DRM_I915_GEM_CONTEXT_[CREATE,DESTROY]
    - LP: #1085245
  * SAUCE: i915_hsw: drm/i915: add register read IOCTL
    - LP: #1085245
  * SAUCE: i915_hsw: Include #define i915_execbuffer2_[set,get]_context_id
    - LP: #1085245

  [ Chris Wilson ]

  * SAUCE: i915_hsw: Include #define I915_GEM_PARAM_HAS_SEMAPHORES
    - LP: #1085245
  * SAUCE: i915_hsw: Include #define I915_PARAM_HAS_SECURE_BATCHES
    - LP: #1085245

  [ Daniel Vetter ]

  * SAUCE: i915_hsw: drm/i915: call intel_enable_gtt
    - LP: #1085245
  * SAUCE: i915_hsw: drm: add helper to sort panels to the head of the
    connector list
    - LP: #1085245
  * SAUCE: i915_hsw: drm: extract dp link bw helpers
    - LP: #1085245
  * SAUCE: i915_hsw: drm: extract drm_dp_max_lane_count helper
    - LP: #1085245
  * SAUCE: i915_hsw: drm: dp helper: extract drm_dp_channel_eq_ok
    - LP: #1085245
  * SAUCE: i915_hsw: drm: extract helpers to compute new training values
    from sink request
    - LP: #1085245
  * SAUCE: i915_hsw: drm: dp helper: extract drm_dp_clock_recovery_ok
    - LP: #1085245

  [ Dave Airlie ]

  * SAUCE: i915_hsw: Include #define I915_PARAM_HAS_PRIME_VMAP_FLUSH
    - LP: #1085245

  [ Leann Ogasawara ]

  * SAUCE: i915_hsw: Provide an ubuntu/i915 driver for Haswell graphics
    - LP: #1085245
  * SAUCE: i915_hsw: Revert "drm: Make the .mode_fixup() operations mode
    argument a const pointer" for ubuntu/i915 driver
    - LP: #1085245
  * SAUCE: i915_hsw: Rename ubuntu/i915 driver i915_hsw
    - LP: #1085245
  * SAUCE: i915_hsw: Only support Haswell with ubuntu/i915 driver
    - LP: #1085245
  * SAUCE: i915_hsw: Include #define DRM_I915_GEM_WAIT
    - LP: #1085245
  * SAUCE: i915_hsw: drm: extract dp link train delay functions from radeon
    - LP: #1085245
  * SAUCE: i915_hsw: drm/dp: Update DPCD defines
    - LP: #1085245
  * SAUCE: i915_hsw: Update intel_ips.h file location
    - LP: #1085245
  * SAUCE: i915_hsw: Provide updated drm_mm.h and drm_mm.c for ubuntu/i915
    - LP: #1085245
  * SAUCE: i915_hsw: drm/i915: Replace the array of pages with a
    scatterlist
    - LP: #1085245
  * SAUCE: i915_hsw: drm/i915: Replace the array of pages with a
    scatterlist
    - LP: #1085245
  * SAUCE: i915_hsw: drm/i915: Stop using AGP layer for GEN6+
    - LP: #1085245
  * SAUCE: i915_hsw: Add i915_hsw_gpu_*() calls for ubuntu/i915
    - LP: #1085245
  * i915_hsw: [Config] Enable CONFIG_DRM_I915_HSW=m
    - LP: #1085245

  [ Paulo Zanoni ]

  * SAUCE: drm/i915: fix hsw_fdi_link_train "retry" code
    - LP: #1085245
  * SAUCE: drm/i915: reject modes the LPT FDI receiver can't handle
    - LP: #1085245
  * SAUCE: drm/i915: add support for mPHY destination on i...

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
AnrDaemon (anrdaemon) wrote :

As of today the bug is still not fixed: it occurs with kernels from 3.13.0-77 up to 4.2.
https://bugzilla.kernel.org/show_bug.cgi?id=81211

Alessandro Polverini (polve) wrote :

It happened to me with kernel 4.9.2
