unregister_netdevice: waiting for eth0 to become free. Usage count = 5

Bug #1746474 reported by Eyal Birger on 2018-01-31
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Joseph Salisbury
Artful
Medium
Joseph Salisbury
Bionic
Medium
Joseph Salisbury

Bug Description

== SRU Justification ==
Commit 52df157f17e5 introduced a regression in v4.13-rc1. This regression
causes a stack trace to occur when tearing down an LXD container. The process
hangs with the following message:

"unregister_netdevice: waiting for eth0 to become free. Usage count = 5"

This regression is fixed by commit 510c321b5571, which is in mainline as of
v4.16-rc7. The fix is needed in Artful and Bionic. However, Artful needs
a prereq commit, so it's SRU request will be sent separately.

== Fix ==
510c321b5571 ("xfrm: reuse uncached_list to track xdsts")

== Regression Potential ==
Low. This commit is to fix a current regression.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

This occurs when tearing down an LXD container.

LXD monitor process hangs with the following stack:

$ sudo cat /proc/27043/stack
[<ffffffff8af0022e>] msleep+0x2e/0x40
[<ffffffff8b5dc9cf>] netdev_run_todo+0x11f/0x310
[<ffffffff8b5e9d4d>] rtnetlink_rcv+0x2d/0x30
[<ffffffff8b612cdc>] netlink_unicast+0x18c/0x240
[<ffffffff8b61306d>] netlink_sendmsg+0x2dd/0x3c0
[<ffffffff8b5b8f08>] sock_sendmsg+0x38/0x50
[<ffffffff8b5b99c3>] ___sys_sendmsg+0x2e3/0x2f0
[<ffffffff8b5ba354>] __sys_sendmsg+0x54/0x90
[<ffffffff8b5ba3a2>] SyS_sendmsg+0x12/0x20
[<ffffffff8ae03a7b>] do_syscall_64+0x5b/0xc0
[<ffffffff8b800257>] entry_SYSCALL64_slow_path+0x8/0x8
[<ffffffffffffffff>] 0xffffffffffffffff

Issue submitted to LXD as well ([1]), though as indicated there, it seems to be a kernel bug.

[1] https://github.com/lxc/lxd/issues/4208

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-32-generic 4.13.0-32.35~16.04.1
ProcVersionSignature: Ubuntu 4.13.0-32.35~16.04.1-generic 4.13.13
Uname: Linux 4.13.0-32-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
Date: Wed Jan 31 11:42:59 2018
InstallationDate: Installed on 2016-11-09 (447 days ago)
InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
SourcePackage: linux-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Eyal Birger (ebirger) wrote :
Eyal Birger (ebirger) wrote :

Fix was submitted to IPSec tree, see:
https://patchwork.ozlabs.org/patch/873321/

Would love for this to reach ubuntu release :)

affects: linux-hwe (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: artful bionic
Joseph Salisbury (jsalisbury) wrote :

I built a Bionic test kernel with commit 510c321. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746474/bionic

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Artful requires some back porting of that commit, so I'll perform the back port and post that kernel shortly.

Thanks in advance!

Eyal Birger (ebirger) wrote :

Hi,

I can confirm that the problem doesn't occur in the test kernel.

Thanks!

Joseph Salisbury (jsalisbury) wrote :

I also built an Artful test kernel with a back port of commit 510c321, which required commit 9620fef27ed2 as a prereq.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746474/artful

Can you test this kernel and see if it resolves this bug?

Eyal Birger (ebirger) wrote :

the problem also doesn't occur on the Artful test kernel.

Thanks!

Eyal Birger (ebirger) wrote :

for your records I've added the test script i use.

description: updated
Joseph Salisbury (jsalisbury) wrote :

I built one more Atrful test kernel with an update to the patch requested in the SRU feedback.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746474/artful

Can you test this kernel and see if it resolves this bug?

Eyal Birger (ebirger) wrote :

Latest Artful test kernel resolves the bug as well. Thanks!

Joseph Salisbury (jsalisbury) wrote :

I built a v3 Artful test kernel with an update to the patch requested in the SRU feedback.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746474/artful

Can you also test this kernel and see if it resolves this bug?

Eyal Birger (ebirger) wrote :

This kernel is fine too.

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Stefan Bader (smb) on 2018-05-23
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Eyal Birger (ebirger) on 2018-05-27
tags: added: verification-fixed-artful
removed: verification-needed-artful
tags: added: verification-done-artful
removed: verification-fixed-artful
Eyal Birger (ebirger) on 2018-05-27
tags: added: verification-done-bionic
removed: verification-needed-bionic
Eyal Birger (ebirger) wrote :

Kernel on -proposed resolves cases for both bionic and artful.

Launchpad Janitor (janitor) wrote :
Download full text (11.4 KiB)

This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (4.3 KiB)

This bug was fixed in the package linux - 4.13.0-45.50

---------------
linux (4.13.0-45.50) artful; urgency=medium

  * linux: 4.13.0-45.50 -proposed tracker (LP: #1774124)

  * CVE-2018-3639 (x86)
    - SAUCE: Set generic SSBD feature for Intel cpus

linux (4.13.0-44.49) artful; urgency=medium

  * linux: 4.13.0-44.49 -proposed tracker (LP: #1772951)

  * CVE-2018-3639 (x86)
    - x86/cpu: Make alternative_msr_write work for 32-bit code
    - x86/cpu/AMD: Fix erratum 1076 (CPB bit)
    - x86/bugs: Fix the parameters alignment and missing void
    - KVM: SVM: Move spec control call after restore of GS
    - x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
    - x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
    - x86/cpufeatures: Disentangle SSBD enumeration
    - x86/cpufeatures: Add FEATURE_ZEN
    - x86/speculation: Handle HT correctly on AMD
    - x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
    - x86/speculation: Add virtualized speculative store bypass disable support
    - x86/speculation: Rework speculative_store_bypass_update()
    - x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
    - x86/bugs: Expose x86_spec_ctrl_base directly
    - x86/bugs: Remove x86_spec_ctrl_set()
    - x86/bugs: Rework spec_ctrl base and mask logic
    - x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
    - KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
    - x86/bugs: Rename SSBD_NO to SSB_NO
    - KVM: VMX: Expose SSBD properly to guests.

  * [Ubuntu 16.04] kernel: fix rwlock implementation (LP: #1761674)
    - SAUCE: (no-up) s390: fix rwlock implementation

  * CVE-2018-7492
    - rds: Fix NULL pointer dereference in __rds_rdma_map

  * CVE-2018-8781
    - drm: udl: Properly check framebuffer mmap offsets

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Kernel panic on boot (m1.small in cn-north-1) (LP: #1771679)
    - x86/xen: Reset VCPU0 info pointer after shared_info remap

  * Suspend to idle: Open lid didn't resume (LP: #1771542)
    - ACPI / PM: Do not reconfigure GPEs for suspend-to-idle

  * CVE-2018-1092
    - ext4: fail ext4_iget for root directory if unallocated

  * [SRU][Artful] using vfio-pci on a combination of cn8xxx and some PCI devices
    results in a kernel panic. (LP: #1770254)
    - PCI: Avoid bus reset if bridge itself is broken
    - PCI: Mark Cavium CN8xxx to avoid bus reset
    - PCI: Avoid slot reset if bridge itself is broken

  * Battery drains when laptop is off (shutdown) (LP: #1745646)
    - PCI / PM: Check device_may_wakeup() in pci_enable_wake()

  * perf record crash: refcount_inc assertion failed (LP: #1769027)
    - perf cgroup: Fix refcount usage
    - perf xyarray: Fix wrong processing when closing evsel fd

  * Dell Latitude 5490/5590 BIOS update 1.1.9 causes black screen at boot
    (LP: #1764194)
    - drm/i915/bios: filter out invalid DDC pins from VBT child devices

  * Fix an issue that some PCI devices get incorrectly suspended (LP: #1764684)
    - PCI / PM: Always check PME wakeup capability for runtime wakeup support

  * [SRU][Bionic/Artful] fix false positives in W...

Read more...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (11.4 KiB)

This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for...

Changed in linux (Ubuntu):
status: In Progress → Fix Released

FYI I'm still seeing this on Ubuntu 18.04 running on AWS:

# uname -a
Linux uni09.sys.timedoctor.com 4.15.0-1016-aws #16-Ubuntu SMP Wed Jul 18 09:20:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

When running:

lxc delete uni09-2018-08-02-08-01-46 --force

It very often ends up in ERROR state, with "unregister_netdevice: waiting for eth0 to become free. Usage count = 1" being added to dmesg.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers