systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned NVMe drive

Bug #1662673 reported by bugproxy
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Xenial
Fix Released
Undecided
Tim Gardner
Yakkety
Fix Released
Undecided
Tim Gardner
Zesty
Fix Released
Undecided
Unassigned

Bug Description

For reference, here is the stack of systemd-udevd seen in the hang:

[ 1558.214013] INFO: task systemd-udevd:1778 blocked for more than 120 seconds.
[ 1558.214318] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1558.214556] systemd-udevd D 00003fff8dbdf7a0 0 1778 1 0x00040000
[ 1558.214637] Call Trace:
[ 1558.214673] [c000000004ad3790] [c0000000007aac20] schedule_timeout+0x180/0x2f0 (unreliable)
[ 1558.214779] [c000000004ad3960] [c0000000000158d0] __switch_to+0x200/0x350
[ 1558.214870] [c000000004ad39c0] [c0000000007adbb4] __schedule+0x414/0x9e0
[ 1558.214961] [c000000004ad3a90] [c0000000003b4e54] blk_mq_freeze_queue_wait+0x64/0xd0
[ 1558.215107] [c000000004ad3af0] [d000000034011964] nvme_revalidate_disk+0xd4/0x3a0 [nvme]
[ 1558.215386] [c000000004ad3b90] [c0000000003c2398] rescan_partitions+0x98/0x390
[ 1558.215508] [c000000004ad3c60] [c0000000003bb7ac] __blkdev_reread_part+0x9c/0xd0
[ 1558.215599] [c000000004ad3c90] [c0000000003bb818] blkdev_reread_part+0x38/0x70
[ 1558.215935] [c000000004ad3cc0] [c0000000003bc334] blkdev_ioctl+0x3b4/0xb80
[ 1558.216016] [c000000004ad3d20] [c0000000002cbcd0] block_ioctl+0x70/0x90
[ 1558.216114] [c000000004ad3d40] [c000000000296b38] do_vfs_ioctl+0x458/0x740
[ 1558.216192] [c000000004ad3dd0] [c000000000296ee4] SyS_ioctl+0xc4/0xe0
[ 1558.216275] [c000000004ad3e30] [c00000000000a17c] system_call+0x38/0xb4

It appears that systemd-udevd is triggering every time HTX writes to the boot sector (partition table) of the raw drive, and this is causing the revalidate calls which expose the issue with the block driver mq freeze. With a partition table on each drive, HTX will no longer be writing the partition table and no longer triggering systemd to re-read the partition table and try to freeze I/O.

The fix for this is provided by the following upstream commit:

966d2b0 percpu-refcount: fix reference leak during percpu-atomic transition

which needs to be pulled into 16.04 (as well as newer releases).

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-148242 severity-critical targetmilestone-inin16042
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Michael Hohnbaum (hohnbaum) wrote : Re: [Bug 1662673] [NEW] systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned NVMe drive

Leann,

This looks like one for the Kernel team.

                   Michael

On 02/07/2017 01:19 PM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> For reference, here is the stack of systemd-udevd seen in the hang:
>
> [ 1558.214013] INFO: task systemd-udevd:1778 blocked for more than 120 seconds.
> [ 1558.214318] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1558.214556] systemd-udevd D 00003fff8dbdf7a0 0 1778 1 0x00040000
> [ 1558.214637] Call Trace:
> [ 1558.214673] [c000000004ad3790] [c0000000007aac20] schedule_timeout+0x180/0x2f0 (unreliable)
> [ 1558.214779] [c000000004ad3960] [c0000000000158d0] __switch_to+0x200/0x350
> [ 1558.214870] [c000000004ad39c0] [c0000000007adbb4] __schedule+0x414/0x9e0
> [ 1558.214961] [c000000004ad3a90] [c0000000003b4e54] blk_mq_freeze_queue_wait+0x64/0xd0
> [ 1558.215107] [c000000004ad3af0] [d000000034011964] nvme_revalidate_disk+0xd4/0x3a0 [nvme]
> [ 1558.215386] [c000000004ad3b90] [c0000000003c2398] rescan_partitions+0x98/0x390
> [ 1558.215508] [c000000004ad3c60] [c0000000003bb7ac] __blkdev_reread_part+0x9c/0xd0
> [ 1558.215599] [c000000004ad3c90] [c0000000003bb818] blkdev_reread_part+0x38/0x70
> [ 1558.215935] [c000000004ad3cc0] [c0000000003bc334] blkdev_ioctl+0x3b4/0xb80
> [ 1558.216016] [c000000004ad3d20] [c0000000002cbcd0] block_ioctl+0x70/0x90
> [ 1558.216114] [c000000004ad3d40] [c000000000296b38] do_vfs_ioctl+0x458/0x740
> [ 1558.216192] [c000000004ad3dd0] [c000000000296ee4] SyS_ioctl+0xc4/0xe0
> [ 1558.216275] [c000000004ad3e30] [c00000000000a17c] system_call+0x38/0xb4
>
> It appears that systemd-udevd is triggering every time HTX writes to the
> boot sector (partition table) of the raw drive, and this is causing the
> revalidate calls which expose the issue with the block driver mq freeze.
> With a partition table on each drive, HTX will no longer be writing the
> partition table and no longer triggering systemd to re-read the
> partition table and try to freeze I/O.
>
> The fix for this is provided by the following upstream commit:
>
> 966d2b0 percpu-refcount: fix reference leak during percpu-atomic
> transition
>
> which needs to be pulled into 16.04 (as well as newer releases).
>
> ** Affects: ubuntu
> Importance: Undecided
> Assignee: Taco Screen team (taco-screen-team)
> Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-148242 severity-critical targetmilestone-inin16042

--
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

Revision history for this message
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Zesty):
assignee: Taco Screen team (taco-screen-team) → nobody
status: New → Fix Released
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Revision history for this message
Tim Gardner (timg-tpi) wrote :

Came in via v4.4.48

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Released
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-02-27 16:29 EDT-------
Verified, kernel 4.8.0-40-generic fixes this problem.

Tim Gardner (timg-tpi)
tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (6.0 KiB)

This bug was fixed in the package linux - 4.8.0-40.43

---------------
linux (4.8.0-40.43) yakkety; urgency=low

  * linux: 4.8.0-40.43 -proposed tracker (LP: #1667066)

  [ Andy Whitcroft ]
  * NFS client : permission denied when trying to access subshare, since kernel
    4.4.0-31 (LP: #1649292)
    - fs: Better permission checking for submounts

  * shaking screen (LP: #1651981)
    - drm/radeon: drop verde dpm quirks

  * [0bda:0328] Card reader failed after S3 (LP: #1664809)
    - usb: hub: Wait for connection to be reestablished after port reset

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * In Ubuntu 17.04 : after reboot getting message in console like Unable to
    open file: /etc/keys/x509_ima.der (-2) (LP: #1656908)
    - SAUCE: ima: Downgrade error to warning

  * 16.04.2: Extra patches for POWER9 (LP: #1664564)
    - powerpc/mm: Fix no execute fault handling on pre-POWER5
    - powerpc/mm: Fix spurrious segfaults on radix with autonuma

  * ibmvscsis: Add SGL LIMIT (LP: #1662551)
    - ibmvscsis: Add SGL limit

  * [Hyper-V] Bug fixes for storvsc (tagged queuing, error conditions)
    (LP: #1663687)
    - scsi: storvsc: Enable tracking of queue depth
    - scsi: storvsc: Remove the restriction on max segment size
    - scsi: storvsc: Enable multi-queue support
    - scsi: storvsc: use tagged SRB requests if supported by the device
    - scsi: storvsc: properly handle SRB_ERROR when sense message is present
    - scsi: storvsc: properly set residual data length on errors

  * Ubuntu16.10-KVM:Big configuration with multiple guests running SRIOV VFs
    caused KVM host hung and all KVM guests down. (LP: #1651248)
    - KVM: PPC: Book 3S: XICS cleanup: remove XICS_RM_REJECT
    - KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
    - KVM: PPC: Book 3S: XICS: Fix potential issue with duplicate IRQ resends
    - KVM: PPC: Book 3S: XICS: Implement ICS P/Q states
    - KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

  * ISST-LTE:pNV: ppc64_cpu command is hung w HDs, SSDs and NVMe (LP: #1662666)
    - blk-mq: Avoid memory reclaim when remapping queues
    - blk-mq: Fix failed allocation path when mapping queues
    - blk-mq: Always schedule hctx->next_cpu

  * systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned NVMe
    drive (LP: #1662673)
    - percpu-refcount: fix reference leak during percpu-atomic transition

  * [Yakkety SRU] Enable KEXEC support in ARM64 kernel (LP: #1662554)
    - [Config] Enable KEXEC support in ARM64.

  * [Hyper-V] Fix ring buffer handling to avoid host throttling (LP: #1661430)
    - Drivers: hv: vmbus: On write cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: On the read path cleanup the logic to interrupt the host
    - Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()

  * brd module compiled as built-in (LP: #1593293)
    - CONFIG_BLK_DEV_RAM=m

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in compla...

Read more...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Steven Haber (sthaber) wrote :

This bug is hitting for me on 16.04 LTS running kernel 4.13.0-16. udev is stuck in the following stack:

[<ffffffff906309eb>] blk_mq_freeze_queue_wait+0x4b/0xb0
[<ffffffff90631f4a>] blk_mq_freeze_queue+0x1a/0x20
[<ffffffffc03d676a>] __nvme_revalidate_disk+0x7a/0x3f0 [nvme_core]
[<ffffffffc03d7bc3>] nvme_revalidate_disk+0x53/0x90 [nvme_core]
[<ffffffff9063b72d>] rescan_partitions+0x8d/0x330
[<ffffffff906374f5>] __blkdev_reread_part+0x65/0x70
[<ffffffff90637523>] blkdev_reread_part+0x23/0x40
[<ffffffff90637ef7>] blkdev_ioctl+0x387/0x910
[<ffffffff9049253d>] block_ioctl+0x3d/0x50
[<ffffffff90467521>] do_vfs_ioctl+0xa1/0x5f0
[<ffffffff90467ae9>] SyS_ioctl+0x79/0x90
[<ffffffff90b0edfb>] entry_SYSCALL_64_fastpath+0x1e/0xa9
[<ffffffffffffffff>] 0xffffffffffffffff

And the process info:

4 D root 797 1 0 80 0 - 11661 blk_mq 03:04 ? 00:00:02 /lib/systemd/systemd-udevd

We have a bunch of read-only parted jobs backing up behind the kernel hang (and possibly causing it in the first place):
root 17317 1 0 03:17 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 36839 36832 0 05:39 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 37181 37143 0 05:50 ? 00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root 37340 37333 0 06:00 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 38585 38549 0 08:29 ? 00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root 38742 38735 0 08:39 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 40022 39986 0 11:14 ? 00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root 40184 40177 0 11:24 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 41456 41419 0 13:59 ? 00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root 41615 41608 0 14:09 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root 42905 42869 0 16:44 ? 00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root 43062 43054 0 16:54 ? 00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print

These are NVME drives with a GPT and two partitions. Let me know if you need more info.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.