iSCSI-target: Deleting a LUN hangs in the kernel

Bug #1862682 reported by Pavel Zakharov
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Eoan           Fix Released   Medium       Unassigned

Bug Description

Deleting a LUN on the target hangs until the initiator attempts I/O to that LUN.

This issue was introduced when linux-image-generic-hwe-18.04 switched from the 5.0 to the 5.3 kernel. Source package is "linux-meta-hwe". OS is Ubuntu 18.04.4 LTS.

We can see a hung task message in dmesg:
[ 1330.438613] INFO: task iscsicmd:24572 blocked for more than 120 seconds.
[ 1330.439554] Tainted: P OE 5.3.0-26-generic #28~18.04.1-Ubuntu
[ 1330.440594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.441666] iscsicmd D 0 24572 8910 0x00000080
[ 1330.441668] Call Trace:
[ 1330.441674] __schedule+0x2a8/0x670
[ 1330.441676] schedule+0x33/0xa0
[ 1330.441678] schedule_timeout+0x1d3/0x2f0
[ 1330.441682] ? __kfifo_to_user_r+0xb0/0xb0
[ 1330.441684] wait_for_completion+0xba/0x140
[ 1330.441688] ? wake_up_q+0x80/0x80
[ 1330.441704] transport_clear_lun_ref+0x27/0x30 [target_core_mod]
[ 1330.441711] core_tpg_remove_lun+0x35/0x100 [target_core_mod]
[ 1330.441716] core_dev_del_lun+0x26/0x70 [target_core_mod]
[ 1330.441721] target_fabric_port_unlink+0x4a/0x50 [target_core_mod]
[ 1330.441724] configfs_unlink+0xea/0x1b0
[ 1330.441727] vfs_unlink+0x111/0x200
[ 1330.441729] do_unlinkat+0x2ad/0x320
[ 1330.441731] __x64_sys_unlink+0x23/0x30
[ 1330.441734] do_syscall_64+0x5a/0x130
[ 1330.441736] entry_SYSCALL_64_after_hwframe+0x44/0xa9
(iscsicmd is a cli tool I created that is similar to targetcli)
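
For anyone triaging similar reports, here is a minimal sketch of pulling the blocked task and its reliable stack frames out of a hung-task message like the one above. The function name parse_hung_task and the regular expressions are illustrative assumptions, not part of any kernel or Ubuntu tooling; frames prefixed with "? " are skipped because the kernel marks them as unreliable.

```python
import re

# Header line, e.g. "INFO: task iscsicmd:24572 blocked for more than 120 seconds."
HEADER_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# Frame line, e.g. "transport_clear_lun_ref+0x27/0x30 [target_core_mod]"
FRAME_RE = re.compile(
    r"^(?P<unreliable>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?$"
)
# Leading dmesg timestamp, e.g. "[ 1330.441704] "
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# A few lines copied from the dmesg output in this report.
SAMPLE_LOG = """\
[ 1330.438613] INFO: task iscsicmd:24572 blocked for more than 120 seconds.
[ 1330.441668] Call Trace:
[ 1330.441684] wait_for_completion+0xba/0x140
[ 1330.441688] ? wake_up_q+0x80/0x80
[ 1330.441704] transport_clear_lun_ref+0x27/0x30 [target_core_mod]
[ 1330.441711] core_tpg_remove_lun+0x35/0x100 [target_core_mod]
[ 1330.441724] configfs_unlink+0xea/0x1b0
"""


def parse_hung_task(log_text):
    """Return a dict describing the first hung-task report found, or None."""
    report = None
    for raw in log_text.splitlines():
        line = TIMESTAMP_RE.sub("", raw.strip())
        header = HEADER_RE.search(line)
        if header and report is None:
            report = {
                "comm": header.group("comm"),
                "pid": int(header.group("pid")),
                "timeout_secs": int(header.group("secs")),
                "frames": [],  # list of (symbol, module or None)
            }
            continue
        if report is None:
            continue
        frame = FRAME_RE.match(line)
        # Skip "? symbol" entries: the unwinder could not confirm them.
        if frame and not frame.group("unreliable"):
            report["frames"].append((frame.group("sym"), frame.group("mod")))
    return report


if __name__ == "__main__":
    info = parse_hung_task(SAMPLE_LOG)
    print(info["comm"], info["pid"])
    for sym, mod in info["frames"]:
        print(" ", sym, f"[{mod}]" if mod else "")
```

Read from the bottom up, the frames show the path of interest: configfs_unlink leads into core_tpg_remove_lun, which waits in transport_clear_lun_ref for a completion.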

You can find more details about this bug and the reproducer on this linux kernel discussion thread: https://lkml.org/lkml/2020/2/7/585
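
Separately from the reproducer in that thread, here is a minimal sketch of how a management tool could detect this hang instead of blocking on it. It is an assumption-laden illustration, not how iscsicmd or targetcli work: the helper name unlink_with_watchdog is hypothetical, and the configfs path of the LUN link is supplied by the caller rather than taken from this report. The unlink runs in a child process because the call trace above shows the task in state D (uninterruptible sleep), where it cannot be interrupted by signals.

```python
import subprocess
import sys


def unlink_with_watchdog(path, timeout_secs=10.0):
    """Unlink `path` in a child process.

    Returns "done" if the unlink succeeded, "failed" if the child exited
    with an error, and "timeout" if it was still blocked after
    `timeout_secs` seconds.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.unlink(sys.argv[1])", path],
        stderr=subprocess.DEVNULL,
    )
    try:
        return "done" if child.wait(timeout=timeout_secs) == 0 else "failed"
    except subprocess.TimeoutExpired:
        # Deliberately no kill() followed by wait(): a child stuck in
        # uninterruptible sleep would not exit, and waiting for it would
        # block the caller just like the original unlink.
        return "timeout"
```

A caller would pass the path of the LUN's link under its configfs mount and treat "timeout" as a sign that the kernel is still waiting on outstanding references to the LUN.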

A patch has already been submitted by an iSCSI-target developer, although it has not yet been integrated (https://marc.info/?l=target-devel&m=158134893208641&w=2).

Would it be possible to get this patch integrated into the linux-image-generic-hwe-18.04 package, and if yes, then what would be the required next steps?

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-generic-hwe-18.04 5.3.0.28.96
ProcVersionSignature: Ubuntu 5.3.0-28.30~18.04.1-generic 5.3.13
Uname: Linux 5.3.0-28-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
Date: Mon Feb 10 18:41:12 2020
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-meta-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Pavel Zakharov (pzakha) wrote :
Brad Figg (brad-figg)
affects: linux-meta-hwe (Ubuntu) → linux-hwe (Ubuntu)
Pavel Zakharov (pzakha) wrote :

Added a patch for the upstream fix (upstream commit: c14335ebb92a98646ddbf447e6cacc66de5269ad).

tags: added: patch
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe (Ubuntu):
status: New → Confirmed
Stefan Bader (smb)
affects: linux-hwe (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu Eoan):
importance: Undecided → Medium
status: New → Triaged
Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Eoan):
status: Triaged → Fix Committed
Pavel Zakharov (pzakha) wrote :

BTW, this patch has been accepted into the upstream 5.4-stable branch and is part of the next patch set: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=91aa9e475827c5cd5f0283f3f68c7805882823fe

Hopefully this means it should be pulled into Ubuntu Focal, right?

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Khaled El Mously (kmously) wrote :

@Pavel: Yes, it should be making its way to Focal soon if it hasn't already. Thanks for pointing that out.

Khaled El Mously (kmously) wrote :

@pavel: Does the current -proposed eoan kernel resolve this issue for you?

Pavel Zakharov (pzakha) wrote :

Hi Khaled, sorry I missed the notifications. I have not tested the -proposed version. However, I've already tested the fix by rebuilding on top of the Ubuntu bionic hwe kernel, so as long as the patch is there, we can count this as fixed.

Khaled El Mously (kmously) wrote :

Hi @pavel. No worries! Thanks for that info - that logic sounds alright to me.

Marking as verified in eoan.

tags: added: verification-done-eoan
removed: verification-needed-eoan
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-46.38

---------------
linux (5.3.0-46.38) eoan; urgency=medium

  * eoan/linux: 5.3.0-43.36 -proposed tracker (LP: #1867301)

  * Fix AMD Stoney Ridge screen flickering under 4K resolution (LP: #1864005)
    - iommu/amd: Disable IOMMU on Stoney Ridge systems

  * Allow BPF tracing under lockdown (LP: #1868626)
    - Revert "UBUNTU: SAUCE: (efi-lockdown) Lock down kprobes"
    - Revert "bpf: Restrict bpf when kernel lockdown is in confidentiality mode"

  * Missing wireless network interface after kernel 5.3.0-43 upgrade with eoan
    (LP: #1868442)
    - iwlwifi: mvm: Do not require PHY_SKU NVM section for 3168 devices

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * iSCSI-target: Deleting a LUN hangs in the kernel (LP: #1862682)
    - scsi: Revert "target/core: Inline transport_lun_remove_cmd()"

  * Stop using get_scalar_status command in Dell AIO uart backlight driver
    (LP: #1865402)
    - SAUCE: platform/x86: dell-uart-backlight: add get_display_mode command

  * Eoan update: upstream stable patchset 2020-03-11 (LP: #1867051)
    - Revert "drm/sun4i: dsi: Change the start delay calculation"
    - ovl: fix lseek overflow on 32bit
    - kernel/module: Fix memleak in module_add_modinfo_attrs()
    - media: iguanair: fix endpoint sanity check
    - ocfs2: fix oops when writing cloned file
    - x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
    - udf: Allow writing to 'Rewritable' partitions
    - printk: fix exclusive_console replaying
    - iwlwifi: mvm: fix NVM check for 3168 devices
    - sparc32: fix struct ipc64_perm type definition
    - cls_rsvp: fix rsvp_policy
    - gtp: use __GFP_NOWARN to avoid memalloc warning
    - l2tp: Allow duplicate session creation with UDP
    - net: hsr: fix possible NULL deref in hsr_handle_frame()
    - net_sched: fix an OOB access in cls_tcindex
    - net: stmmac: Delete txtimer in suspend()
    - bnxt_en: Fix TC queue mapping.
    - tcp: clear tp->total_retrans in tcp_disconnect()
    - tcp: clear tp->delivered in tcp_disconnect()
    - tcp: clear tp->data_segs{in|out} in tcp_disconnect()
    - tcp: clear tp->segs_{in|out} in tcp_disconnect()
    - rxrpc: Fix use-after-free in rxrpc_put_local()
    - rxrpc: Fix insufficient receive notification generation
    - rxrpc: Fix missing active use pinning of rxrpc_local object
    - rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect
    - media: uvcvideo: Avoid cyclic entity chains due to malformed USB descriptors
    - mfd: dln2: More sanity checking for endpoints
    - ipc/msg.c: consolidate all xxxctl_down() functions
    - tracing: Fix sched switch start/stop refcount racy updates
    - rcu: Avoid data-race in rcu_gp_fqs_check_wake()
    - brcmfmac: Fix memory leak in brcmf_usbdev_qinit
    - usb: typec: tcpci: mask event interrupts when remove driver
    - usb: gadget: legacy: set max_speed to super-speed
    - usb: gadget: f_ncm: Use atomic_t to track in-flight request
    - usb: gadget: f_ecm: Use atomic_t to track in-flight request
    - ALSA: usb-audio: Fix endianess in descriptor validatio...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
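
To check whether an installed 5.3-series kernel package is at or past the fixed version named above, a rough sketch follows. The helper names are hypothetical. It compares only the numeric "X.Y.Z-ABI.UPLOAD" prefix and ignores suffixes such as "~18.04.1"; treating an HWE rebuild with the same numbers as carrying the same patches is an assumption, not something this report states. Kernels on a different upstream base (for example 5.4) are rejected rather than guessed at.

```python
import re

# Version named in the "This bug was fixed in the package" comment.
FIXED_IN = "5.3.0-46.38"

VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def kernel_version_key(version):
    """Turn '5.3.0-28.30~18.04.1' into the tuple (5, 3, 0, 28, 30)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized kernel package version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(package_version, fixed_in=FIXED_IN):
    """Return True if package_version is at or after fixed_in.

    Only defined when both versions share the same upstream base
    (the first three numbers); otherwise a ValueError is raised.
    """
    current = kernel_version_key(package_version)
    fixed = kernel_version_key(fixed_in)
    if current[:3] != fixed[:3]:
        raise ValueError("versions are from different upstream series")
    return current >= fixed


if __name__ == "__main__":
    # Package version taken from the ProcVersionSignature in this report.
    print(has_fix("5.3.0-28.30~18.04.1"))
```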