I created a new Jammy VM, and installed qemu-utils.
The kernel is 5.15.0-41-generic from -updates.
I ran my reproducer.sh script from the testcase, and within a minute, the nbd request got stuck, and we started seeing hung task timeout oops messages in dmesg:
Performing verification for Jammy.
I created a new Jammy VM, and installed qemu-utils.
The kernel is 5.15.0-41-generic from -updates.
I ran my reproducer.sh script from the testcase, and within a minute, the nbd request got stuck, and we started seeing hung task timeout oops messages in dmesg:
Jul 20 04:56:20 jammy-nbd kernel: block nbd15: NBD_DISCONNECT sudo:session) : session closed for user root /usr/bin/ qemu-nbd --connect= /dev/nbd15 --cache=writeback --format=qcow2 foo.img sudo:session) : session opened for user root(uid=0) by ubuntu(uid=1000) partition_ table() : Disk read failed. ,65536B) . Runtime 30 seconds ,65536B) . Runtime 60 seconds ,65536B) . Runtime 90 seconds ,65536B) . Runtime 120 seconds kernel/ hung_task_ timeout_ secs" disables this message. 0x23d/0x590 freeze_ queue_wait+ 0x69/0xa0 0x70/0x70 freeze_ queue+0x1b/ 0x30 socket+ 0x76/0x1f0 [nbd] 0x18b/0x340 [nbd] capable+ 0x3d/0x60 ioctl+0x12e/ 0x270 files+0x86/ 0xc0 0x46/0x50 ioctl+0x91/ 0xc0 64+0x5c/ 0xc0 user_mode_ prepare+ 0x37/0xb0 exit_to_ user_mode+ 0x27/0x50 recvmsg+ 0x1d/0x20 64+0x69/ 0xc0 exit_to_ user_mode+ 0x27/0x50 recvmsg+ 0x1d/0x20 64+0x69/ 0xc0 64_after_ hwframe+ 0x44/0xae 4d1820 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jul 20 04:56:20 jammy-nbd kernel: block nbd15: Send disconnect failed -32
Jul 20 04:56:20 jammy-nbd sudo[5267]: pam_unix(
Jul 20 04:56:20 jammy-nbd sudo[5271]: ubuntu : TTY=pts/0 ; PWD=/home/ubuntu ; USER=root ; COMMAND=
Jul 20 04:56:20 jammy-nbd sudo[5271]: pam_unix(
Jul 20 04:56:20 jammy-nbd kernel: ldm_validate_
Jul 20 04:56:20 jammy-nbd kernel: Dev nbd15: unable to read RDB block 0
Jul 20 04:56:20 jammy-nbd kernel: nbd15: unable to read partition table
Jul 20 04:56:51 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296
Jul 20 04:57:19 jammy-nbd systemd-udevd[440]: nbd15: Worker [2561] processing SEQNUM=3062 is taking a long time
Jul 20 04:57:21 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296
Jul 20 04:57:52 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296
Jul 20 04:58:23 jammy-nbd kernel: block nbd15: Possible stuck request 0000000064946bb4: control (read@524087296
Jul 20 04:58:23 jammy-nbd kernel: INFO: task qemu-nbd:5280 blocked for more than 120 seconds.
Jul 20 04:58:23 jammy-nbd kernel: Not tainted 5.15.0-41-generic #44-Ubuntu
Jul 20 04:58:23 jammy-nbd kernel: "echo 0 > /proc/sys/
Jul 20 04:58:23 jammy-nbd kernel: task:qemu-nbd state:D stack: 0 pid: 5280 ppid: 1 flags:0x00000002
Jul 20 04:58:23 jammy-nbd kernel: Call Trace:
Jul 20 04:58:23 jammy-nbd kernel: <TASK>
Jul 20 04:58:23 jammy-nbd kernel: __schedule+
Jul 20 04:58:23 jammy-nbd kernel: ? call_rcu+0xe/0x10
Jul 20 04:58:23 jammy-nbd kernel: schedule+0x4e/0xb0
Jul 20 04:58:23 jammy-nbd kernel: blk_mq_
Jul 20 04:58:23 jammy-nbd kernel: ? wait_woken+
Jul 20 04:58:23 jammy-nbd kernel: blk_mq_
Jul 20 04:58:23 jammy-nbd kernel: nbd_add_
Jul 20 04:58:23 jammy-nbd kernel: __nbd_ioctl+
Jul 20 04:58:23 jammy-nbd kernel: ? security_
Jul 20 04:58:23 jammy-nbd kernel: nbd_ioctl+0x81/0xb0 [nbd]
Jul 20 04:58:23 jammy-nbd kernel: blkdev_
Jul 20 04:58:23 jammy-nbd kernel: ? __fget_
Jul 20 04:58:23 jammy-nbd kernel: block_ioctl+
Jul 20 04:58:23 jammy-nbd kernel: __x64_sys_
Jul 20 04:58:23 jammy-nbd kernel: do_syscall_
Jul 20 04:58:23 jammy-nbd kernel: ? exit_to_
Jul 20 04:58:23 jammy-nbd kernel: ? syscall_
Jul 20 04:58:23 jammy-nbd kernel: ? __x64_sys_
Jul 20 04:58:23 jammy-nbd kernel: ? do_syscall_
Jul 20 04:58:23 jammy-nbd kernel: ? syscall_
Jul 20 04:58:23 jammy-nbd kernel: ? __x64_sys_
Jul 20 04:58:23 jammy-nbd kernel: ? do_syscall_
Jul 20 04:58:23 jammy-nbd kernel: entry_SYSCALL_
Jul 20 04:58:23 jammy-nbd kernel: RIP: 0033:0x7f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RSP: 002b:00007f6c46
Jul 20 04:58:23 jammy-nbd kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6c47e47aff
Jul 20 04:58:23 jammy-nbd kernel: RDX: 0000000000000009 RSI: 000000000000ab00 RDI: 000000000000000b
Jul 20 04:58:23 jammy-nbd kernel: RBP: 00007f6c464d1910 R08: 0000000000000000 R09: 0000000000000001
Jul 20 04:58:23 jammy-nbd kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000b
Jul 20 04:58:23 jammy-nbd kernel: R13: 00007f6c464d1900 R14: 000000001f400000 R15: 00007f6c3c000b90
Jul 20 04:58:23 jammy-nbd kernel: </TASK>
I then rebooted, and enabled -proposed and installed the 5.15.0-43-generic kernel.
I started the reproducer.sh script and left it to run for an hour.
At the end of the hour, the script was still running strong. Requests no longer get stuck when we issue NBD_DISCONNECT, and the issue is solved.
The kernel in -proposed fixes the issue, happy to mark verified.