Activity log for bug #1980925

Date Who What changed Old value New value Message
2022-07-07 07:13:40 nikhil kshirsagar bug added bug
2022-07-07 07:15:27 nikhil kshirsagar linux (Ubuntu): milestone jammy-updates
2022-07-07 07:16:13 nikhil kshirsagar nominated for series Ubuntu Focal
2022-07-07 07:16:13 nikhil kshirsagar bug task added linux (Ubuntu Focal)
2022-07-07 07:16:13 nikhil kshirsagar nominated for series Ubuntu Jammy
2022-07-07 07:16:13 nikhil kshirsagar bug task added linux (Ubuntu Jammy)
2022-07-07 07:17:20 nikhil kshirsagar linux (Ubuntu): milestone jammy-updates
2022-07-07 07:30:08 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2022-07-07 07:30:10 Ubuntu Kernel Bot linux (Ubuntu Focal): status New Incomplete
2022-07-07 07:30:12 Ubuntu Kernel Bot linux (Ubuntu Jammy): status New Incomplete
2022-07-11 17:07:45 nikhil kshirsagar linux (Ubuntu): status Incomplete Confirmed
2022-07-11 17:07:48 nikhil kshirsagar linux (Ubuntu Focal): status Incomplete Confirmed
2022-07-11 17:07:50 nikhil kshirsagar linux (Ubuntu Jammy): status Incomplete Confirmed
2022-07-15 16:05:42 Brett Milford bug added subscriber Brett Milford
2022-07-27 07:48:29 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800 bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully filled by 0, because before allocating them on stack, they were dynamically allocated by kzalloc(). Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800 bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800 bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() The local variables check_state (in bch_btree_check()) and state (in bch_sectors_dirty_init()) should be fully filled by 0, because before allocating them on stack, they were dynamically allocated by kzalloc(). Signed-off-by: Coly Li <colyli@suse.de> Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200 Linux 5.15.50 Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA. SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. (not tested bionic as yet) When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA.
2022-07-27 07:58:35 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. (not tested bionic as yet) When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA. SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. (not tested bionic as yet) When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA.
2022-07-27 08:09:27 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. (not tested bionic as yet) When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA. SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA.
2022-07-29 09:28:53 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic, and is fixed through upstream commit 7d6b902ea0e02b2a25c480edf471cbaa4ebe6b3c The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [ 4474.731735] INFO: task fio:16626 blocked for more than 121 seconds. [ 4475.035858] Not tainted 5.15.50-051550-generic #202206251445 [ 4475.335859] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4475.716526] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4475.716531] Call Trace: [ 4475.716546] <TASK> [ 4475.716549] __schedule+0x23d/0x5a0 [ 4475.716555] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4475.716560] schedule+0x4e/0xb0 [ 4475.716563] rwsem_down_read_slowpath+0x32e/0x380 [ 4475.716567] down_read+0x43/0x90 [ 4475.716571] cached_dev_write+0x7e/0x480 [bcache] [ 4475.716604] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4475.716618] __submit_bio+0x1a1/0x220 [ 4475.716623] __submit_bio_noacct+0x85/0x1f0 [ 4475.716626] submit_bio_noacct+0x4e/0x120 [ 4475.716628] submit_bio+0x4a/0x130 [ 4475.716631] iomap_submit_ioend+0x53/0x80 [ 4475.716634] iomap_writepages+0x35/0x40 [ 4475.716636] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4475.716730] do_writepages+0xda/0x200 [ 4475.716736] filemap_fdatawrite_wbc+0x81/0xd0 [ 4475.716739] file_write_and_wait_range+0xac/0xf0 [ 4475.716742] xfs_file_fsync+0x5b/0x250 [xfs] [ 4475.716821] vfs_fsync_range+0x49/0x80 [ 4475.716826] ? __fget_light+0x32/0x80 [ 4475.716829] __x64_sys_fsync+0x38/0x60 [ 4475.716832] do_syscall_64+0x5c/0xc0 [ 4475.716836] ? ksys_write+0xce/0xe0 [ 4475.716838] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716843] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716847] ? __x64_sys_write+0x19/0x20 [ 4475.716849] ? do_syscall_64+0x69/0xc0 [ 4475.716851] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716855] ? __x64_sys_write+0x19/0x20 [ 4475.716857] ? do_syscall_64+0x69/0xc0 [ 4475.716859] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4475.716862] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716865] ? __x64_sys_write+0x19/0x20 [ 4475.716867] ? do_syscall_64+0x69/0xc0 [ 4475.716869] ? syscall_exit_to_user_mode+0x27/0x50 [ 4475.716873] ? __x64_sys_write+0x19/0x20 [ 4475.716875] ? do_syscall_64+0x69/0xc0 [ 4475.716877] ? do_syscall_64+0x69/0xc0 [ 4475.716879] ? asm_common_interrupt+0x8/0x40 [ 4475.716883] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4475.716885] RIP: 0033:0x7fd3c498aa5b [ 4475.716888] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4475.716890] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4475.716892] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4475.716893] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4475.716895] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4475.716896] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4475.716898] </TASK> [ 4475.716912] INFO: task kworker/26:87:16682 blocked for more than 122 seconds. [ 4476.063391] Not tainted 5.15.50-051550-generic #202206251445 [ 4476.363476] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4476.743988] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4476.743994] Workqueue: events update_writeback_rate [bcache] [ 4476.744022] Call Trace: [ 4476.744024] <TASK> [ 4476.744026] __schedule+0x23d/0x5a0 [ 4476.744032] schedule+0x4e/0xb0 [ 4476.744035] rwsem_down_read_slowpath+0x32e/0x380 [ 4476.744039] down_read+0x43/0x90 [ 4476.744042] update_writeback_rate+0x134/0x190 [bcache] [ 4476.744057] process_one_work+0x22b/0x3d0 [ 4476.744063] worker_thread+0x53/0x410 [ 4476.744066] ? process_one_work+0x3d0/0x3d0 [ 4476.744070] kthread+0x12a/0x150 [ 4476.744074] ? set_kthread_struct+0x50/0x50 [ 4476.744077] ret_from_fork+0x22/0x30 [ 4476.744082] </TASK> [ 4598.626220] INFO: task bcache_writebac:1835 blocked for more than 245 seconds. [ 4598.978432] Not tainted 5.15.50-051550-generic #202206251445 [ 4599.279900] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4599.657293] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4599.657298] Call Trace: [ 4599.657300] <TASK> [ 4599.657302] __schedule+0x23d/0x5a0 [ 4599.657325] schedule+0x4e/0xb0 [ 4599.657328] rwsem_down_write_slowpath+0x220/0x3d0 [ 4599.657333] down_write+0x43/0x50 [ 4599.657338] bch_writeback_thread+0x78/0x320 [bcache] [ 4599.657374] ? read_dirty_submit+0x70/0x70 [bcache] [ 4599.657390] kthread+0x12a/0x150 [ 4599.657393] ? set_kthread_struct+0x50/0x50 [ 4599.657396] ret_from_fork+0x22/0x30 [ 4599.657401] </TASK> [ 4599.657434] INFO: task kworker/2:63:14895 blocked for more than 124 seconds. [ 4600.000498] Not tainted 5.15.50-051550-generic #202206251445 [ 4600.301188] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4600.681984] task:kworker/2:63 state:D stack: 0 pid:14895 ppid: 2 flags:0x00004000 [ 4600.682005] Workqueue: bcache bch_data_insert_start [bcache] [ 4600.682031] Call Trace: [ 4600.682033] <TASK> [ 4600.682035] __schedule+0x23d/0x5a0 [ 4600.682040] schedule+0x4e/0xb0 [ 4600.682043] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4600.682066] ? wait_woken+0x70/0x70 [ 4600.682071] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4600.682082] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4600.682098] ? krealloc+0x9d/0xd0 [ 4600.682102] ? __bch_keylist_realloc+0xb7/0x100 [bcache] [ 4600.682113] ? __bch_submit_bbio+0x97/0xb0 [bcache] [ 4600.682126] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4600.682139] ? closure_sub+0x94/0xb0 [bcache] [ 4600.682152] process_one_work+0x22b/0x3d0 [ 4600.682157] worker_thread+0x53/0x410 [ 4600.682161] ? process_one_work+0x3d0/0x3d0 [ 4600.682164] kthread+0x12a/0x150 [ 4600.682168] ? set_kthread_struct+0x50/0x50 [ 4600.682171] ret_from_fork+0x22/0x30 [ 4600.682176] </TASK> [ 4600.682240] INFO: task kworker/25:123:16464 blocked for more than 125 seconds. [ 4601.033116] Not tainted 5.15.50-051550-generic #202206251445 [ 4601.333405] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4601.713829] task:kworker/25:123 state:D stack: 0 pid:16464 ppid: 2 flags:0x00004000 [ 4601.713835] Workqueue: xfs-sync/bcache0 xfs_log_worker [xfs] [ 4601.713975] Call Trace: [ 4601.713977] <TASK> [ 4601.713979] __schedule+0x23d/0x5a0 [ 4601.713986] schedule+0x4e/0xb0 [ 4601.713988] rwsem_down_read_slowpath+0x32e/0x380 [ 4601.713993] down_read+0x43/0x90 [ 4601.713996] cached_dev_write+0x7e/0x480 [bcache] [ 4601.714018] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4601.714031] __submit_bio+0x1a1/0x220 [ 4601.714035] ? ttwu_do_activate+0x72/0xf0 [ 4601.714039] __submit_bio_noacct+0x85/0x1f0 [ 4601.714042] ? mutex_lock+0x13/0x40 [ 4601.714045] submit_bio_noacct+0x4e/0x120 [ 4601.714047] submit_bio+0x4a/0x130 [ 4601.714050] xlog_write_iclog+0x254/0x300 [xfs] [ 4601.714142] xlog_sync+0x1ab/0x2c0 [xfs] [ 4601.714225] xlog_state_release_iclog+0x123/0x1d0 [xfs] [ 4601.714306] xfs_log_force+0x186/0x250 [xfs] [ 4601.714386] xfs_log_worker+0x39/0x90 [xfs] [ 4601.714466] process_one_work+0x22b/0x3d0 [ 4601.714471] worker_thread+0x53/0x410 [ 4601.714474] ? process_one_work+0x3d0/0x3d0 [ 4601.714477] kthread+0x12a/0x150 [ 4601.714481] ? set_kthread_struct+0x50/0x50 [ 4601.714484] ret_from_fork+0x22/0x30 [ 4601.714489] </TASK> [ 4601.714526] INFO: task kworker/22:89:16557 blocked for more than 126 seconds. [ 4602.056844] Not tainted 5.15.50-051550-generic #202206251445 [ 4602.357148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4602.733923] task:kworker/22:89 state:D stack: 0 pid:16557 ppid: 2 flags:0x00004000 [ 4602.733940] Workqueue: bcache bch_data_insert_start [bcache] [ 4602.733966] Call Trace: [ 4602.733968] <TASK> [ 4602.733970] __schedule+0x23d/0x5a0 [ 4602.733988] schedule+0x4e/0xb0 [ 4602.733991] bch_bucket_alloc+0x21a/0x5b0 [bcache] [ 4602.734002] ? bch_btree_insert_check_key+0x1e0/0x1e0 [bcache] [ 4602.734014] ? wait_woken+0x70/0x70 [ 4602.734019] __bch_bucket_alloc_set+0x52/0xf0 [bcache] [ 4602.734031] bch_alloc_sectors+0x1c9/0x4c0 [bcache] [ 4602.734043] ? __wake_up_common_lock+0x8a/0xc0 [ 4602.734046] bch_data_insert_start+0x15e/0x3a0 [bcache] [ 4602.734062] ? closure_sub+0x94/0xb0 [bcache] [ 4602.734083] process_one_work+0x22b/0x3d0 [ 4602.734088] worker_thread+0x53/0x410 [ 4602.734092] ? process_one_work+0x3d0/0x3d0 [ 4602.734095] kthread+0x12a/0x150 [ 4602.734099] ? set_kthread_struct+0x50/0x50 [ 4602.734102] ret_from_fork+0x22/0x30 [ 4602.734108] </TASK> [ 4602.734134] INFO: task fio:16626 blocked for more than 249 seconds. [ 4603.034541] Not tainted 5.15.50-051550-generic #202206251445 [ 4603.330400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4603.711013] task:fio state:D stack: 0 pid:16626 ppid: 1715 flags:0x00004002 [ 4603.711030] Call Trace: [ 4603.711032] <TASK> [ 4603.711046] __schedule+0x23d/0x5a0 [ 4603.711054] ? sysvec_apic_timer_interrupt+0x4e/0x90 [ 4603.711060] schedule+0x4e/0xb0 [ 4603.711063] rwsem_down_read_slowpath+0x32e/0x380 [ 4603.711067] down_read+0x43/0x90 [ 4603.711072] cached_dev_write+0x7e/0x480 [bcache] [ 4603.711095] cached_dev_submit_bio+0x502/0x550 [bcache] [ 4603.711108] __submit_bio+0x1a1/0x220 [ 4603.711113] __submit_bio_noacct+0x85/0x1f0 [ 4603.711116] submit_bio_noacct+0x4e/0x120 [ 4603.711118] submit_bio+0x4a/0x130 [ 4603.711121] iomap_submit_ioend+0x53/0x80 [ 4603.711124] iomap_writepages+0x35/0x40 [ 4603.711126] xfs_vm_writepages+0x84/0xb0 [xfs] [ 4603.711223] do_writepages+0xda/0x200 [ 4603.711229] filemap_fdatawrite_wbc+0x81/0xd0 [ 4603.711232] file_write_and_wait_range+0xac/0xf0 [ 4603.711235] xfs_file_fsync+0x5b/0x250 [xfs] [ 4603.711312] vfs_fsync_range+0x49/0x80 [ 4603.711317] ? __fget_light+0x32/0x80 [ 4603.711321] __x64_sys_fsync+0x38/0x60 [ 4603.711324] do_syscall_64+0x5c/0xc0 [ 4603.711327] ? ksys_write+0xce/0xe0 [ 4603.711330] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711335] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711338] ? __x64_sys_write+0x19/0x20 [ 4603.711340] ? do_syscall_64+0x69/0xc0 [ 4603.711342] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711345] ? __x64_sys_write+0x19/0x20 [ 4603.711347] ? do_syscall_64+0x69/0xc0 [ 4603.711349] ? exit_to_user_mode_prepare+0x37/0xb0 [ 4603.711352] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711355] ? __x64_sys_write+0x19/0x20 [ 4603.711357] ? do_syscall_64+0x69/0xc0 [ 4603.711359] ? syscall_exit_to_user_mode+0x27/0x50 [ 4603.711362] ? __x64_sys_write+0x19/0x20 [ 4603.711364] ? do_syscall_64+0x69/0xc0 [ 4603.711366] ? do_syscall_64+0x69/0xc0 [ 4603.711368] ? asm_common_interrupt+0x8/0x40 [ 4603.711371] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 4603.711374] RIP: 0033:0x7fd3c498aa5b [ 4603.711376] RSP: 002b:00007ffc81602ee0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [ 4603.711379] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd3c498aa5b [ 4603.711381] RDX: 0000000000000000 RSI: 000055ededf41f40 RDI: 0000000000000006 [ 4603.711382] RBP: 0000000000001000 R08: 0000000000000000 R09: 000055ededf41f40 [ 4603.711383] R10: 00000000e8746551 R11: 0000000000000293 R12: 00007fd3ba533000 [ 4603.711384] R13: 0000000000000000 R14: 00007fd3c1cde110 R15: 0000000300000000 [ 4603.711387] </TASK> [ 4603.711421] INFO: task kworker/26:87:16682 blocked for more than 250 seconds. [ 4604.058441] Not tainted 5.15.50-051550-generic #202206251445 [ 4604.359128] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4604.739950] task:kworker/26:87 state:D stack: 0 pid:16682 ppid: 2 flags:0x00004000 [ 4604.739968] Workqueue: events update_writeback_rate [bcache] [ 4604.739995] Call Trace: [ 4604.739997] <TASK> [ 4604.740012] __schedule+0x23d/0x5a0 [ 4604.740017] schedule+0x4e/0xb0 [ 4604.740020] rwsem_down_read_slowpath+0x32e/0x380 [ 4604.740024] down_read+0x43/0x90 [ 4604.740027] update_writeback_rate+0x134/0x190 [bcache] [ 4604.740043] process_one_work+0x22b/0x3d0 [ 4604.740049] worker_thread+0x53/0x410 [ 4604.740052] ? process_one_work+0x3d0/0x3d0 [ 4604.740055] kthread+0x12a/0x150 [ 4604.740059] ? set_kthread_struct+0x50/0x50 [ 4604.740062] ret_from_fork+0x22/0x30 [ 4604.740067] </TASK> [ 4725.600602] INFO: task bcache_writebac:1835 blocked for more than 372 seconds. [ 4725.952827] Not tainted 5.15.50-051550-generic #202206251445 [ 4726.253763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4726.630937] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4726.630954] Call Trace: [ 4726.630956] <TASK> [ 4726.630959] __schedule+0x23d/0x5a0 [ 4726.630966] schedule+0x4e/0xb0 [ 4726.630970] rwsem_down_write_slowpath+0x220/0x3d0 [ 4726.630976] down_write+0x43/0x50 [ 4726.630980] bch_writeback_thread+0x78/0x320 [bcache] [ 4726.631005] ? read_dirty_submit+0x70/0x70 [bcache] [ 4726.631021] kthread+0x12a/0x150 [ 4726.631025] ? set_kthread_struct+0x50/0x50 [ 4726.631028] ret_from_fork+0x22/0x30 [ 4726.631033] </TASK> [Fix] commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init() I have built this fix into kernel 5.15.50 (https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.15/ ) and tested to verify the problem is fixed. root@nikhilbionic:/home/ubuntu/nikhil/5_15_50/cod/mainline/v5.15.50# git log commit b57a534df246999bc09a510dc0e0125cacd1724c (HEAD) Author: Coly Li <colyli@suse.de> Date: Fri May 27 23:28:16 2022 +0800     bcache: memset on stack variables in bch_btree_check() and bch_sectors_dirty_init()     The local variables check_state (in bch_btree_check()) and state (in     bch_sectors_dirty_init()) should be fully filled by 0, because before     allocating them on stack, they were dynamically allocated by kzalloc().     Signed-off-by: Coly Li <colyli@suse.de>     Link: https://lore.kernel.org/r/20220527152818.27545-2-colyli@suse.de     Signed-off-by: Jens Axboe <axboe@kernel.dk> commit 18a33c8dabb88b50b860e0177a73933f2c0ddf68 (tag: v5.15.50) Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Date: Sat Jun 25 15:18:40 2022 +0200     Linux 5.15.50     Link: https://lore.kernel.org/r/20220623164322.288837280@linuxfoundation.org     Tested-by: Florian Fainelli <f.fainelli@gmail.com>     Tested-by: Shuah Khan <skhan@linuxfoundation.org>     Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>     Tested-by: Jon Hunter <jonathanh@nvidia.com>     Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>     Tested-by: Ron Economos <re@w6rz.net>     Tested-by: Guenter Roeck <linux@roeck-us.net>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA. SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [Fix] These 3 fixes are needed for the SRU. dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed. [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA.
2022-07-29 09:36:52 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [Fix] These 3 fixes are needed for the SRU. dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed. [Regression Potential] I have not seen any potential drawbacks or harmful effects of this fix in my testing. In fact it is required, without which the deadlock is easily reproduced both on focal as well as jammy GA. SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [Fix] These 3 fixes are needed for the SRU. dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed. [Regression Potential] Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.
2022-07-29 09:39:28 nikhil kshirsagar description SRU Justification: [Impact] This bug in bcache affects (at least) focal and jammy releases. When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [Fix] These 3 fixes are needed for the SRU. dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed. [Regression Potential] Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing. SRU Justification: [Impact] When Random Read I/O is started with a test like - fio --name=read_iops --directory=/home/ubuntu/bcache_mount/ --size=16G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --randrepeat=0 or random read-writes with a test like, fio --filename=/home/ubuntu/bcache_mount/cachedfile --size=15GB --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=128 --name=iops-test-job --randrepeat=0 traces are seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> The bug exists till kernel 5.15.50-051550-generic The reproducer is pasted below: # uname -a Linux bronzor 5.15.50-051550-generic #202206251445 SMP Sat Jun 25 14:51:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS sdd 8:48 0 279.4G 0 disk └─sdd1 8:49 0 60G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount nvme0n1 259:0 0 372.6G 0 disk └─nvme0n1p1 259:2 0 15G 0 part   └─bcache0 252:0 0 60G 0 disk /home/ubuntu/bcache_mount fio --name=read_iops --directory=/home/ubuntu/bcache_mount --size=12G --ioengine=libaio --direct=1 --verify=0 --bs=4K --iodepth=128 --rw=randread --group_reporting=1 read_iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128 fio-3.28 Starting 1 process read_iops: Laying out IO file (1 file / 12288MiB) The test does not progress beyond a few minutes, and this trace is then seen in the kernel log, [ 4473.699902] INFO: task bcache_writebac:1835 blocked for more than 120 seconds. [ 4474.050921] Not tainted 5.15.50-051550-generic #202206251445 [ 4474.350883] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4474.731391] task:bcache_writebac state:D stack: 0 pid: 1835 ppid: 2 flags:0x00004000 [ 4474.731408] Call Trace: [ 4474.731411] <TASK> [ 4474.731413] __schedule+0x23d/0x5a0 [ 4474.731433] schedule+0x4e/0xb0 [ 4474.731436] rwsem_down_write_slowpath+0x220/0x3d0 [ 4474.731441] down_write+0x43/0x50 [ 4474.731446] bch_writeback_thread+0x78/0x320 [bcache] [ 4474.731471] ? read_dirty_submit+0x70/0x70 [bcache] [ 4474.731487] kthread+0x12a/0x150 [ 4474.731491] ? set_kthread_struct+0x50/0x50 [ 4474.731494] ret_from_fork+0x22/0x30 [ 4474.731499] </TASK> [Fix] These 3 fixes are needed for the SRU. dea3560e5f31965165bcf34ecf0b47af28bfd155, 6445ec3df23f24677064a327dce437ef3e02dc6, dc60301fb408e06e0b718c0980cdd31d2b238bee I have built these fixes into kernel 5.15.0-39-generic (jammy) and tested to verify the problem is fixed. [Regression Potential] Regression potential should be minimal. I have not seen any potential drawbacks or harmful effects of this fix in my testing.
2022-08-01 07:41:32 Stefan Bader linux (Ubuntu Jammy): importance Undecided Medium
2022-08-01 07:41:36 Stefan Bader linux (Ubuntu Focal): importance Undecided Medium
2022-08-01 08:15:49 Stefan Bader linux (Ubuntu): status Confirmed Invalid
2022-08-01 08:16:28 Stefan Bader linux (Ubuntu Focal): status Confirmed Invalid
2022-08-02 12:31:18 Stefan Bader linux (Ubuntu Jammy): status Confirmed Fix Committed
2022-08-31 08:38:25 Launchpad Janitor linux (Ubuntu Jammy): status Fix Committed Fix Released
2022-08-31 08:38:25 Launchpad Janitor cve linked 2021-33061
2022-10-02 01:57:31 nikhil kshirsagar tags verification-done-jammy
2024-03-01 06:16:52 Ubuntu Kernel Bot tags verification-done-jammy kernel-spammed-jammy-linux-mtk-v2 verification-done-jammy verification-needed-jammy-linux-mtk