Activity log for bug #1810998

Date Who What changed Old value New value Message
2019-01-08 20:52:01 Mauricio Faria de Oliveira bug added bug
2019-01-08 20:52:28 Mauricio Faria de Oliveira linux (Ubuntu): assignee Mauricio Faria de Oliveira (mfo)
2019-01-08 21:00:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2019-01-08 21:00:07 Ubuntu Kernel Bot tags bionic
2019-01-08 21:05:30 Mauricio Faria de Oliveira description (updated: the edit split the backport note into per-release Cosmic/Bionic items; the new value follows)

[NOTE]
 * Patches will be sent to the kernel-team mailing list
   once the test kernel has been verified by the reporter.

[Impact]
 * Users may experience CPU hard lockups when performing
   intensive writes to NVMe drives.
 * The fix addresses a scheduling issue in the original
   implementation of wbt/writeback throttling.
 * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock
   contention and thundering herd issue in wbt_wait"),
   plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve
   waking of tasks").
 * There are additional commits to help with a cleaner
   backport and future maintenance:
   - Cosmic: 8 clean cherry picks.
   - Bionic: of 13 commits, 9 are clean cherry-picks and
     4 are backports that only adjust context lines
     (i.e. a refresh), with no functional changes.

[Test Case]
 * This command has been reported to reproduce the problem:
   $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128
 * It generates stack traces as included below.

[Regression Potential]
 * The commits have been checked for follow-up fixes in
   linux-next as of 2019-01-08, and all known fix commits
   are included.
 * The regression potential is mostly contained in writeback
   throttling (block/blk-wbt.*), as almost all of the 13 patches
   change exclusively that, except for 4 of them (2 of which are
   sysfs):
   - blk-rq-qos: refactor out common elements of blk-wbt (block/)
   - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)
   - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)
   - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*)

[Other Info]
 * Alternatively, it is probably possible to introduce just the
   two commits that fix this, with some changes to their code in
   the backport; but since the 'blk-rq-qos: refactor ..' commit
   may become a dependency for many additional/future fixes, it
   seemed worthwhile to pull it into the 18.04 branch earlier.
 * The problem was introduced with the blk-wbt mechanism in
   v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2,
   so only Bionic and Cosmic need this.

[Stack Traces]

[  393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30
...
[  393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[  393.628720] Call Trace:
[  393.628721]  <IRQ>
[  393.628724]  enqueue_task_fair+0x6c/0x7f0
[  393.628726]  ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[  393.628728]  ? __update_load_avg_blocked_se.isra.37+0xd1/0x150
[  393.628731]  activate_task+0x57/0xc0
[  393.628735]  ? sched_clock+0x9/0x10
[  393.628736]  ? sched_clock+0x9/0x10
[  393.628738]  ttwu_do_activate+0x49/0x90
[  393.628739]  try_to_wake_up+0x1df/0x490
[  393.628741]  default_wake_function+0x12/0x20
[  393.628743]  autoremove_wake_function+0x12/0x40
[  393.628744]  __wake_up_common+0x73/0x130
[  393.628745]  __wake_up_common_lock+0x80/0xc0
[  393.628746]  __wake_up+0x13/0x20
[  393.628749]  __wbt_done.part.21+0xa4/0xb0
[  393.628749]  wbt_done+0x72/0xa0
[  393.628753]  blk_mq_free_request+0xca/0x1a0
[  393.628755]  blk_mq_end_request+0x48/0x90
[  393.628760]  nvme_complete_rq+0x23/0x120 [nvme_core]
[  393.628763]  nvme_pci_complete_rq+0x7a/0x130 [nvme]
[  393.628764]  __blk_mq_complete_request+0xd2/0x140
[  393.628766]  blk_mq_complete_request+0x18/0x20
[  393.628767]  nvme_process_cq+0xe1/0x1b0 [nvme]
[  393.628768]  nvme_irq+0x23/0x50 [nvme]
[  393.628772]  __handle_irq_event_percpu+0x44/0x1a0
[  393.628773]  handle_irq_event_percpu+0x32/0x80
[  393.628774]  handle_irq_event+0x3b/0x60
[  393.628778]  handle_edge_irq+0x7c/0x190
[  393.628779]  handle_irq+0x20/0x30
[  393.628783]  do_IRQ+0x46/0xd0
[  393.628784]  common_interrupt+0x84/0x84
[  393.628785]  </IRQ>
...
[  393.628794]  ? cpuidle_enter_state+0x97/0x2f0
[  393.628796]  cpuidle_enter+0x17/0x20
[  393.628797]  call_cpuidle+0x23/0x40
[  393.628798]  do_idle+0x18c/0x1f0
[  393.628799]  cpu_startup_entry+0x73/0x80
[  393.628802]  start_secondary+0x1a6/0x200
[  393.628804]  secondary_startup_64+0xa5/0xb0
[  393.628805] Code: ...
[  405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled
[  435.597209] INFO: rcu_sched detected stalls on CPUs/tasks:
[  435.602858]  30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485
[  435.610203]  (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818)
[  435.617025] Sending NMI from CPU 8 to CPUs 30:
[  435.617029] NMI backtrace for cpu 30
[  435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu
...
[  435.617047] Call Trace:
[  435.617048]  <IRQ>
[  435.617051]  enqueue_entity+0x9f/0x6b0
[  435.617053]  enqueue_task_fair+0x6c/0x7f0
[  435.617056]  activate_task+0x57/0xc0
[  435.617059]  ? sched_clock+0x9/0x10
[  435.617060]  ? sched_clock+0x9/0x10
[  435.617061]  ttwu_do_activate+0x49/0x90
[  435.617063]  try_to_wake_up+0x1df/0x490
[  435.617065]  default_wake_function+0x12/0x20
[  435.617067]  autoremove_wake_function+0x12/0x40
[  435.617068]  __wake_up_common+0x73/0x130
[  435.617069]  __wake_up_common_lock+0x80/0xc0
[  435.617070]  __wake_up+0x13/0x20
[  435.617073]  __wbt_done.part.21+0xa4/0xb0
[  435.617074]  wbt_done+0x72/0xa0
[  435.617077]  blk_mq_free_request+0xca/0x1a0
[  435.617079]  blk_mq_end_request+0x48/0x90
[  435.617084]  nvme_complete_rq+0x23/0x120 [nvme_core]
[  435.617087]  nvme_pci_complete_rq+0x7a/0x130 [nvme]
[  435.617088]  __blk_mq_complete_request+0xd2/0x140
[  435.617090]  blk_mq_complete_request+0x18/0x20
[  435.617091]  nvme_process_cq+0xe1/0x1b0 [nvme]
[  435.617093]  nvme_irq+0x23/0x50 [nvme]
[  435.617096]  __handle_irq_event_percpu+0x44/0x1a0
[  435.617097]  handle_irq_event_percpu+0x32/0x80
[  435.617098]  handle_irq_event+0x3b/0x60
[  435.617101]  handle_edge_irq+0x7c/0x190
[  435.617102]  handle_irq+0x20/0x30
[  435.617106]  do_IRQ+0x46/0xd0
[  435.617107]  common_interrupt+0x84/0x84
[  435.617108]  </IRQ>
...
[  435.617117]  ? cpuidle_enter_state+0x97/0x2f0
[  435.617118]  cpuidle_enter+0x17/0x20
[  435.617119]  call_cpuidle+0x23/0x40
[  435.617121]  do_idle+0x18c/0x1f0
[  435.617122]  cpu_startup_entry+0x73/0x80
[  435.617125]  start_secondary+0x1a6/0x200
[  435.617127]  secondary_startup_64+0xa5/0xb0
[  435.617128] Code: ...
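The "thundering herd" named in fix commit 2887e41b910b can be illustrated with a toy model (plain Python, not kernel code; the writer count is illustrative): when a completed I/O wakes every writer parked on the waitqueue, only one can claim the freed slot and the rest must contend for the lock and re-queue, so total wakeups grow quadratically with queue depth.

```python
from collections import deque

# Toy model of the wbt waitqueue (NOT kernel code): writers park on a
# queue when no completion slots are free; each completed I/O frees one
# slot. wake_all=True mimics the pre-fix behavior (wake every waiter,
# one wins, the rest go back to sleep); wake_all=False mimics the fixed
# behavior (wake only the head of the queue, in FIFO order).
def total_wakeups(nwriters, wake_all):
    waiters = deque(range(nwriters))
    wakeups = 0
    while waiters:
        # one in-flight request completes, freeing exactly one slot
        woken = len(waiters) if wake_all else 1
        wakeups += woken
        waiters.popleft()  # exactly one woken writer claims the slot
    return wakeups

# Exclusive wakeups scale linearly; waking the whole herd is quadratic,
# which matches the __wake_up_common() -> __wbt_done() contention seen
# in the hard-lockup traces in this report.
print(total_wakeups(128, wake_all=False))  # 128
print(total_wakeups(128, wake_all=True))   # 128 * 129 / 2 = 8256
```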
2019-01-08 21:05:55 Mauricio Faria de Oliveira description (updated: the Cosmic backport note changed from "- Cosmic: 8 clean cherry picks." to "- Cosmic: total of 8 clean cherry picks."; the rest of the description and stack traces are unchanged)
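When comparing the two lockup traces in this report, it helps to strip the timestamps and symbol offsets and diff just the call chains. A small hypothetical helper for that (my own sketch, not part of the fix or any kernel tooling):

```python
import re

# Matches one flattened dmesg stack frame such as
#   "[ 393.628724] enqueue_task_fair+0x6c/0x7f0"  or
#   "[ 435.617059] ? sched_clock+0x9/0x10"
FRAME = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def call_chain(text):
    """Return just the symbol names from a flattened kernel stack trace."""
    return [m.group(1) for m in FRAME.finditer(text)]

excerpt = ("[ 435.617051] enqueue_entity+0x9f/0x6b0 "
           "[ 435.617053] enqueue_task_fair+0x6c/0x7f0 "
           "[ 435.617059] ? sched_clock+0x9/0x10")
print(call_chain(excerpt))
# ['enqueue_entity', 'enqueue_task_fair', 'sched_clock']
```

Running it over both traces shows they differ only in the innermost frames (enqueue_entity vs. enqueue_task_fair), i.e. the same wakeup path through __wbt_done().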
2019-01-08 21:06:17 Mauricio Faria de Oliveira linux (Ubuntu): status Incomplete Confirmed
2019-01-10 02:17:03 Mauricio Faria de Oliveira description [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: So, of 13 commits, 9 are clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as included below. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 branch.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  
* The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as included below. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  * The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 branch.  
* The problem was introduced with the blk-wbt mechanism in v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2, so only Bionic and Cosmic need this.

[Stack Traces]
[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? 
cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 
435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-10 02:17:50 Mauricio Faria de Oliveira description [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as included below. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 branch.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  
* The fix addresses a scheduling issue in the original implementation of wbt/writeback throttling.
* The fix is commit 2887e41b910b ("blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"), plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve waking of tasks").
* There are additional commits to help with a cleaner backport and future maintenance:
  - Cosmic: total of 8 clean cherry picks.
  - Bionic: total of 13 commits, 9 clean cherry-picks and 4 backports, which are just changes to context lines (i.e. refresh) without any functional changes in the backport itself.

[Test Case]
* This command has been reported to reproduce the problem:
  $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128
* It generates stack traces as included below.

[Regression Potential]
* The commits have been checked against later fixes in linux-next as of 2019-01-08, and all known fix commits are included.
* The regression potential is mostly contained in the writeback throttling code (block/blk-wbt.*), as almost all of the 13 patches change exclusively that, except for 4 of them (2 of which are sysfs):
  - blk-rq-qos: refactor out common elements of blk-wbt (block/)
  - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)
  - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)
  - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*)

[Other Info]
* Alternatively, it is probably possible to introduce just the two commits that fix this with some changes to their code in the backport, but since the 'blk-rq-qos: refactor ..' commit may become a dependency for many additional/future fixes, it seemed worthwhile to pull it in earlier in the 18.04 cycle.
* The problem was introduced with the blk-wbt mechanism in v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2, so only Bionic and Cosmic need this.

[Stack Traces]
[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? 
cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 
435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-10 02:35:41 Mauricio Faria de Oliveira description [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as included below. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 cycle.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  
* The fix addresses a scheduling issue in the original implementation of wbt/writeback throttling.
* The fix is commit 2887e41b910b ("blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"), plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve waking of tasks").
* There are additional commits to help with a cleaner backport and future maintenance:
  - Cosmic: total of 8 clean cherry picks.
  - Bionic: total of 13 commits, 9 clean cherry-picks and 4 backports, which are just changes to context lines (i.e. refresh) without any functional changes in the backport itself.

[Test Case]
* This command has been reported to reproduce the problem:
  $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128
* It generates stack traces like those below on the original kernel, and does not generate them on the modified/patched kernel.
* The user/reporter verified that the test kernel with these patches resolved the problem.
* The developer tested 2 systems (4-core and 24-core, but without NVMe) for regressions, and no error messages were logged to dmesg.

[Regression Potential]
* The commits have been checked against later fixes in linux-next as of 2019-01-08, and all known fix commits are included.
* The regression potential is mostly contained in the writeback throttling code (block/blk-wbt.*), as almost all of the 13 patches change exclusively that, except for 4 of them (2 of which are sysfs):
  - blk-rq-qos: refactor out common elements of blk-wbt (block/)
  - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)
  - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)
  - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*)

[Other Info]
* Alternatively, it is probably possible to introduce just the two commits that fix this with some changes to their code in the backport, but since the 'blk-rq-qos: refactor ..' commit may become a dependency for many additional/future fixes, it seemed worthwhile to pull it in earlier in the 18.04 cycle.
* The problem was introduced with the blk-wbt mechanism in v4.10-rc1, and the fix commits landed in v4.19-rc1 and -rc2, so only Bionic and Cosmic need this.

[Stack Traces]
[ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-10 03:06:54 Mauricio Faria de Oliveira description [NOTE]  * Patches will be sent to the kernel-team mailing list    once the test kernel has been verified by the reporter. [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as below in the original kernel, and does not generate them in the modified/patched kernel. * The user/reporter verified the test kernel with these patches resolved the problem. * The developer verified in 2 systems (4-core and 24-core but no NVMe) for regressions, and no error messages were logged to dmesg. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 cycle.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  
* The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as below in the original kernel,    and does not generate them in the modified/patched kernel.  * The user/reporter verified the test kernel with these patches    resolved the problem.  * The developer verified in 2 systems (4-core and 24-core but    no NVMe) for regressions, and no error messages were logged    to dmesg. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 cycle.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-10 15:20:03 Stefan Bader nominated for series Ubuntu Cosmic
2019-01-10 15:20:03 Stefan Bader bug task added linux (Ubuntu Cosmic)
2019-01-10 15:20:03 Stefan Bader nominated for series Ubuntu Bionic
2019-01-10 15:20:03 Stefan Bader bug task added linux (Ubuntu Bionic)
2019-01-10 15:20:14 Stefan Bader linux (Ubuntu Cosmic): importance Undecided Medium
2019-01-10 15:20:18 Stefan Bader linux (Ubuntu Bionic): importance Undecided Medium
2019-01-10 15:20:21 Stefan Bader linux (Ubuntu): importance Undecided Medium
2019-01-10 19:43:11 Mauricio Faria de Oliveira description [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * There are additional commits to help with a cleaner    backport and future maintenance:    - Cosmic: total of 8 clean cherry picks.    - Bionic: total of 13 commits, 9 clean cherry-picks      and 4 backports, which are just changes to context      lines (i.e. refresh) without any functional changes      in the backport itself. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as below in the original kernel,    and does not generate them in the modified/patched kernel.  * The user/reporter verified the test kernel with these patches    resolved the problem.  * The developer verified in 2 systems (4-core and 24-core but    no NVMe) for regressions, and no error messages were logged    to dmesg. [Regression Potential]  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in.  
* The regression potential is mostly contained in the writeback    throttling (block/blk-wbt.*), as almost all of the 13 patches    change exclusively that, except for 4 of them (2 of which are    sysfs):    - blk-rq-qos: refactor out common elements of blk-wbt (block/)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-sysfs.c)    - block: Protect less code with sysfs_lock in blk_{un,}register_queue() (blk-{mq-}sysfs.c)    - block: pass struct request instead of struct blk_issue_stat to wbt (block/, still mostly blk-wbt.*) [Other Info]  * Alternatively, it is probably possible to introduce just the    two commits that fix this with some changes to their code in    the backport, but since the 'blk-rq-qos: refactor ..' commit    may become a dependency for many additional/future fixes, it    seemed interesting to pull it in earlier in the 18.04 cycle.  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [Impact] * Users may experience cpu hard lockups when performing rigorous writes to NVMe drives. * The fix addresses an scheduling issue in the original implementation of wbt/writeback throttling * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"), plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve waking of tasks"). 
* Plus a few dependency commits for the second fix. * Backports are trivial: mainly replace rq_wait_inc_below() with the equivalent atomic_inc_below(), and maintain the __wbt_done() signature, both due to the lack of commit a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt"), that changes lots of other/unrelated code. [Test Case] * This command has been reported to reproduce the problem: $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128 * It generates stack traces as below in the original kernel, and does not generate them in the modified/patched kernel. * The user/reporter verified the test kernel with these patches resolved the problem. * The developer verified in 2 systems (4-core and 24-core but no NVMe) for regressions, and no error messages were logged to dmesg. [Regression Potential] * The regression potential is contained within writeback throttling mechanism (block/blk-wbt.*). * The commits have been verified for fixes in later commits in linux-next as of 2019-01-08 and all known fix commits are in. [Other Info] * The problem has been introduced with the blk-wbt mechanism, in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2, so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? 
sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... [ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... 
[ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-10 19:44:58 Mauricio Faria de Oliveira description [Impact] * Users may experience cpu hard lockups when performing rigorous writes to NVMe drives. * The fix addresses an scheduling issue in the original implementation of wbt/writeback throttling * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"), plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve waking of tasks"). * Plus a few dependency commits for the second fix. * Backports are trivial: mainly replace rq_wait_inc_below() with the equivalent atomic_inc_below(), and maintain the __wbt_done() signature, both due to the lack of commit a79050434b45 ("blk-rq-qos: refactor out common elements of blk-wbt"), that changes lots of other/unrelated code. [Test Case] * This command has been reported to reproduce the problem: $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128 * It generates stack traces as below in the original kernel, and does not generate them in the modified/patched kernel. * The user/reporter verified the test kernel with these patches resolved the problem. * The developer verified in 2 systems (4-core and 24-core but no NVMe) for regressions, and no error messages were logged to dmesg. [Regression Potential] * The regression potential is contained within writeback throttling mechanism (block/blk-wbt.*). * The commits have been verified for fixes in later commits in linux-next as of 2019-01-08 and all known fix commits are in. [Other Info] * The problem has been introduced with the blk-wbt mechanism, in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2, so only Bionic and Cosmic needs this. [Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? 
__update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... 
[ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? 
cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ... [Impact]  * Users may experience cpu hard lockups when performing    rigorous writes to NVMe drives.  * The fix addresses an scheduling issue in the original    implementation of wbt/writeback throttling  * The fix is commit 2887e41b910b ("blk-wbt: Avoid lock    contention and thundering herd issue in wbt_wait"),    plus its fix commit 38cfb5a45ee0 ("blk-wbt: improve    waking of tasks").  * Plus a few dependency commits for each fix.  * Backports are trivial: mainly replace rq_wait_inc_below()    with the equivalent atomic_inc_below(), and maintain the    __wbt_done() signature, both due to the lack of commit    a79050434b45 ("blk-rq-qos: refactor out common elements    of blk-wbt"), that changes lots of other/unrelated code. [Test Case]  * This command has been reported to reproduce the problem:    $ sudo iozone -R -s 5G -r 1m -S 2048 -i 0 -G -c -o -l 128 -u 128 -t 128  * It generates stack traces as below in the original kernel,    and does not generate them in the modified/patched kernel.  * The user/reporter verified the test kernel with these patches    resolved the problem.  * The developer verified in 2 systems (4-core and 24-core but    no NVMe) for regressions, and no error messages were logged    to dmesg. [Regression Potential]  * The regression potential is contained within writeback    throttling mechanism (block/blk-wbt.*).  * The commits have been verified for fixes in later commits in    linux-next as of 2019-01-08 and all known fix commits are in. [Other Info]  * The problem has been introduced with the blk-wbt mechanism,    in v4.10-rc1, and the fix commits in v4.19-rc1 and -rc2,    so only Bionic and Cosmic needs this. 
[Stack Traces] [ 393.628647] NMI watchdog: Watchdog detected hard LOCKUP on cpu 30 ... [ 393.628704] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 393.628720] Call Trace: [ 393.628721] <IRQ> [ 393.628724] enqueue_task_fair+0x6c/0x7f0 [ 393.628726] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628728] ? __update_load_avg_blocked_se.isra.37+0xd1/0x150 [ 393.628731] activate_task+0x57/0xc0 [ 393.628735] ? sched_clock+0x9/0x10 [ 393.628736] ? sched_clock+0x9/0x10 [ 393.628738] ttwu_do_activate+0x49/0x90 [ 393.628739] try_to_wake_up+0x1df/0x490 [ 393.628741] default_wake_function+0x12/0x20 [ 393.628743] autoremove_wake_function+0x12/0x40 [ 393.628744] __wake_up_common+0x73/0x130 [ 393.628745] __wake_up_common_lock+0x80/0xc0 [ 393.628746] __wake_up+0x13/0x20 [ 393.628749] __wbt_done.part.21+0xa4/0xb0 [ 393.628749] wbt_done+0x72/0xa0 [ 393.628753] blk_mq_free_request+0xca/0x1a0 [ 393.628755] blk_mq_end_request+0x48/0x90 [ 393.628760] nvme_complete_rq+0x23/0x120 [nvme_core] [ 393.628763] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 393.628764] __blk_mq_complete_request+0xd2/0x140 [ 393.628766] blk_mq_complete_request+0x18/0x20 [ 393.628767] nvme_process_cq+0xe1/0x1b0 [nvme] [ 393.628768] nvme_irq+0x23/0x50 [nvme] [ 393.628772] __handle_irq_event_percpu+0x44/0x1a0 [ 393.628773] handle_irq_event_percpu+0x32/0x80 [ 393.628774] handle_irq_event+0x3b/0x60 [ 393.628778] handle_edge_irq+0x7c/0x190 [ 393.628779] handle_irq+0x20/0x30 [ 393.628783] do_IRQ+0x46/0xd0 [ 393.628784] common_interrupt+0x84/0x84 [ 393.628785] </IRQ> ... [ 393.628794] ? cpuidle_enter_state+0x97/0x2f0 [ 393.628796] cpuidle_enter+0x17/0x20 [ 393.628797] call_cpuidle+0x23/0x40 [ 393.628798] do_idle+0x18c/0x1f0 [ 393.628799] cpu_startup_entry+0x73/0x80 [ 393.628802] start_secondary+0x1a6/0x200 [ 393.628804] secondary_startup_64+0xa5/0xb0 [ 393.628805] Code: ... 
[ 405.981597] nvme nvme1: I/O 393 QID 6 timeout, completion polled [ 435.597209] INFO: rcu_sched detected stalls on CPUs/tasks: [ 435.602858] 30-...0: (1 GPs behind) idle=e26/1/0 softirq=6834/6834 fqs=4485 [ 435.610203] (detected by 8, t=15005 jiffies, g=6396, c=6395, q=146818) [ 435.617025] Sending NMI from CPU 8 to CPUs 30: [ 435.617029] NMI backtrace for cpu 30 [ 435.617031] CPU: 30 PID: 0 Comm: swapper/30 Tainted: P OE 4.15.0-20-generic #21-Ubuntu ... [ 435.617047] Call Trace: [ 435.617048] <IRQ> [ 435.617051] enqueue_entity+0x9f/0x6b0 [ 435.617053] enqueue_task_fair+0x6c/0x7f0 [ 435.617056] activate_task+0x57/0xc0 [ 435.617059] ? sched_clock+0x9/0x10 [ 435.617060] ? sched_clock+0x9/0x10 [ 435.617061] ttwu_do_activate+0x49/0x90 [ 435.617063] try_to_wake_up+0x1df/0x490 [ 435.617065] default_wake_function+0x12/0x20 [ 435.617067] autoremove_wake_function+0x12/0x40 [ 435.617068] __wake_up_common+0x73/0x130 [ 435.617069] __wake_up_common_lock+0x80/0xc0 [ 435.617070] __wake_up+0x13/0x20 [ 435.617073] __wbt_done.part.21+0xa4/0xb0 [ 435.617074] wbt_done+0x72/0xa0 [ 435.617077] blk_mq_free_request+0xca/0x1a0 [ 435.617079] blk_mq_end_request+0x48/0x90 [ 435.617084] nvme_complete_rq+0x23/0x120 [nvme_core] [ 435.617087] nvme_pci_complete_rq+0x7a/0x130 [nvme] [ 435.617088] __blk_mq_complete_request+0xd2/0x140 [ 435.617090] blk_mq_complete_request+0x18/0x20 [ 435.617091] nvme_process_cq+0xe1/0x1b0 [nvme] [ 435.617093] nvme_irq+0x23/0x50 [nvme] [ 435.617096] __handle_irq_event_percpu+0x44/0x1a0 [ 435.617097] handle_irq_event_percpu+0x32/0x80 [ 435.617098] handle_irq_event+0x3b/0x60 [ 435.617101] handle_edge_irq+0x7c/0x190 [ 435.617102] handle_irq+0x20/0x30 [ 435.617106] do_IRQ+0x46/0xd0 [ 435.617107] common_interrupt+0x84/0x84 [ 435.617108] </IRQ> ... [ 435.617117] ? 
cpuidle_enter_state+0x97/0x2f0 [ 435.617118] cpuidle_enter+0x17/0x20 [ 435.617119] call_cpuidle+0x23/0x40 [ 435.617121] do_idle+0x18c/0x1f0 [ 435.617122] cpu_startup_entry+0x73/0x80 [ 435.617125] start_secondary+0x1a6/0x200 [ 435.617127] secondary_startup_64+0xa5/0xb0 [ 435.617128] Code: ...
2019-01-11 14:26:15 Kleber Sacilotto de Souza linux (Ubuntu Bionic): status New Fix Committed
2019-01-11 14:29:17 Kleber Sacilotto de Souza linux (Ubuntu Cosmic): status New Fix Committed
2019-01-15 10:32:49 Brad Figg tags bionic bionic verification-needed-cosmic
2019-01-15 10:37:01 Brad Figg tags bionic verification-needed-cosmic bionic verification-needed-bionic verification-needed-cosmic
2019-01-17 14:39:34 Mauricio Faria de Oliveira tags bionic verification-needed-bionic verification-needed-cosmic bionic verification-done-cosmic verification-needed-bionic
2019-01-21 16:25:36 Mauricio Faria de Oliveira tags bionic verification-done-cosmic verification-needed-bionic bionic verification-done-bionic verification-done-cosmic
2019-01-28 17:12:01 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2019-01-28 17:12:01 Launchpad Janitor cve linked 2018-14625
2019-01-28 17:12:01 Launchpad Janitor cve linked 2018-16882
2019-01-28 17:12:01 Launchpad Janitor cve linked 2018-17972
2019-01-28 17:12:01 Launchpad Janitor cve linked 2018-18281
2019-01-28 17:12:01 Launchpad Janitor cve linked 2018-19407
2019-02-04 08:48:45 Launchpad Janitor linux (Ubuntu Cosmic): status Fix Committed Fix Released