ISST-LTE: Ubuntu16.04.03: PowerNV: 'ppc64_cpu' command hangs while changing SMT value with Leaf IO and BASE tests

Bug #1708130 reported by bugproxy
This bug affects 1 person
Affects                           Status     Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Won't Fix  High        Canonical Kernel Team
linux (Ubuntu)                    Won't Fix  High        Canonical Kernel Team

Bug Description

== Comment: #0 - INDIRA P. JOGA <email address hidden> - 2017-07-07 03:56:54 ==
Description:
--------------
Started the Leaf IO and BASE tests (without SMT tests), then tried to change the SMT value manually, at which point the command hangs.

  UBUNTU BUILD: 4.10.0-26-generic

Steps to re-create:
------------------
> Installed latest Ubuntu160403 kernel on system lotkvm
4.10.0-26-generic

> Leaf microcode: KMIPP113

> Started Leaf IO and BASE tests (without SMT tests).

root@lotkvm:/home# show.report.py
HOSTNAME  KERNEL VERSION     DISTRO INFO
--------  -----------------  -----------
lotkvm    4.10.0-26-generic  Ubuntu 16.04.2 LTS \n \l

######## Current Time: Tue Jul 4 00:55:37 2017 ########
Job-ID  FOCUS  Start-Time         Duration               Function
------  -----  ----------         --------               --------
1       IO     20170704-00:44:45  0.0 hr(s) 10.0 min(s)  IO_Focus
2       BASE   20170704-00:44:52  0.0 hr(s) 10.0 min(s)  Test

FOCUS  IO      BASE   SUM
TOTAL  76      25     101
FAIL   0       4      4
PASS   76      21     97
(%)    (100%)  (84%)  (96%)

> Then manually changed the SMT value

root@lotkvm:/home# ppc64_cpu --smt
SMT=8
root@lotkvm:/home# date
Tue Jul 4 00:46:01 CDT 2017
root@lotkvm:/home# ppc64_cpu --smt=2
root@lotkvm:/home# ppc64_cpu --smt
SMT=2
root@lotkvm:/home# date
Tue Jul 4 00:50:01 CDT 2017
root@lotkvm:/home# ppc64_cpu --smt=4
root@lotkvm:/home# ppc64_cpu --smt
SMT=4
root@lotkvm:/home# date
Tue Jul 4 00:54:38 CDT 2017
root@lotkvm:/home# ppc64_cpu --smt=8

[ 2055.142781] INFO: task jbd2/nvme0n1p6-:22052 blocked for more than 120 seconds.
[ 2055.142915] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.142978] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143150] INFO: task kworker/48:0H:21755 blocked for more than 120 seconds.
[ 2055.143226] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.143289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143570] INFO: task kworker/u259:3:22436 blocked for more than 120 seconds.
[ 2055.143647] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.143709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143953] INFO: task kworker/8:188:118516 blocked for more than 120 seconds.
[ 2055.144029] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144289] INFO: task mkfs.ntfs:95505 blocked for more than 120 seconds.
[ 2055.144353] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144600] INFO: task ppc64_cpu:80305 blocked for more than 120 seconds.
[ 2055.144665] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144727] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144872] INFO: task rm:80950 blocked for more than 120 seconds.
[ 2055.144936] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.145133] INFO: task rm:80951 blocked for more than 120 seconds.
[ 2055.145195] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.145257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2175.974718] INFO: task jbd2/nvme0n1p6-:22052 blocked for more than 120 seconds.
[ 2175.974848] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2175.974912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2175.975068] INFO: task kworker/48:0H:21755 blocked for more than 120 seconds.
[ 2175.975144] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2175.975206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

> root@lotkvm:# ps -eaf | grep ppc*
root 48054 12068 0 01:24 pts/0 00:00:00 grep --color=auto ppc*
root 80305 5719 0 00:54 hvc0 00:00:00 ppc64_cpu --smt 8

> The ppc64_cpu --smt command hangs here. Unable to change the SMT value from 4 to 8.

> Attached dmesg logs
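
A process that hangs like this typically sits in uninterruptible sleep (state D), where it cannot be killed. As a minimal sketch, assuming a procps-style ps that accepts the pid, stat and comm output columns, such processes could be listed as follows. The helper name filter_d_state is hypothetical and not part of any tool used in this report.

```shell
#!/bin/bash
# Hypothetical helper: read "pid stat comm" lines on stdin and print
# "pid comm" for every process whose state starts with D
# (uninterruptible sleep). Only the first word of comm is printed.
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}
```

Possible usage on the affected system: ps -eo pid=,stat=,comm= | filter_d_state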

== Comment: #2 - VIPIN K. PARASHAR <email address hidden> - 2017-07-07 13:29:41 ==
From kernel logs
=============

[ 2055.142781] INFO: task jbd2/nvme0n1p6-:22052 blocked for more than 120 seconds.
[ 2055.142915] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.142978] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143055] jbd2/nvme0n1p6- D 0 22052 2 0x00000800
[ 2055.143059] Call Trace:
[ 2055.143063] [c000001e1c6537f0] [c0000017dfd2206c] 0xc0000017dfd2206c (unreliable)
[ 2055.143070] [c000001e1c6539c0] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.143075] [c000001e1c653a20] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.143079] [c000001e1c653b00] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.143083] [c000001e1c653b30] [c000000000495d88] jbd2_journal_commit_transaction+0x248/0x1e60
[ 2055.143087] [c000001e1c653d30] [c00000000049eb50] kjournald2+0xf0/0x300
[ 2055.143091] [c000001e1c653dc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2055.143095] [c000001e1c653e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2055.143150] INFO: task kworker/48:0H:21755 blocked for more than 120 seconds.
[ 2055.143226] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.143289] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143364] kworker/48:0H D 0 21755 2 0x00000800
[ 2055.143414] Workqueue: xfs-log/nvme0n1p3 xfs_log_worker [xfs]
[ 2055.143416] Call Trace:
[ 2055.143418] [c000001e1338b5b0] [c000001e1338b6b0] 0xc000001e1338b6b0 (unreliable)
[ 2055.143421] [c000001e1338b780] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.143425] [c000001e1338b7e0] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.143428] [c000001e1338b8c0] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.143431] [c000001e1338b8f0] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2055.143434] [c000001e1338b9e0] [c000000000ba054c] wait_for_common+0xec/0x250
[ 2055.143438] [c000001e1338ba60] [c00000000010cc1c] flush_work+0x14c/0x2a0
[ 2055.143477] [c000001e1338baf0] [d000000017bd2378] xlog_cil_force_lsn+0x98/0x300 [xfs]
[ 2055.143516] [c000001e1338bbb0] [d000000017bcf41c] _xfs_log_force+0xbc/0x3f0 [xfs]
[ 2055.143555] [c000001e1338bc50] [d000000017bcf8c8] xfs_log_worker+0x58/0x1a0 [xfs]
[ 2055.143558] [c000001e1338bc90] [c00000000010e1f8] process_one_work+0x1e8/0x5b0
[ 2055.143562] [c000001e1338bd20] [c00000000010e668] worker_thread+0xa8/0x660
[ 2055.143565] [c000001e1338bdc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2055.143568] [c000001e1338be30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2055.143570] INFO: task kworker/u259:3:22436 blocked for more than 120 seconds.
[ 2055.143647] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.143709] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.143784] kworker/u259:3 D 0 22436 2 0x00000800
[ 2055.143790] Workqueue: writeback wb_workfn (flush-7:0)
[ 2055.143792] Call Trace:
[ 2055.143796] [c000001e133eeee0] [c0000000014e2090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
[ 2055.143800] [c000001e133ef0b0] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.143803] [c000001e133ef110] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.143806] [c000001e133ef1f0] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.143809] [c000001e133ef220] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2055.143813] [c000001e133ef310] [c000000000b9eab4] io_schedule_timeout+0xd4/0x170
[ 2055.143815] [c000001e133ef360] [c000000000620afc] wbt_wait+0x3ac/0x4d0
[ 2055.143820] [c000001e133ef400] [c0000000005ecf00] blk_sq_make_request+0x110/0x540
[ 2055.143823] [c000001e133ef4b0] [c0000000005da094] generic_make_request+0x154/0x310
[ 2055.143826] [c000001e133ef510] [c0000000005da320] submit_bio+0xd0/0x1f0
[ 2055.143829] [c000001e133ef5c0] [c0000000003b1460] submit_bh_wbc+0x1c0/0x220
[ 2055.143832] [c000001e133ef610] [c0000000003b1740] __block_write_full_page+0x280/0x570
[ 2055.143835] [c000001e133ef6b0] [c0000000003b6960] blkdev_writepage+0x30/0x50
[ 2055.143839] [c000001e133ef6d0] [c000000000297af0] __writepage+0x40/0xb0
[ 2055.143843] [c000001e133ef700] [c00000000029702c] write_cache_pages+0x25c/0x5a0
[ 2055.143846] [c000001e133ef840] [c0000000002973d4] generic_writepages+0x64/0xa0
[ 2055.143849] [c000001e133ef8a0] [c0000000003b68a4] blkdev_writepages+0x44/0x80
[ 2055.143851] [c000001e133ef8c0] [c00000000029a33c] do_writepages+0x4c/0x80
[ 2055.143855] [c000001e133ef8e0] [c0000000003a2bc0] __writeback_single_inode+0x70/0x510
[ 2055.143858] [c000001e133ef940] [c0000000003a372c] writeback_sb_inodes+0x2cc/0x590
[ 2055.143861] [c000001e133efa50] [c0000000003a3ad4] __writeback_inodes_wb+0xe4/0x150
[ 2055.143863] [c000001e133efab0] [c0000000003a3f3c] wb_writeback+0x2fc/0x440
[ 2055.143866] [c000001e133efb80] [c0000000003a4d78] wb_workfn+0x268/0x580
[ 2055.143870] [c000001e133efc90] [c00000000010e1f8] process_one_work+0x1e8/0x5b0
[ 2055.143873] [c000001e133efd20] [c00000000010e668] worker_thread+0xa8/0x660
[ 2055.143876] [c000001e133efdc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2055.143879] [c000001e133efe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2055.143953] INFO: task kworker/8:188:118516 blocked for more than 120 seconds.
[ 2055.144029] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144165] kworker/8:188 D 0 118516 2 0x00000800
[ 2055.144171] Workqueue: events vmstat_shepherd
[ 2055.144172] Call Trace:
[ 2055.144174] [c000001da4be3830] [c000001da4be3860] 0xc000001da4be3860 (unreliable)
[ 2055.144178] [c000001da4be3a00] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.144181] [c000001da4be3a60] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.144184] [c000001da4be3b40] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.144187] [c000001da4be3b70] [c000000000b9fa34] schedule_preempt_disabled+0x24/0x40
[ 2055.144191] [c000001da4be3b90] [c000000000ba2338] __mutex_lock_slowpath+0x208/0x380
[ 2055.144195] [c000001da4be3c10] [c0000000000e9cbc] get_online_cpus+0x5c/0xa0
[ 2055.144197] [c000001da4be3c40] [c0000000002b867c] vmstat_shepherd+0x3c/0x160
[ 2055.144201] [c000001da4be3c90] [c00000000010e1f8] process_one_work+0x1e8/0x5b0
[ 2055.144204] [c000001da4be3d20] [c00000000010e668] worker_thread+0xa8/0x660
[ 2055.144206] [c000001da4be3dc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2055.144210] [c000001da4be3e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2055.144289] INFO: task mkfs.ntfs:95505 blocked for more than 120 seconds.
[ 2055.144353] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144490] mkfs.ntfs D 0 95505 92952 0x00040000
[ 2055.144492] Call Trace:
[ 2055.144494] [c000000f52ec3270] [c000000f52ec32b0] 0xc000000f52ec32b0 (unreliable)
[ 2055.144498] [c000000f52ec3440] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.144501] [c000000f52ec34a0] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.144504] [c000000f52ec3580] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.144507] [c000000f52ec35b0] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2055.144510] [c000000f52ec36a0] [c000000000b9eab4] io_schedule_timeout+0xd4/0x170
[ 2055.144513] [c000000f52ec36f0] [c000000000620afc] wbt_wait+0x3ac/0x4d0
[ 2055.144516] [c000000f52ec3790] [c0000000005ecf00] blk_sq_make_request+0x110/0x540
[ 2055.144520] [c000000f52ec3840] [c0000000005da094] generic_make_request+0x154/0x310
[ 2055.144523] [c000000f52ec38a0] [c0000000005da320] submit_bio+0xd0/0x1f0
[ 2055.144525] [c000000f52ec3950] [c0000000003b1460] submit_bh_wbc+0x1c0/0x220
[ 2055.144528] [c000000f52ec39a0] [c0000000003b1740] __block_write_full_page+0x280/0x570
[ 2055.144531] [c000000f52ec3a40] [c0000000003b6960] blkdev_writepage+0x30/0x50
[ 2055.144534] [c000000f52ec3a60] [c000000000297af0] __writepage+0x40/0xb0
[ 2055.144537] [c000000f52ec3a90] [c00000000029702c] write_cache_pages+0x25c/0x5a0
[ 2055.144541] [c000000f52ec3bd0] [c0000000002973d4] generic_writepages+0x64/0xa0
[ 2055.144544] [c000000f52ec3c30] [c0000000003b68a4] blkdev_writepages+0x44/0x80
[ 2055.144546] [c000000f52ec3c50] [c00000000029a33c] do_writepages+0x4c/0x80
[ 2055.144549] [c000000f52ec3c70] [c000000000286478] __filemap_fdatawrite_range+0x108/0x190
[ 2055.144552] [c000000f52ec3d10] [c000000000286748] filemap_write_and_wait_range+0x68/0xf0
[ 2055.144555] [c000000f52ec3d50] [c0000000003b5534] blkdev_fsync+0x34/0xa0
[ 2055.144558] [c000000f52ec3d80] [c0000000003a9ed8] vfs_fsync_range+0x78/0x170
[ 2055.144561] [c000000f52ec3dd0] [c0000000003aa06c] do_fsync+0x5c/0xb0
[ 2055.144564] [c000000f52ec3e10] [c0000000003aa46c] SyS_fsync+0x2c/0x40
[ 2055.144567] [c000000f52ec3e30] [c00000000000b184] system_call+0x38/0xe0
[ 2055.144600] INFO: task ppc64_cpu:80305 blocked for more than 120 seconds.
[ 2055.144665] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144727] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.144801] ppc64_cpu D 0 80305 5719 0x00040000
[ 2055.144804] Call Trace:
[ 2055.144805] [c000001d963d7650] [c000001d963d76a0] 0xc000001d963d76a0 (unreliable)
[ 2055.144809] [c000001d963d7820] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.144812] [c000001d963d7880] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.144815] [c000001d963d7960] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.144818] [c000001d963d7990] [c0000000005e9f24] blk_mq_freeze_queue_wait+0x94/0x120
[ 2055.144822] [c000001d963d7a00] [c0000000005ec848] blk_mq_queue_reinit_work+0xb8/0x180
[ 2055.144825] [c000001d963d7a40] [c0000000005ec9f8] blk_mq_queue_reinit_prepare+0x88/0xa0
[ 2055.144828] [c000001d963d7a70] [c0000000000e851c] cpuhp_invoke_callback+0x17c/0x600
[ 2055.144831] [c000001d963d7ae0] [c0000000000e8bf8] cpuhp_up_callbacks+0x58/0x150
[ 2055.144835] [c000001d963d7b30] [c0000000000eb444] _cpu_up+0xf4/0x1d0
[ 2055.144838] [c000001d963d7b90] [c0000000000eb650] do_cpu_up+0x130/0x160
[ 2055.144841] [c000001d963d7c10] [c00000000078681c] cpu_subsys_online+0x6c/0xf0
[ 2055.144845] [c000001d963d7c60] [c00000000077dd64] device_online+0xb4/0x120
[ 2055.144848] [c000001d963d7ca0] [c00000000077de84] online_store+0xb4/0xc0
[ 2055.144850] [c000001d963d7ce0] [c000000000778b40] dev_attr_store+0x40/0x70
[ 2055.144853] [c000001d963d7d00] [c00000000041de9c] sysfs_kf_write+0x6c/0xa0
[ 2055.144856] [c000001d963d7d20] [c00000000041cd4c] kernfs_fop_write+0x17c/0x250
[ 2055.144860] [c000001d963d7d70] [c00000000035a6e0] __vfs_write+0x40/0x80
[ 2055.144863] [c000001d963d7d90] [c00000000035c2b4] vfs_write+0xd4/0x270
[ 2055.144866] [c000001d963d7de0] [c00000000035df1c] SyS_write+0x6c/0x110
[ 2055.144869] [c000001d963d7e30] [c00000000000b184] system_call+0x38/0xe0
[ 2055.144872] INFO: task rm:80950 blocked for more than 120 seconds.
[ 2055.144936] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.144998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.145072] rm D 0 80950 78982 0x00040000
[ 2055.145075] Call Trace:
[ 2055.145077] [c000000f694234d0] [c000000f694235f0] 0xc000000f694235f0 (unreliable)
[ 2055.145080] [c000000f694236a0] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.145083] [c000000f69423700] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.145086] [c000000f694237e0] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.145090] [c000000f69423810] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2055.145093] [c000000f69423900] [c000000000b9eab4] io_schedule_timeout+0xd4/0x170
[ 2055.145096] [c000000f69423950] [c000000000ba0224] bit_wait_io+0x34/0x90
[ 2055.145099] [c000000f69423980] [c000000000b9fc18] __wait_on_bit+0xf8/0x170
[ 2055.145102] [c000000f694239d0] [c000000000b9fff8] out_of_line_wait_on_bit+0x88/0xa0
[ 2055.145105] [c000000f69423a50] [c000000000493710] do_get_write_access+0x370/0x630
[ 2055.145108] [c000000f69423b30] [c000000000493a5c] jbd2_journal_get_write_access+0x8c/0xf0
[ 2055.145112] [c000000f69423b60] [c0000000004707bc] __ext4_journal_get_write_access+0x8c/0xe0
[ 2055.145115] [c000000f69423ba0] [c0000000004333ac] ext4_reserve_inode_write+0xcc/0x100
[ 2055.145119] [c000000f69423bf0] [c000000000433444] ext4_mark_inode_dirty+0x64/0x270
[ 2055.145121] [c000000f69423ca0] [c0000000004445d0] ext4_unlink+0x370/0x400
[ 2055.145124] [c000000f69423d40] [c00000000036d984] vfs_unlink+0x104/0x2a0
[ 2055.145128] [c000000f69423d90] [c000000000374ef8] do_unlinkat+0x368/0x3a0
[ 2055.145131] [c000000f69423e30] [c00000000000b184] system_call+0x38/0xe0
[ 2055.145133] INFO: task rm:80951 blocked for more than 120 seconds.
[ 2055.145195] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2055.145257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2055.145331] rm D 0 80951 110079 0x00040000
[ 2055.145334] Call Trace:
[ 2055.145337] [c0000007716374d0] [c0000000014e2090] sysctl_sched_migration_cost+0x0/0x4 (unreliable)
[ 2055.145340] [c0000007716376a0] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.145343] [c000000771637700] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.145346] [c0000007716377e0] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.145350] [c000000771637810] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2055.145353] [c000000771637900] [c000000000b9eab4] io_schedule_timeout+0xd4/0x170
[ 2055.145356] [c000000771637950] [c000000000ba0224] bit_wait_io+0x34/0x90
[ 2055.145359] [c000000771637980] [c000000000b9fc18] __wait_on_bit+0xf8/0x170
[ 2055.145362] [c0000007716379d0] [c000000000b9fff8] out_of_line_wait_on_bit+0x88/0xa0
[ 2055.145365] [c000000771637a50] [c000000000493710] do_get_write_access+0x370/0x630
[ 2055.145368] [c000000771637b30] [c000000000493a5c] jbd2_journal_get_write_access+0x8c/0xf0
[ 2055.145371] [c000000771637b60] [c0000000004707bc] __ext4_journal_get_write_access+0x8c/0xe0
[ 2055.145374] [c000000771637ba0] [c0000000004333ac] ext4_reserve_inode_write+0xcc/0x100
[ 2055.145377] [c000000771637bf0] [c000000000433444] ext4_mark_inode_dirty+0x64/0x270
[ 2055.145380] [c000000771637ca0] [c0000000004445d0] ext4_unlink+0x370/0x400
[ 2055.145383] [c000000771637d40] [c00000000036d984] vfs_unlink+0x104/0x2a0
[ 2055.145386] [c000000771637d90] [c000000000374ef8] do_unlinkat+0x368/0x3a0
[ 2055.145389] [c000000771637e30] [c00000000000b184] system_call+0x38/0xe0
[ 2175.974718] INFO: task jbd2/nvme0n1p6-:22052 blocked for more than 120 seconds.
[ 2175.974848] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2175.974912] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2175.974988] jbd2/nvme0n1p6- D 0 22052 2 0x00000800
[ 2175.974991] Call Trace:
[ 2175.974994] [c000001e1c6537f0] [c0000017dfd2206c] 0xc0000017dfd2206c (unreliable)
[ 2175.975001] [c000001e1c6539c0] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2175.975006] [c000001e1c653a20] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2175.975009] [c000001e1c653b00] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2175.975014] [c000001e1c653b30] [c000000000495d88] jbd2_journal_commit_transaction+0x248/0x1e60
[ 2175.975017] [c000001e1c653d30] [c00000000049eb50] kjournald2+0xf0/0x300
[ 2175.975022] [c000001e1c653dc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2175.975025] [c000001e1c653e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2175.975068] INFO: task kworker/48:0H:21755 blocked for more than 120 seconds.
[ 2175.975144] Not tainted 4.10.0-26-generic #30~16.04.1-Ubuntu
[ 2175.975206] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2175.975281] kworker/48:0H D 0 21755 2 0x00000800
[ 2175.975329] Workqueue: xfs-log/nvme0n1p3 xfs_log_worker [xfs]
[ 2175.975331] Call Trace:
[ 2175.975332] [c000001e1338b5b0] [c000001e1338b6b0] 0xc000001e1338b6b0 (unreliable)
[ 2175.975336] [c000001e1338b780] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2175.975339] [c000001e1338b7e0] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2175.975342] [c000001e1338b8c0] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2175.975346] [c000001e1338b8f0] [c000000000ba45c4] schedule_timeout+0x284/0x480
[ 2175.975349] [c000001e1338b9e0] [c000000000ba054c] wait_for_common+0xec/0x250
[ 2175.975353] [c000001e1338ba60] [c00000000010cc1c] flush_work+0x14c/0x2a0
[ 2175.975392] [c000001e1338baf0] [d000000017bd2378] xlog_cil_force_lsn+0x98/0x300 [xfs]
[ 2175.975431] [c000001e1338bbb0] [d000000017bcf41c] _xfs_log_force+0xbc/0x3f0 [xfs]
[ 2175.975470] [c000001e1338bc50] [d000000017bcf8c8] xfs_log_worker+0x58/0x1a0 [xfs]
[ 2175.975474] [c000001e1338bc90] [c00000000010e1f8] process_one_work+0x1e8/0x5b0
[ 2175.975477] [c000001e1338bd20] [c00000000010e668] worker_thread+0xa8/0x660
[ 2175.975480] [c000001e1338bdc0] [c00000000011768c] kthread+0x16c/0x1b0
[ 2175.975483] [c000001e1338be30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 3810.434586] Buffer I/O error on dev loop1, logical block 2399, async page read
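
The same tasks are reported again every 120 seconds, so a long log gets repetitive. The sketch below condenses hung-task warnings into one line per task with a count. It assumes the message wording shown in the log above, and summarize_hung_tasks is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin and print
# "<count> <task>:<pid>" for each task named in a hung-task warning,
# most frequently reported first.
summarize_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than [0-9]* seconds\..*/\1/p' |
        sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

Possible usage: dmesg | summarize_hung_tasks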

== Comment: #7 - VIPIN K. PARASHAR <email address hidden> - 2017-07-12 05:45:57 ==
[ 2055.144801] ppc64_cpu D 0 80305 5719 0x00040000
[ 2055.144804] Call Trace:
[ 2055.144805] [c000001d963d7650] [c000001d963d76a0] 0xc000001d963d76a0 (unreliable)
[ 2055.144809] [c000001d963d7820] [c00000000001be70] __switch_to+0x2c0/0x450
[ 2055.144812] [c000001d963d7880] [c000000000b9ee48] __schedule+0x2f8/0x970
[ 2055.144815] [c000001d963d7960] [c000000000b9f50c] schedule+0x4c/0xc0
[ 2055.144818] [c000001d963d7990] [c0000000005e9f24] blk_mq_freeze_queue_wait+0x94/0x120
[ 2055.144822] [c000001d963d7a00] [c0000000005ec848] blk_mq_queue_reinit_work+0xb8/0x180
[ 2055.144825] [c000001d963d7a40] [c0000000005ec9f8] blk_mq_queue_reinit_prepare+0x88/0xa0
[ 2055.144828] [c000001d963d7a70] [c0000000000e851c] cpuhp_invoke_callback+0x17c/0x600
[ 2055.144831] [c000001d963d7ae0] [c0000000000e8bf8] cpuhp_up_callbacks+0x58/0x150
[ 2055.144835] [c000001d963d7b30] [c0000000000eb444] _cpu_up+0xf4/0x1d0
[ 2055.144838] [c000001d963d7b90] [c0000000000eb650] do_cpu_up+0x130/0x160
[ 2055.144841] [c000001d963d7c10] [c00000000078681c] cpu_subsys_online+0x6c/0xf0
[ 2055.144845] [c000001d963d7c60] [c00000000077dd64] device_online+0xb4/0x120
[ 2055.144848] [c000001d963d7ca0] [c00000000077de84] online_store+0xb4/0xc0
[ 2055.144850] [c000001d963d7ce0] [c000000000778b40] dev_attr_store+0x40/0x70
[ 2055.144853] [c000001d963d7d00] [c00000000041de9c] sysfs_kf_write+0x6c/0xa0
[ 2055.144856] [c000001d963d7d20] [c00000000041cd4c] kernfs_fop_write+0x17c/0x250
[ 2055.144860] [c000001d963d7d70] [c00000000035a6e0] __vfs_write+0x40/0x80
[ 2055.144863] [c000001d963d7d90] [c00000000035c2b4] vfs_write+0xd4/0x270
[ 2055.144866] [c000001d963d7de0] [c00000000035df1c] SyS_write+0x6c/0x110
[ 2055.144869] [c000001d963d7e30] [c00000000000b184] system_call+0x38/0xe0

The ppc64_cpu process is stuck in blk_mq routines. A similar issue has
also been reported online:

http://marc.info/?l=linux-block&m=149056984921439&w=2
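
To see where a blocked task is waiting, it helps to skip the context-switch and scheduler frames at the top of its trace. The following is a sketch under assumptions: it expects a single call trace in the format shown above, the list of frames to skip is only the scheduler entry points that appear in this report, and first_blocking_frame is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read one call trace on stdin and print the first
# symbol that is neither marked (unreliable) nor a scheduler frame.
first_blocking_frame() {
    awk '
        /[(]unreliable[)]/ { next }
        {
            for (i = 1; i <= NF; i++) {
                # Match fields of the form symbol+0xOFFSET/0xSIZE.
                if ($i ~ /^[A-Za-z_][A-Za-z0-9_.]*[+]0x[0-9a-f]+\/0x[0-9a-f]+$/) {
                    sym = $i
                    sub(/[+].*/, "", sym)
                    if (sym !~ /^(__switch_to|__schedule|schedule|schedule_timeout|io_schedule_timeout|schedule_preempt_disabled)$/) {
                        print sym
                        exit
                    }
                }
            }
        }'
}
```

Fed the ppc64_cpu trace above, this prints blk_mq_freeze_queue_wait, the blk_mq routine in which the process is waiting.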

== Comment: #29 - Wen Xiong <email address hidden> - 2017-07-31 16:51:16 ==
Paul from the kernel team suggested pulling the following patch into Ubuntu. We have verified that the patch fixes the ppc64_cpu hang.

http://marc.info/?l=linux-block&m=149860870015032&w=2

Let us know if you have any questions.

Thanks,
Wendy

Revision history for this message
bugproxy (bugproxy) wrote : dmesg_logs

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-156441 severity-high targetmilestone-inin16043
Revision history for this message
bugproxy (bugproxy) wrote : hung-logs

Default Comment by Bridge

Revision history for this message
bugproxy (bugproxy) wrote : dmesg_logs

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Can you post a note when the patch from Paul lands in mainline or linux-next?

Changed in linux (Ubuntu):
status: New → Triaged
tags: added: kernel-da-key
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Revision history for this message
bugproxy (bugproxy) wrote : rcu-Migrate-callbacks-earlier-in-the-CPU-offline-timeline.BACKPORT.patch

------- Comment on attachment From <email address hidden> 2017-08-16 18:20 EDT-------

Hi Joseph,

> Can you post a note when the patch from Paul lands in mainline or linux-next?

Ingo pulled Paul's patch into tip.git, and it's now in linux-next.git.

rcu: Migrate callbacks earlier in the CPU-offline timeline
git.kernel.org/tip/tip/c/a58163d8ca2c8d288ee9f95989712f98473a5ac2
git.kernel.org/next/linux-next/c/a58163d8ca2c8d288ee9f95989712f98473a5ac2

Paul wrote a backport to 4.11 (attached), which applies cleanly on linux-hwe[-edge].
I am building it now on top of linux-hwe-edge (v4.11) and linux-hwe (v4.10) for testing.

cheers,
Mauricio

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-16 20:09 EDT-------
Hi Indira,

Can you please test whether this test kernel fixes the problem?

http://dorno.rch.stglabs.ibm.com/~mauricfo/kernel/linux-hwe-4.10.0-330_4.10.0-330.37~16.04.1+bz156441/

Thanks,
Mauricio

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Mauricio,

Do you plan on submitting an SRU request to the Ubuntu kernel team mailing list once testing is verified by Indira?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-22 11:50 EDT-------
@Joseph,

> @Mauricio,
>
> Do you plan on submitting an SRU request to the Ubuntu kernel team mailing
> list once testing is verified by Indira?

Yes, exactly.

If you'd recommend submitting it earlier, since it has already been accepted upstream, I can do that.

We know this patch fixes problems with similar symptoms on other distros / duplicates, but since this involves an SRU request to an LTS release, in an area such as RCU, I'm clearly following the SRU requirement of testing first. :-)

cheers,
Mauricio

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Thanks for the update and being willing to perform the SRU. It's best to wait for the testing results like you say.

Thanks,

Joe

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-24 13:16 EDT-------
Hi Indira,

Can you verify Mauricio's kernel with your test?

Thanks,
Wendy

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-04 13:26 EDT-------
Hi Mauricio,

Started the Leaf IO and BASE tests (without SMT tests), then changed the SMT value using a test script; the ppc64_cpu command hangs after a few hours of the regression run.

> root@lotkvm:/kte/tools/setup.d# show.report.py
HOSTNAME  KERNEL VERSION      DISTRO INFO
--------  ------------------  -----------
lotkvm    4.10.0-330-generic  Ubuntu 16.04.3 LTS \n \l

######## Current Time: Mon Sep 4 07:53:05 2017 ########
Job-ID  FOCUS  Start-Time         Duration               Function
------  -----  ----------         --------               --------
2       IO     20170904-06:36:31  1.0 hr(s) 16.0 min(s)  IO_Focus
3       BASE   20170904-07:52:20  0.0 hr(s) 0.0 min(s)   Test

FOCUS  IO      BASE   SUM
TOTAL  774     6      780
FAIL   0       2      2
PASS   774     4      778
(%)    (100%)  (66%)  (99%)

> Used the test script below, located at lotkvm:/home/smtchange.sh, to change the SMT value periodically

[ipjoga@kte tmp]$ cat smt*
#!/bin/bash

while true
do
    ppc64_cpu --smt=2
    echo "SMT changed to 2"
    date
    sleep 240
    ppc64_cpu --smt=4
    echo "SMT changed to 4"
    date
    sleep 240
    ppc64_cpu --smt=8
    echo "SMT changed to 8"
    date
    sleep 240
    ppc64_cpu --smt=off
    echo "SMT changed to off"
    date
    sleep 240
done

root@lotkvm:/home# nohup ./smtchange.sh &
[1] 4604
root@lotkvm:/home# nohup: ignoring input and appending output to 'nohup.out'
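
Because the loop blocks on whichever ppc64_cpu call hangs, the log simply stops at that point. One possible refinement, sketched here as an assumption and not something used in this test run, is to give each SMT change a deadline and report when it is exceeded. The function name run_with_deadline and the return code 124 are illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the background and wait at most
# <seconds> for it to finish. Returns the command's exit status, or 124
# if it is still running at the deadline. The command is deliberately
# not killed: a task in uninterruptible sleep ignores signals anyway.
run_with_deadline() {
    local deadline=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 only checks whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "still running after ${deadline}s: $*" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

In the loop above, a call such as run_with_deadline 600 ppc64_cpu --smt=8 || break would stop the script and leave a message on stderr when a change does not complete in time. The 600 second limit is an arbitrary example value.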

> The run went fine for a few hours (about 3 - 4 hours), but after that the ppc64_cpu command hung, as seen in the log below

root@lotkvm:/home# cat nohup.out
SMT changed to 2
Mon Sep 4 07:54:02 CDT 2017
SMT changed to 4
Mon Sep 4 07:58:06 CDT 2017
SMT changed to 8
Mon Sep 4 08:02:16 CDT 2017
SMT changed to off
Mon Sep 4 08:06:39 CDT 2017
SMT changed to 2
Mon Sep 4 08:10:40 CDT 2017
SMT changed to 4
Mon Sep 4 08:14:47 CDT 2017
SMT changed to 8
Mon Sep 4 08:18:56 CDT 2017
SMT changed to off
Mon Sep 4 08:23:19 CDT 2017
SMT changed to 2
Mon Sep 4 08:27:22 CDT 2017
SMT changed to 4
Mon Sep 4 08:31:26 CDT 2017
SMT changed to 8
Mon Sep 4 08:35:36 CDT 2017
SMT changed to off
Mon Sep 4 08:39:58 CDT 2017
SMT changed to 2
Mon Sep 4 08:44:02 CDT 2017
SMT changed to 4
Mon Sep 4 08:48:07 CDT 2017
SMT changed to 8
Mon Sep 4 08:52:17 CDT 2017
SMT changed to off
Mon Sep 4 08:56:40 CDT 2017
SMT changed to 2
Mon Sep 4 09:00:42 CDT 2017
SMT changed to 4
Mon Sep 4 09:04:47 CDT 2017
SMT changed to 8
Mon Sep 4 09:09:02 CDT 2017
SMT changed to off
Mon Sep 4 09:13:26 CDT 2017
SMT changed to 2
Mon Sep 4 09:17:29 CDT 2017
SMT changed to 4
Mon Sep 4 09:21:37 CDT 2017
SMT changed to 8
Mon Sep 4 09:25:46 CDT 2017
SMT changed to off
Mon Sep 4 09:30:07 CDT 2017
SMT changed to 2
Mon Sep 4 09:34:09 CDT 2017
SMT changed to 4
Mon Sep 4 09:38:27 CDT 2017
SMT changed to 8
Mon Sep 4 09:42:42 CDT 2017
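
The script prints each message only after ppc64_cpu has returned, so the last pair in nohup.out marks the last change that completed; the next step in the loop is presumably the call that never returned. A small sketch for pulling that last pair out of a long log, assuming the two-line message/date layout shown above (last_smt_change is a hypothetical helper name):

```shell
#!/bin/bash
# Hypothetical helper: read smtchange.sh output on stdin and print the
# last completed change as "<value><TAB><timestamp line>".
last_smt_change() {
    awk '
        /^SMT changed to / { value = $4; next }
        value != "" { last = value "\t" $0; value = "" }
        END { if (last != "") print last }'
}
```

Possible usage: last_smt_change < nohup.out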

root@lotkvm:~# date
Mon Sep 4 12:03:28 CDT 2017

root@lotkvm:/home# ppc64_cpu --smt

^Z^X^C

root@lotkvm:/home# ps -ef | grep ppc64*

>root@lotkvm:~# uname -a
Linux lotkvm 4.10.0-330-generic #37~16.04.1+bz156441 SMP Wed Aug 16 17:45:59 CDT 2017 ppc64le ppc64le ppc64le GNU/Linux
root@lotkvm:~# uname -r
4.10.0-330-generic

> System is available for debugging

> Trying to get dmesg logs, as the console hangs after running the ppc64_cpu command

Regards,
Indira

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-09-04 13:34 EDT-------
Indira, thank you.

Wendy,

reassigning back to you for further analysis,
as Paul's patch alone was not sufficient to fix this problem on Ubuntu.

(In reply to comment #45)
> Started Leaf IO and BASE(without smt tests)and then tried to change the SMT
> value using test script and ppc64 command hangs after few hours of
> regression run.

> Linux lotkvm 4.10.0-330-generic #37~16.04.1+bz156441 SMP Wed Aug 16 17:45:59 CDT 2017 ppc64le ppc64le ppc64le GNU/Linux

Manoj Iyer (manjo)
tags: added: triage-a
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

@Mauricio, I was just wondering if there were any results from the testing mentioned in comment #18?

Manoj Iyer (manjo)
tags: added: triage-g
removed: triage-a
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel (canonical-kernel)
assignee: Canonical Kernel (canonical-kernel) → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-12-11 11:08 EDT-------
Hi Joseph,

> @Mauricio, I was just wondering if there was results from the testing
> mentioned in comment #18?

Unfortunately that kernel patch alone did not resolve the problem.
This bug seems a bit stale, waiting for developer/analysis cycles.

cheers,
Mauricio

Revision history for this message
Manoj Iyer (manjo) wrote :

The 4.10 kernel was replaced with 4.13 linux-hwe, please retest with 4.13 (Artful release) and reopen this bug if you are able to reproduce it. Zesty has reached end of life so please retest with Artful/Bionic.

Changed in linux (Ubuntu):
status: Triaged → Won't Fix
Changed in ubuntu-power-systems:
status: Incomplete → Won't Fix
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-04-17 06:30 EDT-------
Hi Lata,

We no longer have this test setup in our environment; please close this bugzilla as unreproducible.

Thanks.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-29 03:52 EDT-------
(In reply to comment #67)
> Hi Lata,
>
> We no longer have this test setup in our environment, pls close this
> bugzilla as un reproducible .
>
> Thanks.

Closing this Bug in IBM Bugzilla.

tags: added: targetmilestone-inin---
removed: targetmilestone-inin16043