Activity log for bug #1768898

Date Who What changed Old value New value Message
2018-05-03 16:20:08 bugproxy bug added bug
2018-05-03 16:20:11 bugproxy tags architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804
2018-05-03 16:20:15 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2018-05-03 16:20:20 bugproxy affects ubuntu linux (Ubuntu)
2018-05-03 16:23:21 Frank Heimes bug task added ubuntu-power-systems
2018-05-03 16:23:37 Frank Heimes ubuntu-power-systems: status New Triaged
2018-05-03 16:23:51 Frank Heimes ubuntu-power-systems: importance Undecided Critical
2018-05-03 16:24:04 Frank Heimes ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2018-05-03 16:24:23 Frank Heimes tags architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 triage-g
2018-05-07 15:46:55 Joseph Salisbury linux (Ubuntu): importance Undecided Critical
2018-05-07 15:47:00 Joseph Salisbury linux (Ubuntu): status New Triaged
2018-05-07 15:47:29 Joseph Salisbury nominated for series Ubuntu Bionic
2018-05-07 15:47:29 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-05-07 15:47:37 Joseph Salisbury linux (Ubuntu Bionic): importance Undecided Critical
2018-05-07 15:47:40 Joseph Salisbury linux (Ubuntu Bionic): status New Triaged
2018-05-07 15:47:46 Joseph Salisbury linux (Ubuntu Bionic): assignee Joseph Salisbury (jsalisbury)
2018-05-07 15:47:50 Joseph Salisbury linux (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Joseph Salisbury (jsalisbury)
2018-05-14 12:25:53 Joseph Salisbury linux (Ubuntu Bionic): status Triaged In Progress
2018-05-14 12:25:55 Joseph Salisbury linux (Ubuntu): status Triaged In Progress
2018-05-14 14:05:12 Frank Heimes ubuntu-power-systems: status Triaged In Progress
2018-05-18 14:22:47 Joseph Salisbury description Recently we discovered this bug occurs just alone with stop4 which results in soft lockups/rcu stalls. ``` root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode. [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'. [15523.619769] systemd[1]: Failed to start Journal Service. [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart. [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21. [15523.621462] systemd[1]: Stopped Journal Service. [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring. [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring. [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU [15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=999069 [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111) [15541.630101] NMI backtrace for cpu 60 [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15541.630207] Call Trace: [15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15541.630298] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15541.630429] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15541.630495] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15541.630560] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15541.630625] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15541.630689] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15541.630753] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15541.630818] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15541.630883] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15541.630948] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15541.631069] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180 [15541.631135] LR = smp_call_function_single+0x110/0x180 [15541.631230] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15541.631294] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15541.631488] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15541.631718] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15541.631784] LR = smp_call_function_many+0x324/0x450 [15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15541.631935] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15541.632000] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15541.632065] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263] [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast [15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas [15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L 4.15.0-15-generic #16-Ubuntu [15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR: c000000000acc7f0 [15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L (4.15.0-15-generic) [15579.002735] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48044222 XER: 00000000 [15579.002810] CFAR: c01721ed8 [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 0000000000000000 [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00 [15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180 [15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180 [15579.003191] Call Trace: [15579.003217] [c000003e84effa60] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable) [15579.003298] [c000003e84effad0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15579.003381] [c000003e84effb30] [c000000000acc840] powernv_cpufreq_get+0x50/0x70 [15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140 [15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0 [15579.003560] [c000003e84effbe0] [c00000000009da50] pnv_get_proc_freq+0x20/0x50 [15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400 [15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610 [15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0 [15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70 [15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0 [15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110 [15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c [15579.004016] Instruction dump: [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 60000000 [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 4082fff0 7c2004ac [15604.648202] INFO: rcu_sched self-detected stall on CPU [15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1005652 [15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308) [15604.648385] NMI backtrace for cpu 60 [15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15604.648491] Call Trace: [15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15604.648581] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15604.648728] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15604.648793] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15604.648858] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15604.648924] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15604.648988] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15604.649052] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15604.649118] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15604.649183] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15604.649248] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15604.649369] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180 [15604.649435] LR = smp_call_function_single+0x110/0x180 [15604.649530] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15604.649595] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15604.649770] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15604.650002] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15604.650084] LR = smp_call_function_many+0x324/0x450 [15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15604.650235] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15604.650301] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15604.650366] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15667.666494] INFO: rcu_sched self-detected stall on CPU [15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1012258 [15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457) [15667.666675] NMI backtrace for cpu 60 [15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15667.666781] Call Trace: [15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15667.666871] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15667.667002] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15667.667086] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15667.667151] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15667.667216] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15667.667280] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15667.667344] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15667.667409] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15667.667474] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15667.667539] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15667.667660] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180 [15667.667727] LR = smp_call_function_single+0x110/0x180 [15667.667821] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15667.667886] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15667.668061] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15667.668292] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15667.668358] LR = smp_call_function_many+0x324/0x450 [15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15667.668524] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15667.668589] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15667.668654] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c ``` Per feedback from Vaidy, this currently appears to NOT be a firmware problem. This seems to be a kernel synchronization issue leading to a dead lock. ------- Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is currently being tested. ------- Fix upstream in 4.17-rc3 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188 cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt Posted to stable as well. Mirroring to Launchpad for Canonical to pull in commit. == SRU Justification == IBM reports that this bug occurs with stop4 which results in soft lockups/rcu stalls. This is a kernel synchronization issue leading to a dead lock. This bug was introduced by commit 7bc54b652f13 in v4.8-rc1. This regression is fixed by mainline commit c0f7f5b6c6910. == Fix == c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt") == Regression Potential == Low. Fixes current regression. Cc'd to upstream stable, so it has had additon upstream review. == Test Case == A test kernel was built with this patch and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug. Recently we discovered this bug occurs just alone with stop4 which results in soft lockups/rcu stalls. ``` root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode. [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'. [15523.619769] systemd[1]: Failed to start Journal Service. [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart. [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21. [15523.621462] systemd[1]: Stopped Journal Service. [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring. [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring. [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU [15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=999069 [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111) [15541.630101] NMI backtrace for cpu 60 [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15541.630207] Call Trace: [15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15541.630298] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15541.630429] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15541.630495] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15541.630560] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15541.630625] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15541.630689] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15541.630753] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15541.630818] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15541.630883] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15541.630948] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15541.631069] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180 [15541.631135] LR = smp_call_function_single+0x110/0x180 [15541.631230] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15541.631294] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15541.631488] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15541.631718] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15541.631784] LR = smp_call_function_many+0x324/0x450 [15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15541.631935] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15541.632000] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15541.632065] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263] [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast [15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas [15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L 4.15.0-15-generic #16-Ubuntu [15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR: c000000000acc7f0 [15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L (4.15.0-15-generic) [15579.002735] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48044222 XER: 00000000 [15579.002810] CFAR: c01721ed8 [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 0000000000000000 [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00 [15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180 [15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180 [15579.003191] Call Trace: [15579.003217] [c000003e84effa60] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable) [15579.003298] [c000003e84effad0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15579.003381] [c000003e84effb30] [c000000000acc840] powernv_cpufreq_get+0x50/0x70 [15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140 [15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0 [15579.003560] [c000003e84effbe0] [c00000000009da50] pnv_get_proc_freq+0x20/0x50 [15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400 [15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610 [15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0 [15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70 [15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0 [15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110 [15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c [15579.004016] Instruction dump: [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 60000000 [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 4082fff0 7c2004ac [15604.648202] INFO: rcu_sched self-detected stall on CPU [15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1005652 [15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308) [15604.648385] NMI backtrace for cpu 60 [15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15604.648491] Call Trace: [15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15604.648581] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15604.648728] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15604.648793] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15604.648858] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15604.648924] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15604.648988] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15604.649052] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15604.649118] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15604.649183] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15604.649248] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15604.649369] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180 [15604.649435] LR = smp_call_function_single+0x110/0x180 [15604.649530] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15604.649595] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15604.649770] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15604.650002] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15604.650084] LR = smp_call_function_many+0x324/0x450 [15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15604.650235] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15604.650301] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15604.650366] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15667.666494] INFO: rcu_sched self-detected stall on CPU [15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1012258 [15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457) [15667.666675] NMI backtrace for cpu 60 [15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15667.666781] Call Trace: [15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15667.666871] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15667.667002] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15667.667086] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15667.667151] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15667.667216] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15667.667280] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15667.667344] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15667.667409] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15667.667474] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15667.667539] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15667.667660] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180 [15667.667727] LR = smp_call_function_single+0x110/0x180 [15667.667821] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15667.667886] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15667.668061] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15667.668292] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15667.668358] LR = smp_call_function_many+0x324/0x450 [15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15667.668524] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15667.668589] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15667.668654] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c ``` Per feedback from Vaidy, this currently appears to NOT be a firmware problem. This seems to be a kernel synchronization issue leading to a dead lock. ------- Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is currently being tested.  ------- Fix upstream in 4.17-rc3 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188 cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt Posted to stable as well. Mirroring to Launchpad for Canonical to pull in commit.
2018-05-23 08:08:34 Stefan Bader linux (Ubuntu Bionic): status In Progress Fix Committed
2018-05-24 18:05:35 Brad Figg tags architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 triage-g architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 triage-g verification-needed-bionic
2018-05-26 07:39:24 bugproxy tags architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 triage-g verification-needed-bionic architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804 triage-g verification-done-bionic
2018-06-11 13:51:38 Manoj Iyer ubuntu-power-systems: status In Progress Fix Committed
2018-06-11 13:51:49 Manoj Iyer linux (Ubuntu): status In Progress Fix Committed
2018-06-11 15:08:06 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-1092
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-3639
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-8087
2018-06-14 12:16:29 Launchpad Janitor linux (Ubuntu): status Fix Committed Fix Released
2018-06-18 13:47:29 Manoj Iyer ubuntu-power-systems: status Fix Committed Fix Released