Recently we discovered this bug occurs just alone with stop4 which results in soft lockups/rcu stalls. ``` root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode. [15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'. [15523.619769] systemd[1]: Failed to start Journal Service. [15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart. [15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21. [15523.621462] systemd[1]: Stopped Journal Service. [15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring. [15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. [15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring. [15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU [15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=999069 [15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111) [15541.630101] NMI backtrace for cpu 60 [15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15541.630207] Call Trace: [15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15541.630298] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15541.630429] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15541.630495] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15541.630560] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15541.630625] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15541.630689] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15541.630753] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15541.630818] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15541.630883] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15541.630948] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15541.631069] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180 [15541.631135] LR = smp_call_function_single+0x110/0x180 [15541.631230] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15541.631294] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15541.631488] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15541.631718] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15541.631784] LR = smp_call_function_many+0x324/0x450 [15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15541.631935] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15541.632000] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15541.632065] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263] [15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast [15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas [15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L 4.15.0-15-generic #16-Ubuntu [15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR: c000000000acc7f0 [15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L (4.15.0-15-generic) [15579.002735] MSR: 9000000000009033 CR: 48044222 XER: 00000000 [15579.002810] CFAR: c01721ed8 [15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 0000000000000000 [15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00 [15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180 [15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180 [15579.003191] Call Trace: [15579.003217] [c000003e84effa60] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable) [15579.003298] [c000003e84effad0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15579.003381] [c000003e84effb30] [c000000000acc840] powernv_cpufreq_get+0x50/0x70 [15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140 [15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0 [15579.003560] [c000003e84effbe0] [c00000000009da50] pnv_get_proc_freq+0x20/0x50 [15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400 [15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610 [15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0 [15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70 [15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0 [15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110 [15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c [15579.004016] Instruction dump: [15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 60000000 [15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 4082fff0 7c2004ac [15604.648202] INFO: rcu_sched self-detected stall on CPU [15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1005652 [15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308) [15604.648385] NMI backtrace for cpu 60 [15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15604.648491] Call Trace: [15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15604.648581] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15604.648728] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15604.648793] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15604.648858] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15604.648924] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15604.648988] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15604.649052] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15604.649118] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15604.649183] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15604.649248] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15604.649369] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180 [15604.649435] LR = smp_call_function_single+0x110/0x180 [15604.649530] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15604.649595] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15604.649770] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15604.650002] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15604.650084] LR = smp_call_function_many+0x324/0x450 [15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15604.650235] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15604.650301] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15604.650366] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c [15667.666494] INFO: rcu_sched self-detected stall on CPU [15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1012258 [15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457) [15667.666675] NMI backtrace for cpu 60 [15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu [15667.666781] Call Trace: [15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable) [15667.666871] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200 [15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0 [15667.667002] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40 [15667.667086] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158 [15667.667151] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40 [15667.667216] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90 [15667.667280] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0 [15667.667344] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0 [15667.667409] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370 [15667.667474] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350 [15667.667539] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260 [15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0 [15667.667660] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120 [15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180 [15667.667727] LR = smp_call_function_single+0x110/0x180 [15667.667821] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250 [15667.667886] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580 [15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0 [15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0 [15667.668061] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270 [15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4 [15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120 [15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0 [15667.668292] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120 [15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450 [15667.668358] LR = smp_call_function_many+0x324/0x450 [15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0 [15667.668524] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270 [15667.668589] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40 [15667.668654] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340 [15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350 [15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c ``` Per feedback from Vaidy, this currently appears to NOT be a firmware problem. This seems to be a kernel synchronization issue leading to a dead lock. ------- Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is currently being tested. ------- Fix upstream in 4.17-rc3 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188 cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt Posted to stable as well. Mirroring to Launchpad for Canonical to pull in commit.