smp_call_function_single/many core hangs with stop4 alone

Bug #1768898 reported by bugproxy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Fix Released
Critical
Canonical Kernel Team
linux (Ubuntu)
Fix Released
Critical
Joseph Salisbury
Bionic
Fix Released
Critical
Joseph Salisbury

Bug Description

== SRU Justification ==
IBM reports that this bug occurs with stop4 which results in soft lockups/rcu stalls.
This is a kernel synchronization issue leading to a dead lock.

This bug was introduced by commit 7bc54b652f13 in v4.8-rc1. This
regression is fixed by mainline commit c0f7f5b6c6910.

== Fix ==
c0f7f5b6c6910 ("cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt")

== Regression Potential ==
Low. Fixes current regression. Cc'd to upstream stable, so it has had
additon upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

Recently we discovered this bug occurs just alone with stop4 which results in soft lockups/rcu stalls.

```
root@ltc-boston125:~# [15523.619395] systemd[1]: systemd-journald.service: Processes still around after final SIGKILL. Entering failed mode.
[15523.619508] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[15523.619769] systemd[1]: Failed to start Journal Service.
[15523.620618] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[15523.620774] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 21.
[15523.621462] systemd[1]: Stopped Journal Service.
[15523.621635] systemd[1]: systemd-journald.service: Found left-over process 1561 (systemd-journal) in control group while starting unit. Ignoring.
[15523.621756] systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
[15523.621888] systemd[1]: systemd-journald.service: Found left-over process 69060 (systemd-journal) in control group while starting unit. Ignoring.
[15523.622029] systemd[1]: This usually indica[15541.629904] INFO: rcu_sched self-detected stall on CPU
[15541.629958] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=999069
[15541.630046] (t=2415546 jiffies g=184827 c=184826 q=57111)
[15541.630101] NMI backtrace for cpu 60
[15541.630135] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu
[15541.630207] Call Trace:
[15541.630232] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable)
[15541.630298] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200
[15541.630363] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15541.630429] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
[15541.630495] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158
[15541.630560] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40
[15541.630625] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90
[15541.630689] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0
[15541.630753] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0
[15541.630818] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370
[15541.630883] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350
[15541.630948] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260
[15541.631013] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15541.631069] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120
[15541.631135] --- interrupt: 901 at smp_call_function_single+0x134/0x180
[15541.631135] LR = smp_call_function_single+0x110/0x180
[15541.631230] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250
[15541.631294] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
[15541.631359] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15541.631433] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15541.631488] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
[15541.631553] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15541.631608] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15541.631663] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15541.631718] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120
[15541.631784] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15541.631784] LR = smp_call_function_many+0x324/0x450
[15541.631879] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15541.631935] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
[15541.632000] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40
[15541.632065] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
[15541.632129] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15541.632185] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15579.001651] watchdog: BUG: soft lockup - CPU#52 stuck for 23s! [grep:69263]
[15579.001738] Modules linked in: vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink input_leds joydev mac_hid idt_89hpesx ofpart cmdlinepart powernv_flash ipmi_powernv ipmi_devintf opal_prd mtd ipmi_msghandler ibmpowernv at24 uio_pdrv_genirq uio vmx_crypto kvm_hv kvm sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure ast
[15579.002363] i2c_algo_bit hid_generic ttm drm_kms_helper mpt3sas syscopyarea sysfillrect usbhid sysimgblt fb_sys_fops hid raid_class crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid scsi_transport_sas
[15579.002524] CPU: 52 PID: 69263 Comm: grep Tainted: G L 4.15.0-15-generic #16-Ubuntu
[15579.002598] NIP: c0000000001d5368 LR: c0000000001d5340 CTR: c000000000acc7f0
[15579.002664] REGS: c000003e84eff7e0 TRAP: 0901 Tainted: G L (4.15.0-15-generic)
[15579.002735] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48044222 XER: 00000000
[15579.002810] CFAR: c01721ed8
[15579.002810] GPR08: c000000001721ed8 0000000000000001 c009e006592e0960 0000000000000000
[15579.002810] GPR12: c000000000acc7f0 c00000000faa3c00
[15579.003084] NIP [c0000000001d5368] smp_call_function_single+0x138/0x180
[15579.003139] LR [c0000000001d5340] smp_call_function_single+0x110/0x180
[15579.003191] Call Trace:
[15579.003217] [c000003e84effa60] [c0000000001d5340] smp_call_function_single+0x110/0x180 (unreliable)
[15579.003298] [c000003e84effad0] [c0000000001d55e0] smp_call_function_any+0x180/0x250
[15579.003381] [c000003e84effb30] [c000000000acc840] powernv_cpufreq_get+0x50/0x70
[15579.003447] [c000003e84effb60] [c000000000ac2b8c] __cpufreq_get+0x5c/0x140
[15579.003503] [c000003e84effba0] [c000000000ac2d18] cpufreq_get+0xa8/0xb0
[15579.003560] [c000003e84effbe0] [c00000000009da50] pnv_get_proc_freq+0x20/0x50
[15579.003625] [c000003e84effc00] [c0000000000283bc] show_cpuinfo+0x11c/0x400
[15579.003680] [c000003e84effca0] [c00000000040c738] seq_read+0x138/0x610
[15579.003737] [c000003e84effd40] [c00000000047fa38] proc_reg_read+0x88/0xd0
[15579.003794] [c000003e84effd70] [c0000000003d293c] __vfs_read+0x3c/0x70
[15579.003849] [c000003e84effd90] [c0000000003d2a2c] vfs_read+0xbc/0x1b0
[15579.003905] [c000003e84effde0] [c0000000003d3028] SyS_read+0x68/0x110
[15579.003962] [c000003e84effe30] [c00000000000b184] system_call+0x58/0x6c
[15579.004016] Instruction dump:
[15579.004051] 7fe4fb78 4bfffd4d 813f0018 71290001 4182002c 48000014 60000000 60000000
[15579.004121] 60000000 60420000 7c210b78 7c421378 <813f0018> 71290001 4082fff0 7c2004ac
[15604.648202] INFO: rcu_sched self-detected stall on CPU
[15604.648260] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1005652
[15604.648332] (t=2431300 jiffies g=184827 c=184826 q=57308)
[15604.648385] NMI backtrace for cpu 60
[15604.648419] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu
[15604.648491] Call Trace:
[15604.648515] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable)
[15604.648581] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200
[15604.648647] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15604.648728] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
[15604.648793] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158
[15604.648858] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40
[15604.648924] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90
[15604.648988] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0
[15604.649052] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0
[15604.649118] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370
[15604.649183] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350
[15604.649248] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260
[15604.649313] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15604.649369] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120
[15604.649435] --- interrupt: 901 at smp_call_function_single+0x138/0x180
[15604.649435] LR = smp_call_function_single+0x110/0x180
[15604.649530] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250
[15604.649595] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
[15604.649660] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15604.649715] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15604.649770] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
[15604.649835] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15604.649891] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15604.649946] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15604.650002] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120
[15604.650084] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15604.650084] LR = smp_call_function_many+0x324/0x450
[15604.650179] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15604.650235] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
[15604.650301] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40
[15604.650366] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
[15604.650430] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15604.650486] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c
[15667.666494] INFO: rcu_sched self-detected stall on CPU
[15667.666550] 60-....: (2 GPs behind) idle=146/140000000000002/0 softirq=300022/300022 fqs=1012258
[15667.666622] (t=2447054 jiffies g=184827 c=184826 q=57457)
[15667.666675] NMI backtrace for cpu 60
[15667.666709] CPU: 60 PID: 4810 Comm: tlbie_test Tainted: G L 4.15.0-15-generic #16-Ubuntu
[15667.666781] Call Trace:
[15667.666805] [c000201a1da96b00] [c000000000ceb35c] dump_stack+0xb0/0xf4 (unreliable)
[15667.666871] [c000201a1da96b40] [c000000000cf4d48] nmi_cpu_backtrace+0x1f8/0x200
[15667.666937] [c000201a1da96bd0] [c000000000cf4ee8] nmi_trigger_cpumask_backtrace+0x198/0x1f0
[15667.667002] [c000201a1da96c60] [c00000000002f2d8] arch_trigger_cpumask_backtrace+0x28/0x40
[15667.667086] [c000201a1da96c80] [c0000000001a913c] rcu_dump_cpu_stacks+0xf4/0x158
[15667.667151] [c000201a1da96cd0] [c0000000001a81e8] rcu_check_callbacks+0x8e8/0xb40
[15667.667216] [c000201a1da96e00] [c0000000001b64a8] update_process_times+0x48/0x90
[15667.667280] [c000201a1da96e30] [c0000000001ce1f4] tick_sched_handle.isra.5+0x34/0xd0
[15667.667344] [c000201a1da96e60] [c0000000001ce2f0] tick_sched_timer+0x60/0xe0
[15667.667409] [c000201a1da96ea0] [c0000000001b7054] __hrtimer_run_queues+0x144/0x370
[15667.667474] [c000201a1da96f20] [c0000000001b7fac] hrtimer_interrupt+0xfc/0x350
[15667.667539] [c000201a1da96ff0] [c0000000000248f0] __timer_interrupt+0x90/0x260
[15667.667604] [c000201a1da97040] [c000000000024d08] timer_interrupt+0x98/0xe0
[15667.667660] [c000201a1da97070] [c000000000009014] decrementer_common+0x114/0x120
[15667.667727] --- interrupt: 901 at smp_call_function_single+0x130/0x180
[15667.667727] LR = smp_call_function_single+0x110/0x180
[15667.667821] [c000201a1da973d0] [c0000000001d55e0] smp_call_function_any+0x180/0x250
[15667.667886] [c000201a1da97430] [c000000000acd3e8] gpstate_timer_handler+0x1e8/0x580
[15667.667951] [c000201a1da974e0] [c0000000001b46b0] call_timer_fn+0x50/0x1c0
[15667.668006] [c000201a1da97560] [c0000000001b4958] expire_timers+0x138/0x1f0
[15667.668061] [c000201a1da975d0] [c0000000001b4bf8] run_timer_softirq+0x1e8/0x270
[15667.668126] [c000201a1da97670] [c000000000d0d6c8] __do_softirq+0x158/0x3e4
[15667.668181] [c000201a1da97750] [c000000000114be8] irq_exit+0xe8/0x120
[15667.668236] [c000201a1da97770] [c000000000024d0c] timer_interrupt+0x9c/0xe0
[15667.668292] [c000201a1da977a0] [c000000000009014] decrementer_common+0x114/0x120
[15667.668358] --- interrupt: 901 at smp_call_function_many+0x330/0x450
[15667.668358] LR = smp_call_function_many+0x324/0x450
[15667.668469] [c000201a1da97b00] [c000000000075f18] pmdp_invalidate+0x98/0xe0
[15667.668524] [c000201a1da97b30] [c0000000003a1120] change_huge_pmd+0xe0/0x270
[15667.668589] [c000201a1da97ba0] [c000000000349278] change_protection_range+0xb88/0xe40
[15667.668654] [c000201a1da97cf0] [c0000000003496c0] mprotect_fixup+0x140/0x340
[15667.668719] [c000201a1da97db0] [c000000000349a74] SyS_mprotect+0x1b4/0x350
[15667.668775] [c000201a1da97e30] [c00000000000b184] system_call+0x58/0x6c

```

Per feedback from Vaidy, this currently appears to NOT be a firmware problem. This seems to be a kernel synchronization issue leading to a dead lock.

-------
Fix identified by Shilpa as per Nick Piggin's recommendation. Kernel fix is currently being tested.

 -------
Fix upstream in 4.17-rc3

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.17-rc3&id=c0f7f5b6c69107ca92909512533e70258ee19188
cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt

Posted to stable as well.

Mirroring to Launchpad for Canonical to pull in commit.

CVE References

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-166937 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-05-05 10:31 EDT-------
Yesterday, the decision was made at Padma's daily KVM meeting to only track System Firmware Mustfix issues using the LC GA1 Mustfix label since that is all that applies to the Supermicro team. The OS Kernel/KVM issues will be managed with a spreadsheet tracked by the KVM team and also in the internal slack channel. Removing the Mustfix label.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-07 09:41 EDT-------
Two comments above, it points to a URL that does not match the commit id it is refering to " cpufreq: powernv: Fix hardlockup due to synchronous smp_call in timer interrupt"

The right URL seems to be https://github.com/torvalds/linux/commit/c0f7f5b6c69107ca92909512533e70258ee19188

Changed in linux (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit c0f7f5b6c69107ca92909512533e70258ee19188. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768898

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.

Thanks in advance!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-13 00:08 EDT-------
(In reply to comment #32)
> I built a test kernel with commit c0f7f5b6c69107ca92909512533e70258ee19188.
> The test kernel can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1768898
>
> Can you test this kernel and see if it resolves this bug?
>
> Note about installing test kernels:
> ? If the test kernel is prior to 4.15(Bionic) you need to install the
> linux-image and linux-image-extra .deb packages.
> ? If the test kernel is 4.15(Bionic) or newer, you need to install the
> linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.
>
> Thanks in advance!

System is having bionic kernel, while installing the above packages getting an error.
root@ltc-boston125:~/test_bug# ls
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb linux-modules-extra-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb
linux-modules-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb
root@ltc-boston125:~/test_bug# dpkg -i *.deb
Selecting previously unselected package linux-image-unsigned-4.15.0-20-generic.
dpkg: regarding linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb containing linux-image-unsigned-4.15.0-20-generic:
linux-image-unsigned-4.15.0-20-generic conflicts with linux-image-4.15.0-20-generic
linux-image-4.15.0-20-generic (version 4.15.0-20.21) is present and installed.

dpkg: error processing archive linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb (--install):
conflicting packages - not installing linux-image-unsigned-4.15.0-20-generic
dpkg: warning: downgrading linux-modules-4.15.0-20-generic from 4.15.0-20.21 to 4.15.0-20.21~lp1768898
(Reading database ... 67065 files and directories currently installed.)
Preparing to unpack linux-modules-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb ...
Unpacking linux-modules-4.15.0-20-generic (4.15.0-20.21~lp1768898) over (4.15.0-20.21) ...
dpkg: warning: downgrading linux-modules-extra-4.15.0-20-generic from 4.15.0-20.21 to 4.15.0-20.21~lp1768898
Preparing to unpack linux-modules-extra-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb ...
Unpacking linux-modules-extra-4.15.0-20-generic (4.15.0-20.21~lp1768898) over (4.15.0-20.21) ...
Setting up linux-modules-4.15.0-20-generic (4.15.0-20.21~lp1768898) ...
Setting up linux-modules-extra-4.15.0-20-generic (4.15.0-20.21~lp1768898) ...
Processing triggers for linux-image-4.15.0-20-generic (4.15.0-20.21) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-20-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
done
Errors were encountered while processing:
linux-image-unsigned-4.15.0-20-generic_4.15.0-20.21~lp1768898_ppc64el.deb

uname -a
Linux ltc-boston125 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:14:44 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Can you please let me know why it is failing to install.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-13 07:34 EDT-------
Looks like the kernel was built with a conflicting version number. Beware, your kernel modules have now been tainted and you have an old kernel with newer modules. I would have expected this to be built as a separate kernel version so that you could choose boot either old or new. Normally, we would modify the EXTRAVERSION variable in the Makefile so that it added a unique identifier, such as appending "-LP1768898".

Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built one more test kernel that can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1768898

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.

Thanks in advance!

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-15 01:59 EDT-------
(In reply to comment #35)
> I built one more test kernel that can be downloaded from:
> http://kernel.ubuntu.com/~jsalisbury/lp1768898
>
> Can you test this kernel and see if it resolves this bug?
>
> Note about installing test kernels:
> ? If the test kernel is prior to 4.15(Bionic) you need to install the
> linux-image and linux-image-extra .deb packages.
> ? If the test kernel is 4.15(Bionic) or newer, you need to install the
> linux-image-unsigned, linux-modules and linux-modules-extra .deb packages.
>
> Thanks in advance!

Thanks, I have installed the kernel and ran the stress test for 12 hours. system looks good and tests running fine without any issues.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :
description: updated
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-22 03:06 EDT-------
*** Bug 167467 has been marked as a duplicate of this bug. ***

Stefan Bader (smb)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-26 03:34 EDT-------
Linux ltc-boston125 4.15.0-23-generic #25-Ubuntu SMP Wed May 23 17:59:00 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Installed & tested the above proposed kernel, and tests ran fine for more than 12 hours, not seen any issues. Thanks.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (11.4 KiB)

This bug was fixed in the package linux - 4.15.0-23.25

---------------
linux (4.15.0-23.25) bionic; urgency=medium

  * linux: 4.15.0-23.25 -proposed tracker (LP: #1772927)

  * arm64 SDEI support needs trampoline code for KPTI (LP: #1768630)
    - arm64: mmu: add the entry trampolines start/end section markers into
      sections.h
    - arm64: sdei: Add trampoline code for remapping the kernel

  * Some PCIe errors not surfaced through rasdaemon (LP: #1769730)
    - ACPI: APEI: handle PCIe AER errors in separate function
    - ACPI: APEI: call into AER handling regardless of severity

  * qla2xxx: Fix page fault at kmem_cache_alloc_node() (LP: #1770003)
    - scsi: qla2xxx: Fix session cleanup for N2N
    - scsi: qla2xxx: Remove unused argument from qlt_schedule_sess_for_deletion()
    - scsi: qla2xxx: Serialize session deletion by using work_lock
    - scsi: qla2xxx: Serialize session free in qlt_free_session_done
    - scsi: qla2xxx: Don't call dma_free_coherent with IRQ disabled.
    - scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
    - scsi: qla2xxx: Prevent relogin trigger from sending too many commands
    - scsi: qla2xxx: Fix double free bug after firmware timeout
    - scsi: qla2xxx: Fixup locking for session deletion

  * Several hisi_sas bug fixes (LP: #1768974)
    - scsi: hisi_sas: dt-bindings: add an property of signal attenuation
    - scsi: hisi_sas: support the property of signal attenuation for v2 hw
    - scsi: hisi_sas: fix the issue of link rate inconsistency
    - scsi: hisi_sas: fix the issue of setting linkrate register
    - scsi: hisi_sas: increase timer expire of internal abort task
    - scsi: hisi_sas: remove unused variable hisi_sas_devices.running_req
    - scsi: hisi_sas: fix return value of hisi_sas_task_prep()
    - scsi: hisi_sas: Code cleanup and minor bug fixes

  * [bionic] machine stuck and bonding not working well when nvmet_rdma module
    is loaded (LP: #1764982)
    - nvmet-rdma: Don't flush system_wq by default during remove_one
    - nvme-rdma: Don't flush delete_wq by default during remove_one

  * Warnings/hang during error handling of SATA disks on SAS controller
    (LP: #1768971)
    - scsi: libsas: defer ata device eh commands to libata

  * Hotplugging a SATA disk into a SAS controller may cause crash (LP: #1768948)
    - ata: do not schedule hot plug if it is a sas host

  * ISST-LTE:pKVM:Ubuntu1804: rcu_sched self-detected stall on CPU follow by CPU
    ATTEMPT TO RE-ENTER FIRMWARE! (LP: #1767927)
    - powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
    - powerpc/64s: return more carefully from sreset NMI
    - powerpc/64s: sreset panic if there is no debugger or crash dump handlers

  * fsnotify: Fix fsnotify_mark_connector race (LP: #1765564)
    - fsnotify: Fix fsnotify_mark_connector race

  * Hang on network interface removal in Xen virtual machine (LP: #1771620)
    - xen-netfront: Fix hang on device removal

  * HiSilicon HNS NIC names are truncated in /proc/interrupts (LP: #1765977)
    - net: hns: Avoid action name truncation

  * Ubuntu 18.04 kernel crashed while in degraded mode (LP: #1770849)
    - SAUCE: powerpc/perf: Fix memory allocation for...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.