Ubuntu 16.04.03(P8/Tuleta): SEGV-panic in smp_send_reschedule(). Machine keeps rebooting with oops message even after reboot.

Bug #1788782 reported by bugproxy on 2018-08-24
This bug affects 1 person
Affects                            Importance   Assigned to
The Ubuntu-power-systems project   Medium       Canonical Kernel Team
linux (Ubuntu)                     Medium       Canonical Kernel Team
Bionic                             Medium       Canonical Kernel Team

Bug Description

== Comment: #0 - PAVITHRA R. PRAKASH <email address hidden> - 2017-07-25 06:18:00 ==
---Problem Description---

Ubuntu 16.04.03: Fadump fails when dump is triggered after dlpar operation. Machine keeps rebooting with oops message even after reboot.

---Environment--
Kernel Build: Ubuntu 16.04.03
System Name : Tuleta
Model/Type : P8
Platform : LPAR

---Uname output---

root@tuleta4u-lp9:/home/ubuntu# uname -a
Linux tuleta4u-lp9 4.10.0-28-generic #32~16.04.2-Ubuntu SMP Thu Jul 20 10:17:50 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

---Steps to reproduce--
1. Configure fadump.
2. Add memory with dlpar operation.
3. Remove memory with dlpar operation.
4. Check fadump service is up.
5. Trigger crash

---Logs----

Attaching full console log.

[ OK ] Reached target Remote File Systems.
         Starting LSB: automatic crash report generation...
         Starting LSB: Set the CPU Frequency Scaling governor to "ondemand"...
         Starting LSB: Load kernel image with kexec...
         Starting LSB: daemon to balance interrupts for SMP systems...
         Starting Permit User Sessions...
[ 10.997932] Unable to handle kernel paging request for data at address 0xa0000000
[ 10.997948] Faulting instruction address: 0xc0000000000459f4
[ 10.997956] Oops: Kernel access of bad area, sig: 11 [#1]
[ 10.997960] SMP NR_CPUS=2048
[ 10.997961] NUMA
[ 10.997965] pSeries
[ 10.997971] Modules linked in: binfmt_misc vmx_crypto pseries_rng ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid1 raid0 multipath linear raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c ibmvscsi crc32c_vpmsum
[ 10.998013] CPU: 5 PID: 2212 Comm: kdump-config Not tainted 4.10.0-28-generic #32~16.04.2-Ubuntu
[ 10.998020] task: c0000000fd8b8c00 task.stack: c0000000fc364000
[ 10.998026] NIP: c0000000000459f4 LR: c000000000127628 CTR: c000000000141390
[ 10.998031] REGS: c0000000fc367a60 TRAP: 0300 Not tainted (4.10.0-28-generic)
[ 10.998037] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 10.998046] CR: 28222824 XER: 00000000
[ 10.998053] CFAR: c000000000008860 DAR: 00000000a0000000 DSISR: 40000000 SOFTE: 0
[ 10.998053] GPR00: c000000000127628 c0000000fc367ce0 c0000000014ad100 0000000000000007
[ 10.998053] GPR04: c000001fd47e5800 0000000000000002 0000000000000001 0000000000000800
[ 10.998053] GPR08: 0000000000000804 00000000a0000000 0000000000000000 0000000000000000
[ 10.998053] GPR12: 0000000028222242 c00000000fb82d00 0000000022000000 00000100012ab908
[ 10.998053] GPR16: 0000000000000000 0000000000000001 0000000000000000 0000000000000000
[ 10.998053] GPR20: 0000000000000060 0000000000000000 0000000000000000 0000000000000000
[ 10.998053] GPR24: 0000000000000000 c000001fd489d980 c000001fd47e5880 c0000000fd934b00
[ 10.998053] GPR28: c000001fd47e6044 c000001f4fee3280 0000000000000007 c000001f4fee3280
[ 10.998127] NIP [c0000000000459f4] smp_send_reschedule+0x24/0x80
[ 10.998135] LR [c000000000127628] resched_curr+0x168/0x190
[ 10.998139] Call Trace:
[ 10.998143] [c0000000fc367ce0] [c000000000127628] resched_curr+0x168/0x190 (unreliable)
[ 10.998152] [c0000000fc367d10] [c000000000128728] check_preempt_curr+0xc8/0xf0
[ 10.998159] [c0000000fc367d40] [c00000000012b3bc] wake_up_new_task+0x16c/0x2d0
[ 10.998167] [c0000000fc367da0] [c0000000000e7304] _do_fork+0x174/0x520
[ 10.998175] [c0000000fc367e30] [c00000000000b410] ppc_clone+0x8/0xc
[ 10.998180] Instruction dump:
[ 10.998185] 60000000 60000000 60420000 3c4c0146 38427730 7c0802a6 f8010010 60000000
[ 10.998196] 3d220006 e9297bc0 2fa90000 4d9e0020 <e9290000> 2fa90000 419e0044 7c0802a6
[ 10.998210] ---[ end trace 7ad373050ad8891c ]---
[ 11.003011]
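
When comparing this oops against other console logs, it can help to pull out the faulting address and the NIP/LR symbols mechanically. The following is only a minimal sketch: `summarize_oops` and `OOPS_PATTERNS` are hypothetical names, not part of any kernel or Ubuntu tooling, and the regular expressions assume the ppc64le oops layout shown in the log above.

```python
import re

# Hypothetical helper: the patterns assume the oops layout seen in this report.
OOPS_PATTERNS = {
    "fault_addr": re.compile(r"data at address (0x[0-9a-f]+)"),
    "dar": re.compile(r"\bDAR:\s*([0-9a-f]+)"),
    "nip_symbol": re.compile(r"NIP \[[0-9a-f]+\]\s+(\S+)"),
    "lr_symbol": re.compile(r"LR \[[0-9a-f]+\]\s+(\S+)"),
}


def summarize_oops(text):
    """Return the first match of each pattern found in an oops excerpt."""
    summary = {}
    for key, pattern in OOPS_PATTERNS.items():
        match = pattern.search(text)
        if match:
            summary[key] = match.group(1)
    return summary


# Lines copied from the console log in this report.
sample = """\
[ 10.997932] Unable to handle kernel paging request for data at address 0xa0000000
[ 10.998053] CFAR: c000000000008860 DAR: 00000000a0000000 DSISR: 40000000 SOFTE: 0
[ 10.998127] NIP [c0000000000459f4] smp_send_reschedule+0x24/0x80
[ 10.998135] LR [c000000000127628] resched_curr+0x168/0x190
"""

for key, value in summarize_oops(sample).items():
    print(f"{key}: {value}")
```

Fields that do not appear in the excerpt are simply left out of the result, so the helper can be run on partial logs.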

== Comment: #1 - PAVITHRA R. PRAKASH <email address hidden> - 2017-07-25 06:24:46 ==

== Comment: #8 - PAVITHRA R. PRAKASH <email address hidden> - 2017-08-29 08:35:18 ==
I could not recreate the issue mentioned in the bug, but the machine goes into an Error state after the steps below.

1. Activate the partition with 130GB.
2. Add 20GB.
3. Remove 20GB.
4. Trigger fadump.

Thanks,
Pavithra

== Comment: #9 - Hari Krishna Bathini <email address hidden> - 2017-08-29 11:03:53 ==
(In reply to comment #8)
> I could not recreate the issue mentioned in bug, But machine is going in to
> Error state after below steps.
>
> 1. Activate the partition with 130GB.
> 2. Add 20GB.
> 3. Remove 20GB.
> 4. Trigger fadump.
>

The reference code indicates a copy error (B200541A). The fix for this problem,
restarting the kdump-tools service after DLPAR operations, is being tracked via
bug 150355.

Thanks
Hari
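
The workaround described in comment #9 (restart kdump-tools after a DLPAR operation, then confirm fadump is registered again) could be scripted roughly as follows. This is a sketch under assumptions: it assumes the service is the systemd unit `kdump-tools`, that the kernel exposes `/sys/kernel/fadump_registered` as kernels of this era do, and that it runs as root. The function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumed sysfs location of the fadump registration flag.
FADUMP_REGISTERED = Path("/sys/kernel/fadump_registered")


def fadump_registered(path=FADUMP_REGISTERED):
    """Return True if the flag file exists and contains '1'."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file: treat as not registered.
        return False


def restart_kdump_tools(run=subprocess.run):
    """Restart the kdump-tools unit; 'run' is injectable for dry runs."""
    return run(["systemctl", "restart", "kdump-tools"], check=True)


if __name__ == "__main__":
    # Intended to be run after the DLPAR add/remove has completed.
    restart_kdump_tools()
    print("fadump registered:", fadump_registered())
```

Passing a different `run` callable makes it possible to log the command instead of executing it on a production partition.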

== Comment: #12 - PAVITHRA R. PRAKASH <email address hidden> - 2017-09-01 01:54:49 ==

== Comment: #16 - Hari Krishna Bathini <email address hidden> - 2018-08-23 11:51:34 ==
The below patches are needed to fix this issue.

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=1bd6a1c4b80a28d975287630644e6b
("powerpc/fadump: handle crash memory ranges array index overflow")

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=ced1bf52f47783135b985d2aacf53f
("powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements")

https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?id=a58183138cb72059a0c278f8370a47
("powerpc/fadump: cleanup crash memory ranges support")

Thanks
Hari
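
As an illustration of the idea named in the second commit title (merging adjacent memory ranges so that fewer PT_LOAD segments are needed), here is a minimal sketch. It is not the kernel's C implementation. The function name is hypothetical, and the sample sizes are made up, loosely following the 130GB + 20GB scenario from comment #8.

```python
def merge_adjacent_ranges(ranges):
    """Merge (base, size) ranges that touch or overlap.

    The input does not need to be sorted. Returns a sorted list of
    (base, size) tuples.
    """
    merged = []  # list of [start, end) pairs
    for base, size in sorted(ranges):
        end = base + size
        if merged and base <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([base, end])
    return [(start, end - start) for start, end in merged]


GiB = 1 << 30
# Hypothetical layout: a 130 GiB block, a 20 GiB block added right after it,
# and a separate 4 GiB block that starts after a gap.
ranges = [(130 * GiB, 20 * GiB), (0, 130 * GiB), (160 * GiB, 4 * GiB)]

for base, size in merge_adjacent_ranges(ranges):
    print(f"base={base // GiB} GiB size={size // GiB} GiB")
```

In this example the first two ranges collapse into one 150 GiB range, so three candidate segments become two.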

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-156961 severity-high targetmilestone-inin16045
bugproxy (bugproxy) wrote : sosreport


Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
Changed in linux (Ubuntu):
status: Triaged → In Progress
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commits 1bd6a1c, ced1bf5 and a581831. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1788782

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is older than 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
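
The package rule in the note above can be written as a small lookup. This is only a sketch: `packages_for_test_kernel` is a hypothetical helper, and it returns the package name prefixes from the note, not the full .deb file names.

```python
def packages_for_test_kernel(kernel_version):
    """Return the .deb package prefixes to install for a test kernel.

    'kernel_version' is a string such as '4.10.0-28-generic' or '4.15'.
    Only the major and minor numbers are compared.
    """
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


print(packages_for_test_kernel("4.10.0-28-generic"))
print(packages_for_test_kernel("4.15.0-33-generic"))
```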

Thanks in advance!

Changed in ubuntu-power-systems:
status: Triaged → In Progress
Andrew Cloke (andrew-cloke) wrote :

Marking as "Incomplete" while waiting for test kernel results to be posted.

Changed in ubuntu-power-systems:
status: In Progress → Incomplete
tags: added: powervm
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Changed in linux (Ubuntu Bionic):
status: In Progress → Incomplete
Manoj Iyer (manjo) wrote :

Since this bug has had no updates in the last couple of months (the test kernel was posted on 8/27), I am lowering our priority on this bug to medium.

Changed in ubuntu-power-systems:
importance: High → Medium
Changed in linux (Ubuntu):
importance: High → Medium
Changed in linux (Ubuntu Bionic):
importance: High → Medium
Changed in linux (Ubuntu):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Bionic):
assignee: Joseph Salisbury (jsalisbury) → Canonical Kernel Team (canonical-kernel-team)