Ubuntu 14.04.3 LTS Crash in notifier_call_chain after boot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Vivid | Fix Released | High | Chris J Arges |
Bug Description
SRU Justification:
[Impact]
Users of 3.19 kernel with power8 machines get a kernel crash on boot.
[Test Case]
Boot system.
[Fix]
commit 792f96e9a769b79
--
---Problem Description---
Installed Ubuntu 14.04.3 LTS on Palmetto and it's crashing after booting to login.
This happens every time I boot Ubuntu 14.04.3 LTS. I've reinstalled Ubuntu and replaced the hard disk as well and re-installed. Still crashing.
---uname output---
Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = Palmetto
---System Hang---
Ubuntu crashes and the host cannot be accessed. The system must be rebooted.
---Steps to Reproduce---
Boot system
Oops output:
[ 33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
[ 33.132565] Faulting instruction address: 0xc0000000000dbc60
[ 33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
[ 33.134410] SMP NR_CPUS=2048 NUMA PowerNV
[ 33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
[ 33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
[ 33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
[ 33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
[ 33.142605] REGS: c000000fff703980 TRAP: 0300 Not tainted (3.19.0-26-generic)
[ 33.143417] MSR: 9000000000009033 <SF,HV,
[ 33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
[ 33.157013] NIP [c0000000000dbc60] notifier_
[ 33.157818] LR [c0000000000dbd94] atomic_
[ 33.162090] Call Trace:
[ 33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
[ 33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_
[ 33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_
[ 33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_
[ 33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_
[ 33.167816] [c000000fff703d90] [c00000000006f654] opal_do_
[ 33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+
[ 33.172997] [c000000fff703e10] [c0000000001273d0] handle_
[ 33.174507] [c000000fff703ed0] [c000000000127658] handle_
[ 33.175312] [c000000fff703f00] [c00000000012baf4] handle_
[ 33.176124] [c000000fff703f30] [c0000000001265c8] generic_
[ 33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
[ 33.182406] [c000000fff703f90] [c00000000002476c] call_do_
[ 33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
[ 33.184072] [c00000000144ba90] [c0000000000025d8] hardware_
[ 33.184907] --- interrupt: 501 at arch_local_
[ 33.184907] LR = arch_local_
[ 33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
[ 33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_
[ 33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_
[ 33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
[ 33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+
[ 33.195084] [c00000000144bf90] [c000000000008c6c] start_here_
[ 33.196569] Instruction dump:
[ 33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
[ 33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
[ 33.202763] ---[ end trace 71076895a9f126ba ]---
[ 33.202836]
[ 35.203605] Kernel panic - not syncing: Fatal exception in interrupt
[ 35.203727] drm_kms_helper: panic occurred, switching back to text console
[ 35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Ah! This is due to a notifier chain array overflow while handling an OPAL message. The upstream commit 792f96e fixes this issue, but that commit has only been partially applied to the Ubuntu 14.04.3 kernel sources, which is why you are seeing this crash.
commit 792f96e9a769b79
Author: Neelesh Gupta <email address hidden>
Date: Wed Feb 11 11:57:06 2015 +0530
powerpc/
Fixes the condition check of incoming message type which can
otherwise shoot beyond the message notifiers head array.
Signed-off-by: Neelesh Gupta <email address hidden>
Reviewed-by: Vasant Hegde <email address hidden>
Reviewed-by: Anshuman Khandual <email address hidden>
Signed-off-by: Benjamin Herrenschmidt <email address hidden>
Below is the hunk from the above commit that is missing from Ubuntu 14.04.3:
-------
@@ -354,7 +350,7 @@ static void opal_handle_
type = be32_to_
/* Sanity check */
- if (type > OPAL_MSG_TYPE_MAX) {
+ if (type >= OPAL_MSG_TYPE_MAX) {
}
-------
I just checked: the above hunk applies cleanly to the Ubuntu 14.04.3 kernel sources. We should mirror this bug to Ubuntu and ask them to apply the hunk.
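To illustrate why the one-character change matters, here is a minimal user-space sketch of the off-by-one. It is not the kernel code: `MSG_TYPE_MAX`, `msg_heads`, `type_ok_before`, `type_ok_after`, and `head_for` are hypothetical names, and the sketch assumes the notifier head array has exactly one slot per message type, so its valid indices run from 0 to the maximum minus one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OPAL_MSG_TYPE_MAX; the real value is not shown in this report. */
enum { MSG_TYPE_MAX = 4 };

/* Assumed layout: one slot per message type, valid indices 0 .. MSG_TYPE_MAX - 1. */
static int msg_heads[MSG_TYPE_MAX];

/* Check as on the '-' line of the hunk: type == MSG_TYPE_MAX is wrongly accepted. */
static bool type_ok_before(uint32_t type)
{
    return !(type > MSG_TYPE_MAX);
}

/* Check as on the '+' line of the hunk: type == MSG_TYPE_MAX is rejected. */
static bool type_ok_after(uint32_t type)
{
    return !(type >= MSG_TYPE_MAX);
}

/* Return the slot for a message type, or NULL if the type is out of range. */
static int *head_for(uint32_t type)
{
    if (!type_ok_after(type))
        return NULL;
    return &msg_heads[type];
}
```

With the old comparison, a message whose type equals the maximum passes the sanity check and indexes one element past the end of the array, which is consistent with the bad pointer dereference seen in the oops. With the new comparison, that type is rejected before the array is touched.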
tags: | added: architecture-ppc64le bugnameltc-129216 severity-high targetmilestone-inin14043 |
affects: | ubuntu → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
Changed in linux (Ubuntu): | |
assignee: | Taco Screen team (taco-screen-team) → Chris J Arges (arges) |
importance: | Undecided → High |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
assignee: | Chris J Arges (arges) → nobody |
Changed in linux (Ubuntu Vivid): | |
assignee: | nobody → Chris J Arges (arges) |
importance: | Undecided → High |
status: | New → In Progress |
Changed in linux (Ubuntu): | |
status: | In Progress → New |
importance: | High → Undecided |
description: | updated |
tags: |
added: severity-critical removed: severity-high |
Changed in linux (Ubuntu Vivid): | |
status: | In Progress → Fix Committed |
tags: | removed: bot-comment bugnameltc-129216 severity-critical |
tags: | added: bugnameltc-129216 severity-critical |
tags: |
added: verification-done-vivid removed: verification-needed-vivid |
tags: |
added: verification-needed-vivid removed: verification-done-vivid |
tags: |
added: verification-done-vivid removed: verification-vivid |
Changed in linux (Ubuntu): | |
status: | New → Fix Released |
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.
To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1487085/+editstatus and add the package name in the text box next to the word Package.
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]