Activity log for bug #1487085

Date | Who | What changed | Old value | New value | Message
2015-08-20 15:10:10 | bugproxy | bug | | | added bug
2015-08-20 15:10:14 | bugproxy | tags | | architecture-ppc64le bugnameltc-129216 severity-high targetmilestone-inin14043 |
2015-08-20 16:19:59 | Ubuntu Foundations Team Bug Bot | tags | architecture-ppc64le bugnameltc-129216 severity-high targetmilestone-inin14043 | architecture-ppc64le bot-comment bugnameltc-129216 severity-high targetmilestone-inin14043 |
2015-08-20 16:49:32 | Luciano Chavez | affects | ubuntu | linux (Ubuntu) |
2015-08-20 16:49:32 | Luciano Chavez | linux (Ubuntu): assignee | | Taco Screen team (taco-screen-team) |
2015-08-20 21:02:10 | Chris J Arges | linux (Ubuntu): assignee | Taco Screen team (taco-screen-team) | Chris J Arges (arges) |
2015-08-20 21:02:13 | Chris J Arges | linux (Ubuntu): importance | Undecided | High |
2015-08-20 21:02:16 | Chris J Arges | linux (Ubuntu): status | New | In Progress |
2015-08-20 21:02:20 | Chris J Arges | nominated for series | | Ubuntu Vivid |
2015-08-20 21:02:20 | Chris J Arges | bug task added | | linux (Ubuntu Vivid) |
2015-08-20 21:14:03 | Chris J Arges | linux (Ubuntu): assignee | Chris J Arges (arges) | |
2015-08-20 21:14:04 | Chris J Arges | linux (Ubuntu Vivid): assignee | | Chris J Arges (arges) |
2015-08-20 21:14:07 | Chris J Arges | linux (Ubuntu Vivid): importance | Undecided | High |
2015-08-20 21:14:09 | Chris J Arges | linux (Ubuntu Vivid): status | New | In Progress |
2015-08-20 21:14:13 | Chris J Arges | linux (Ubuntu): status | In Progress | New |
2015-08-20 21:14:15 | Chris J Arges | linux (Ubuntu): importance | High | Undecided |
2015-08-20 21:16:21 | Chris J Arges | description | (see Old value below) | (see New value below) |

Old value:

---Problem Description---
I installed Ubuntu 14.04.3 LTS on a Palmetto system, and it crashes after booting to the login prompt. This happens every time I boot Ubuntu 14.04.3 LTS. I've reinstalled Ubuntu, then replaced the hard disk and reinstalled again. It still crashes.

---uname output---
Linux paul40 3.19.0-26-generic #28~14.04.1-Ubuntu SMP Wed Aug 12 14:10:52 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Palmetto

---System Hang---
The Ubuntu OS crashes and the host cannot be accessed. The system must be rebooted.

---Steps to Reproduce---
Boot system

Oops output:

[ 33.132376] Unable to handle kernel paging request for data at address 0x200000000000000
[ 33.132565] Faulting instruction address: 0xc0000000000dbc60
[ 33.133422] Oops: Kernel access of bad area, sig: 11 [#1]
[ 33.134410] SMP NR_CPUS=2048 NUMA PowerNV
[ 33.134478] Modules linked in: ast ttm drm_kms_helper joydev mac_hid drm hid_generic usbhid hid syscopyarea sysfillrect sysimgblt i2c_algo_bit ofpart cmdlinepart at24 uio_pdrv_genirq powernv_flash mtd ipmi_powernv powernv_rng opal_prd ipmi_msghandler uio uas usb_storage ahci libahci
[ 33.139112] CPU: 24 PID: 0 Comm: swapper/24 Not tainted 3.19.0-26-generic #28~14.04.1-Ubuntu
[ 33.139943] task: c0000000013cccb0 ti: c000000fff700000 task.ti: c000000001448000
[ 33.141642] NIP: c0000000000dbc60 LR: c0000000000dbd94 CTR: 0000000000000000
[ 33.142605] REGS: c000000fff703980 TRAP: 0300 Not tainted (3.19.0-26-generic)
[ 33.143417] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28002888 XER: 00000000
[ 33.144244] CFAR: c000000000008468 DAR: 0200000000000000 DSISR: 40000000 SOFTE: 0
GPR00: c0000000000dbd94 c000000fff703c00 c00000000144cc00 c0000000015f03c0
GPR04: 0000000000000007 c0000000015f03b8 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0200000000000000 c00000000006c394 9000000000001003
GPR12: 0000000000002200 c00000000fb8d800 0000000000000058 0000000000000000
GPR16: c000000001448000 c000000001448000 c000000001448080 c000000000e9a880
GPR20: c000000001448080 0000000000000001 0000000000000002 0000000000000012
GPR24: c000000f1e432200 0000000000000000 0000000000000000 c0000000015f03b8
GPR28: 0000000000000007 0000000000000000 c0000000015f03c0 ffffffffffffffff
[ 33.157013] NIP [c0000000000dbc60] notifier_call_chain+0x70/0x100
[ 33.157818] LR [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
[ 33.162090] Call Trace:
[ 33.162845] [c000000fff703c00] [0000000000000008] 0x8 (unreliable)
[ 33.163644] [c000000fff703c50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
[ 33.164647] [c000000fff703c90] [c00000000006f2a8] opal_message_notify+0xa8/0x100
[ 33.165476] [c000000fff703d00] [c0000000000dbc88] notifier_call_chain+0x98/0x100
[ 33.167007] [c000000fff703d50] [c0000000000dbd94] atomic_notifier_call_chain+0x44/0x60
[ 33.167816] [c000000fff703d90] [c00000000006f654] opal_do_notifier.part.5+0x74/0xa0
[ 33.172166] [c000000fff703dd0] [c00000000006f6d8] opal_interrupt+0x58/0x70
[ 33.172997] [c000000fff703e10] [c0000000001273d0] handle_irq_event_percpu+0x90/0x2b0
[ 33.174507] [c000000fff703ed0] [c000000000127658] handle_irq_event+0x68/0xd0
[ 33.175312] [c000000fff703f00] [c00000000012baf4] handle_fasteoi_irq+0xe4/0x240
[ 33.176124] [c000000fff703f30] [c0000000001265c8] generic_handle_irq+0x58/0x90
[ 33.176936] [c000000fff703f60] [c000000000010f10] __do_irq+0x80/0x190
[ 33.182406] [c000000fff703f90] [c00000000002476c] call_do_irq+0x14/0x24
[ 33.183258] [c00000000144ba30] [c0000000000110c0] do_IRQ+0xa0/0x120
[ 33.184072] [c00000000144ba90] [c0000000000025d8] hardware_interrupt_common+0x158/0x180
[ 33.184907] --- interrupt: 501 at arch_local_irq_restore+0x5c/0x90
[ 33.184907] LR = arch_local_irq_restore+0x40/0x90
[ 33.186473] [c00000000144bd80] [c000000f2ae19808] 0xc000000f2ae19808 (unreliable)
[ 33.188024] [c00000000144bda0] [c00000000085d5d8] cpuidle_enter_state+0xa8/0x260
[ 33.192695] [c00000000144be00] [c000000000108be8] cpu_startup_entry+0x488/0x4e0
[ 33.193543] [c00000000144bee0] [c00000000000bdb4] rest_init+0xa4/0xc0
[ 33.194327] [c00000000144bf00] [c000000000da3e80] start_kernel+0x53c/0x558
[ 33.195084] [c00000000144bf90] [c000000000008c6c] start_here_common+0x20/0xa8
[ 33.196569] Instruction dump:
[ 33.196619] 7cfd3b78 60000000 60000000 e93e0000 2fa90000 419e00a4 2fbf0000 419e009c
[ 33.197605] 2e3d0000 60000000 60000000 60420000 <e9490000> ebc90008 7d234b78 7f84e378
[ 33.202763] ---[ end trace 71076895a9f126ba ]---
[ 33.202836]
[ 35.203605] Kernel panic - not syncing: Fatal exception in interrupt
[ 35.203727] drm_kms_helper: panic occurred, switching back to text console
[ 35.204692] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Ah! This is due to an overflow of the notifier chain array while handling an OPAL message. Upstream commit 792f96e fixes this issue, but it has only been partially applied to the Ubuntu 14.04.3 kernel sources, which is why you are seeing this crash.

commit 792f96e9a769b799a2944e9369e4ea1e467135b2
Author: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Date: Wed Feb 11 11:57:06 2015 +0530

    powerpc/powernv: Fix the overflow of OPAL message notifiers head array

    Fixes the condition check of incoming message type which can
    otherwise shoot beyond the message notifiers head array.

    Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
    Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Below is the hunk from the above commit that is missing from Ubuntu 14.04.3:
------------------------------------------------
@@ -354,7 +350,7 @@ static void opal_handle_message(void)
         type = be32_to_cpu(msg.msg_type);
 
         /* Sanity check */
-        if (type > OPAL_MSG_TYPE_MAX) {
+        if (type >= OPAL_MSG_TYPE_MAX) {
                 pr_warning("%s: Unknown message type: %u\n", __func__, type);
                 return;
         }
------------------------------------------------

I just checked: the above hunk applies cleanly to the Ubuntu 14.04.3 kernel sources. We should mirror this bug to Ubuntu and ask them to apply the above hunk.

New value:

SRU Justification:

[Impact]
Users of the 3.19 kernel on POWER8 machines get a kernel crash on boot.

[Test Case]
Boot the system.

[Fix]
Commit 792f96e9a769b799a2944e9369e4ea1e467135b2 needed to be backported in addition to d7cf83fcaf1b1668201eae4cdd6e6fe7a2448654. Our 3.19 kernel had a partial backport of the first patch (792f96e).

--
(followed by the original problem description, identical to the Old value above)
2015-08-25 13:40:27 | bugproxy | tags | architecture-ppc64le bot-comment bugnameltc-129216 severity-high targetmilestone-inin14043 | architecture-ppc64le bot-comment bugnameltc-129216 severity-critical targetmilestone-inin14043 |
2015-08-25 15:33:44 | Brad Figg | linux (Ubuntu Vivid): status | In Progress | Fix Committed |
2015-09-02 16:11:49 | bugproxy | tags | architecture-ppc64le bot-comment bugnameltc-129216 severity-critical targetmilestone-inin14043 | architecture-ppc64le targetmilestone-inin14043 |
2015-09-03 03:20:32 | bugproxy | tags | architecture-ppc64le targetmilestone-inin14043 | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 |
2015-09-11 18:19:47 | Launchpad Janitor | branch linked | | lp:ubuntu/trusty-proposed/linux-lts-vivid |
2015-09-13 22:38:29 | Brad Figg | tags | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-needed-vivid |
2015-09-16 19:38:52 | Breno Leitão | tags | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-needed-vivid | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-done-vivid |
2015-09-16 19:50:52 | bugproxy | tags | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-done-vivid | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-needed-vivid |
2015-09-18 13:37:43 | Breno Leitão | tags | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-needed-vivid | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-vivid |
2015-09-18 14:30:50 | bugproxy | tags | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-vivid | architecture-ppc64le bugnameltc-129216 severity-critical targetmilestone-inin14043 verification-done-vivid |
2015-09-21 17:25:50 | Chris J Arges | linux (Ubuntu): status | New | Fix Released |
2015-09-28 20:15:56 | Launchpad Janitor | linux (Ubuntu Vivid): status | Fix Committed | Fix Released |