Ubuntu 16.10 netboot install fails with "Oops: Exception in kernel mode, sig: 5 [#1] " (lpfc)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
debian-installer (Ubuntu) | Fix Released | Undecided | Adam Conrad |
Xenial | Fix Released | Undecided | Adam Conrad |
Yakkety | Fix Released | Undecided | Adam Conrad |
linux (Ubuntu) | Fix Released | High | Canonical Kernel Team |
Xenial | Invalid | Undecided | Unassigned |
Yakkety | Fix Released | Undecided | Tim Gardner |
Bug Description
== Comment: #33 - Mauricio Faria De Oliveira - 2016-12-09 06:49:57 ==
Hi Canonical,
Can you please apply this patch [1] to 16.10 and 16.04.x HWE (4.8) ?
It fixes a regression introduced in 4.8.
As you can see, it's in the SCSI maintainer (James Bottomley)'s 'fixes' branch, but didn't make 4.9-rc8 (maybe he considered it late for this one).
We have installer, boot, and post-boot problems due to this one.
It'd be good if the netboot images for 16.04.x HWE kernel can get it too.
Thank you,
[1] scsi: lpfc: fix oops/BUG in lpfc_sli_
http://
Historical context:
== Comment: #0 - HARSHA THYAGARAJA - 2016-11-21 02:39:35 ==
---Problem Description---
Ubuntu 16.10 netboot install fails with "Oops: Exception in kernel mode. " (kernel: 4.8.0-27)
Machine Type = Power8 baremetal
---boot type---
QEMU direct boot kernel/initrd
---Kernel cmdline used to launch install---
On a Power8 server, Using kernel and initrd images,
netcfg/
---Install repository type---
Internet repository
---Install repository Location---
http://
---Point of failure---
Other failure during installation (stage 1)
== Comment: #1 - HARSHA THYAGARAJA - 2016-11-21 02:41:54 ==
The netboot install fails, and call traces are seen at the disk detection step.
== Comment: #8 - Mauricio Faria De Oliveira - 2016-11-21 15:58:25 ==
Finally got it.
The assembly offset/code + the trap signal is due to this BUG_ON(), and the second condition triggered the trap.
Checking why piocb is not NULL but piocb->vport is NULL.
This might have happened in the lpfc_linkdown_
Would need a more readable console log (i.e., dmesg, as requested in comments 3 and 5) to help understand it.
--
static int
lpfc_sli_
{
<...>
}
[ 226.147886] NIP [d00000000b7324c0] lpfc_sli_
0x2478 + 0x48 = 0x24c0 (tdnei; trap doubleword not equal immediate)
0000000000002478 <lpfc_sli_
<...>
2498: 78 2b bf 7c mr r31,r5 // r31 is *piocb (r5 is the 3rd function parameter)
249c: 78 1b 7d 7c mr r29,r3
24a0: 78 23 9e 7c mr r30,r4
24a4: 01 00 00 48 bl 24a4 <lpfc_sli_
24a8: 00 00 00 60 nop
24ac: 00 00 bf 2f cmpdi cr7,r31,0 // compare piocb with 0. checking for NULL.
24b0: 70 00 9e 41 beq cr7,2520 <lpfc_sli_
24b4: e8 00 3f e9 ld r9,232(r31) // an offset of piocb. probably piocb->vport in the bug_on
24b8: 74 00 29 7d cntlzd r9,r9 // count leading zeroes. if r9 is null (0), leading zeroes is 64 (binary: 0100_0000, bit 6 is 1, and 6 LSbs [bits 5-0] are 0)
24bc: 82 d1 29 79 rldicl r9,r9,58,6 // rotate left by 58 (i.e., right by 6): bit 6 of the count moves to bit 0 (the LSb) and the 6 LSbs move to the 6 MSbs; then clear the 6 MSbs and keep all lower bits, so r9 is 1 only if the count was 64.
24c0: 00 00 09 0b tdnei r9,0 // trap if r9 is not equal to zero (i.e., the loaded value was zero, so cntlzd returned 64, bit 6 became bit 0, r9 is 1, and the trap triggers). This checks the latter part of the OR.
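As a sanity check on the reading of the last three instructions, here is a minimal user-space C sketch (not kernel code) that models them. The helper names `cntlzd`, `rldicl` and `would_trap` are illustrative and only mimic the PowerPC instructions of the same name; treating the loaded doubleword as `piocb->vport` is the assumption made in the listing comments above, not something the sketch proves.

```c
#include <assert.h>
#include <stdint.h>

/* Model of cntlzd: count leading zero bits of a 64-bit value (64 for 0). */
static unsigned cntlzd(uint64_t x)
{
    unsigned n = 0;
    for (uint64_t mask = UINT64_C(1) << 63; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Model of rldicl rA,rS,sh,mb: rotate left by sh (0..63),
 * then clear the mb (0..63) most significant bits. */
static uint64_t rldicl(uint64_t x, unsigned sh, unsigned mb)
{
    uint64_t rot = sh ? (x << sh) | (x >> (64 - sh)) : x;
    return rot & (UINT64_MAX >> mb);
}

/* Model of the sequence at 24b8..24c0.
 * Returns 1 when tdnei r9,0 would trap, 0 otherwise. */
static int would_trap(uint64_t loaded_value)
{
    uint64_t r9 = cntlzd(loaded_value); /* 64 only when the value is 0 */
    r9 = rldicl(r9, 58, 6);             /* same effect as r9 >> 6 here */
    return r9 != 0;                     /* tdnei r9,0 */
}

/* Small self-check of the idiom: the trap fires only for a zero value. */
static void check_idiom(void)
{
    assert(would_trap(0) == 1);
    assert(would_trap(1) == 0);
    assert(would_trap(UINT64_MAX) == 0);
}
```

Calling `check_idiom()` from a test program exercises the three cases. The sequence is a branch-free way to compute "value == 0": only a zero input has 64 leading zeroes, and 64 is the only possible count with bit 6 set.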
tags: added: architecture-ppc64le bugnameltc-148978 severity-critical targetmilestone-inin1610
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High
status: New → Triaged
Changed in debian-installer (Ubuntu Xenial):
assignee: nobody → Adam Conrad (adconrad)
Changed in debian-installer (Ubuntu Yakkety):
assignee: nobody → Adam Conrad (adconrad)
Changed in linux (Ubuntu Xenial):
status: New → Invalid
Changed in debian-installer (Ubuntu):
assignee: nobody → Adam Conrad (adconrad)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-done-yakkety removed: verification-needed-yakkety
Changed in debian-installer (Ubuntu Xenial):
status: New → Fix Released
Changed in debian-installer (Ubuntu Yakkety):
status: New → Fix Released
Changed in debian-installer (Ubuntu):
status: New → Fix Released
tags: added: cscc
------- Comment From <email address hidden> 2016-12-12 11:19 EDT-------
Canonical,
The patch has been accepted into mainline/4.9 [1].
Submitted to the kernel-team mailing list a while ago [2], but it is not in the archives yet.
Updated netboot files (lpfc module) are required.
Thanks!
[1] scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/scsi/lpfc?id=2319f847a8910cff1d46c9b66aa1dd7cc3e836a9
[2] subject: "[SRU][Xenial HWE 4.8][Yakkety] [PATCH] scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()"