4.15.0-22-generic fails to boot on IBM S822LC (POWER8 (raw), altivec supported)

Bug #1773162 reported by Paul Menzel on 2018-05-24
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| The Ubuntu-power-systems project | | Critical | Canonical Kernel Team | |
| linux (Ubuntu) | | Critical | Joseph Salisbury | |
| Bionic | | Critical | Joseph Salisbury | |

Bug Description

After upgrading from 16.04 to 18.04 on an IBM S822LC, the system does not boot with 4.15.0-22-generic and is stuck in a reboot cycle.

```
Exiting petitboot. Type 'exit' to return.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[31775213631,3] OPAL: Trying a CPU re-init with flags: 0x1
[32097001879,3] OPAL: Trying a CPU re-init with flags: 0x2
[ 2.660186] Bad kernel stack pointer 7fffcd511100 at c00000000000b9ec
[ 2.660382] Oops: Bad kernel stack pointer, sig: 6 [#1]
[ 2.660422] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 2.660462] Modules linked in:
[ 2.660494] CPU: 63 PID: 1201 Comm: modprobe Not tainted 4.15.0-22-generic #24-Ubuntu
[ 2.660549] NIP: c00000000000b9ec LR: 0000000000000000 CTR: 0000000000000000
[ 2.660603] REGS: c00000003fd0bd40 TRAP: 0300 Not tainted (4.15.0-22-generic)
[ 2.660657] MSR: 9000000000001031 <SF,HV,ME,IR,DR,LE> CR: 00000000 XER: 00000000
[ 2.660713] CFAR: c00000000000b934 DAR: 00000000000200f0 DSISR: 40000000 SOFTE: -4611686018408771536
[ 2.660713] GPR00: 0000000000000000 00007fffcd511100 0000000000000000 0000000000000000
[ 2.660713] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR12: 000075501b231700 00000000000200f0 0000000000000000 0000000000000000
[ 2.660713] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.661203] NIP [c00000000000b9ec] fast_exception_return+0x9c/0x184
[ 2.661249] LR [0000000000000000] (null)
[ 2.661285] Call Trace:
[ 2.661303] Instruction dump:
[ 2.661331] e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090
[ 2.661387] e8210078 7db243a6 7db142a6 7c0004ac <e9ad0000> 63ff0000 7db242a6 48000010
[ 2.661445] ---[ end trace b607b09fe6490607 ]---
[…]
```

Please find the full log attached. Selecting Linux 4.4.0-124-generic works.

```
$ uname -a
Linux flughafenberlinbrandenburgwillybrandt 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:02:22 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
$ dpkg -l linux-image-4.15.0-22-generic
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==================================================-==============================-==============================-=========================================================================================================
ii linux-image-4.15.0-22-generic 4.15.0-22.24 ppc64el Signed kernel image generic
$ more /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
```
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 23 16:40 seq
 crw-rw---- 1 root audio 116, 33 May 23 16:40 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro rootflags=subvol=@ quiet splash
ProcLoadAvg: 0.01 0.08 0.04 2/1373 28071
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.4.0-124-generic (buildd@bos02-ppc64el-008) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) ) #148-Ubuntu SMP Wed May 2 13:02:22 UTC 2018
ProcVersionSignature: Ubuntu 4.4.0-124.148-generic 4.4.117
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-124-generic N/A
 linux-backports-modules-4.4.0-124-generic N/A
 linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.4.0-124-generic ppc64le
UpgradeStatus: Upgraded to bionic on 2018-05-23 (0 days ago)
UserGroups: adm edv libvirtd lxd sudo
VarLogDump_list: total 0
_MarkForUpload: True
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_smt: SMT=8

Paul Menzel (paulmenzel) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1773162

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic

apport information

tags: added: apport-collected
description: updated

Paul Menzel (paulmenzel) wrote :

With `loglevel=7 initcall_debug nosplash` on the command line I got a little further, I believe.

```
[…]
[ 0.000000] Linux version 4.15.0-22-generic (buildd@bos02-ppc64el-009) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #24-Ubuntu SMP Wed May 16 12:12:37 UTC 2018 (Ubuntu 4.15.0-22.24-generic 4.15.17)
[…]
[ 0.000000] Kernel command line: root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro rootflags=subvol=@ loglevel=7 initcall_debug nosplash
[…]
[ 3.290595] Bad kernel stack pointer 7fffc010e3d0 at c00000000000b9ec
[ 3.290639] Oops: Bad kernel stack pointer, sig: 6 [#1]
[ 3.290671] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 3.290705] Modules linked in:
[ 3.290731] CPU: 56 PID: 1200 Comm: modprobe Not tainted 4.15.0-22-generic #24-Ubuntu
[ 3.290778] NIP: c00000000000b9ec LR: 0000000000000000 CTR: 0000000000000000
[ 3.290826] REGS: c00000003fd5fd40 TRAP: 0300 Not tainted (4.15.0-22-generic)
[ 3.290872] MSR: 9000000000001031 <SF,HV,ME,IR,DR,LE> CR: 00000000 XER: 00000000
[ 3.290922] CFAR: c00000000000b934 DAR: 000000000002b200 DSISR: 40000000 SOFTE: -4611686018403131680
[ 3.290922] GPR00: 0000000000000000 00007fffc010e3d0 0000000000000000 0000000000000000
[ 3.290922] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR12: 000074c38e101700 000000000002b200 0000000000000000 0000000000000000
[ 3.290922] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.291350] NIP [c00000000000b9ec] fast_exception_return+0x9c/0x184
[ 3.291389] LR [0000000000000000] (null)
[ 3.291419] Call Trace:
[ 3.291435] Instruction dump:
[ 3.291460] e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090
[ 3.291509] e8210078 7db243a6 7db142a6 7c0004ac <e9ad0000> 63ff0000 7db242a6 63ff0000
[ 3.291560] ---[ end trace 12eb35aa3ef49d58 ]---
```

Please find the full log attached.
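
When reading an oops like the one above, the instruction word wrapped in `<...>` in the "Instruction dump" is the one at NIP. A minimal sketch of pulling that word and the NIP symbol out of the log follows; the helper names are hypothetical, and the two sample lines are copied from the log in this comment. Here the marked word is `e9ad0000`, which the patch comments quoted later in this report label as `ld r13,0(r13)`.

```python
import re

# Two lines copied from the oops output quoted in this comment.
OOPS_LINES = [
    "[ 3.291350] NIP [c00000000000b9ec] fast_exception_return+0x9c/0x184",
    "[ 3.291509] e8210078 7db243a6 7db142a6 7c0004ac <e9ad0000> 63ff0000 7db242a6 63ff0000",
]


def faulting_instruction(lines):
    """Return the instruction word marked with <...> as an int, or None."""
    for line in lines:
        match = re.search(r"<([0-9a-f]{8})>", line)
        if match:
            return int(match.group(1), 16)
    return None


def nip_symbol(lines):
    """Return (address, 'symbol+offset/size') from the 'NIP [addr] symbol' line, or None."""
    for line in lines:
        match = re.search(r"NIP \[([0-9a-f]+)\] (\S+)", line)
        if match:
            return int(match.group(1), 16), match.group(2)
    return None


print(hex(faulting_instruction(OOPS_LINES)))
print(nip_symbol(OOPS_LINES))
```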

Paul Menzel (paulmenzel) wrote :

I was asked to add the firmware details.

```
$ ipmitool … fru print 47
 Product Name : OpenPOWER Firmware
 Product Version : IBM-firestone-ibm-OP8_v1.12_2.85
 Product Extra : op-build-d033e11
 Product Extra : buildroot-81b8d98
 Product Extra : skiboot-5.4.8-7f3e3b0
 Product Extra : hostboot-2eb7706-f28ad92
 Product Extra : linux-4.4.92-openpower1-59284a2
 Product Extra : petitboot-v1.4.4-a6d3938
 Product Extra : firestone-xml-2494a43
 Product Extra : occ-d7efe30-dca9
```

This kernel seems to be missing a change that is present in the upstream version of the patch.

See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/lib/feature-fixups.c?commit=a048a07d7f4535baa4cbad6bc024f175317ab938#n189

--- a/arch/powerpc/lib/feature-fixups.c 2018-05-24 23:31:52.497801437 +1000
+++ b/arch/powerpc/lib/feature-fixups.c 2018-05-24 23:29:18.141606123 +1000
@@ -185,12 +186,21 @@

     i = 0;
     if (types & STF_BARRIER_FALLBACK || types & STF_BARRIER_SYNC_ORI) {
-        instrs[i++] = 0x7db243a6; /* mtsprg 2,r13 */
-        instrs[i++] = 0x7db142a6; /* mfsprg r13,1 */
+        if (cpu_has_feature(CPU_FTR_HVMODE)) {
+            instrs[i++] = 0x7db14ba6; /* mtspr 0x131, r13 (HSPRG1) */
+            instrs[i++] = 0x7db04aa6; /* mfspr r13, 0x130 (HSPRG0) */
+        } else {
+            instrs[i++] = 0x7db243a6; /* mtsprg 2,r13 */
+            instrs[i++] = 0x7db142a6; /* mfsprg r13,1 */
+        }
         instrs[i++] = 0x7c0004ac; /* hwsync */
         instrs[i++] = 0xe9ad0000; /* ld r13,0(r13) */
         instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */
-        instrs[i++] = 0x7db242a6; /* mfsprg r13,2 */
+        if (cpu_has_feature(CPU_FTR_HVMODE)) {
+            instrs[i++] = 0x7db14aa6; /* mfspr r13, 0x131 (HSPRG1) */
+        } else {
+            instrs[i++] = 0x7db242a6; /* mfsprg r13,2 */
+        }
     } else if (types & STF_BARRIER_EIEIO) {
         instrs[i++] = 0x7e0006ac; /* eieio + bit 6 hint */
     }
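
The instruction words in this patch can be checked against their comments by decoding the `mtspr`/`mfspr` fields by hand. The following is a rough sketch, not part of the kernel: the function name and the small SPR table are assumptions made for illustration, and it only handles the two SPR-move forms. It suggests that the words seen in the failing instruction dump (`7db243a6`, `7db142a6`, `7db242a6`) are the SPRG variants, while the MSR line in the oops shows `HV` set, which is the case the upstream patch handles with HSPRG0/HSPRG1.

```python
# Primary opcode 31 with these extended opcodes encodes SPR moves (Power ISA).
MTSPR_XO = 467
MFSPR_XO = 339

# SPR numbers for the registers named in the patch comments.
SPR_NAMES = {
    272: "SPRG0", 273: "SPRG1", 274: "SPRG2", 275: "SPRG3",
    304: "HSPRG0", 305: "HSPRG1",
}


def decode_spr_move(word):
    """Decode a 32-bit mtspr/mfspr word into (mnemonic, spr_name, gpr), else None."""
    if (word >> 26) != 31:
        return None
    xo = (word >> 1) & 0x3FF
    if xo not in (MTSPR_XO, MFSPR_XO):
        return None
    gpr = (word >> 21) & 0x1F
    # The 10-bit SPR field stores the two 5-bit halves of the SPR number swapped.
    spr = (((word >> 11) & 0x1F) << 5) | ((word >> 16) & 0x1F)
    mnemonic = "mtspr" if xo == MTSPR_XO else "mfspr"
    return mnemonic, SPR_NAMES.get(spr, hex(spr)), gpr


# Words taken from the patch: the non-HV sequence first, then the HV sequence.
for word in (0x7DB243A6, 0x7DB142A6, 0x7DB242A6, 0x7DB14BA6, 0x7DB04AA6, 0x7DB14AA6):
    print(f"{word:#010x}", decode_spr_move(word))
```

Words that are not SPR moves, such as `0x7c0004ac` (hwsync) or `0xe9ad0000` (the load), make the function return `None`.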

Paul Menzel (paulmenzel) wrote :

The system boots with `no_stf_barrier`.
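
To confirm that a booted system really has the workaround active, one option is to look for the flag on the kernel command line. This is a small sketch under assumptions: the helper name is made up, and the sample string is the command line from this report with `no_stf_barrier` appended by hand. On a running system the string would come from `/proc/cmdline`.

```python
def has_kernel_param(cmdline, name):
    """Return True if `name` appears as a bare flag or as `name=value`."""
    for token in cmdline.split():
        key = token.split("=", 1)[0]
        if key == name:
            return True
    return False


# Command line from the report, with the workaround flag appended (hypothetical).
cmdline = (
    "root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro "
    "rootflags=subvol=@ loglevel=7 initcall_debug nosplash no_stf_barrier"
)
print(has_kernel_param(cmdline, "no_stf_barrier"))
```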

Manoj Iyer (manjo) on 2018-05-24
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → Critical
status: Incomplete → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a revert of 06f7e3d39 and a new backport of a048a07d7f4. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1773162

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Paul Menzel (paulmenzel) wrote :

That Linux kernel boots to the login prompt. Thank you.

Changed in ubuntu-power-systems:
status: New → In Progress
tags: added: triage-g
Kalpana S Shetty (kalshett) wrote :

I'm pasting Pavithra's comments on testing with the fixed kernel:

>>>>> Pavithra's update:
(In reply to comment #8)
> ------- Comment From jsalisbury 2018-05-30 01:19:05 UTC-------
> Did this issue start with the 4.15.0-22 kernel? Was there a prior 4.15
> based kernel that did not exhibit this bug?
>
> Could you also see if this bug happens with the Bionic -proposed or mainline
> kernel:
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed.
>
> Mainline can be downloaded from:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc7/

Tested with the given kernels; below is the summary.

4.15.0-20-generic -> System boots
4.15.0-22-generic -> System fails to boot with oops
4.15.0-23-generic -> System fails to boot with oops
4.15.0-23-generic #26~lp1773162 -> System boots

Thanks,
Pavithra
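
The summary above narrows the regression to the releases between 4.15.0-20 and 4.15.0-22. As a rough sketch, the same conclusion can be derived from the results mechanically. The function names are hypothetical, and the patched `#26~lp1773162` test build is left out because it is a separate build of 4.15.0-23, not a stock release.

```python
# Boot results for the stock kernels from the summary (True = system boots).
RESULTS = {
    "4.15.0-20": True,
    "4.15.0-22": False,
    "4.15.0-23": False,
}


def abi_number(version):
    """Return the ABI number after the last '-' in a version like '4.15.0-22'."""
    return int(version.rsplit("-", 1)[1])


def regression_window(results):
    """Return (last_good, first_bad) among versions ordered by ABI number, or None."""
    last_good = None
    for version in sorted(results, key=abi_number):
        if results[version]:
            last_good = version
        elif last_good is not None:
            return last_good, version
    return None


print(regression_window(RESULTS))
```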

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Manoj Iyer (manjo) on 2018-06-11
Changed in ubuntu-power-systems:
importance: Undecided → Critical
Manoj Iyer (manjo) on 2018-06-11
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Hello IBM,

Could you please verify the fix(es) with the Bionic kernel currently in -proposed?

Thank you.
