4.15.0-22-generic fails to boot on IBM S822LC (POWER8 (raw), altivec supported)

Bug #1773162 reported by Paul Menzel on 2018-05-24
This bug affects 4 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| The Ubuntu-power-systems project | | Critical | Canonical Kernel Team | |
| debian-installer (Ubuntu) | | Critical | Canonical Foundations Team | |
| debian-installer (Ubuntu Bionic) | | Critical | Canonical Foundations Team | |
| linux (Ubuntu) | | Critical | Joseph Salisbury | |
| linux (Ubuntu Bionic) | | Critical | Joseph Salisbury | |

Bug Description

After upgrading from 16.04 to 18.04 on an IBM S822LC, the system does not boot with 4.15.0-22-generic and is stuck in a reboot loop.

```
Exiting petitboot. Type 'exit' to return.
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
[31775213631,3] OPAL: Trying a CPU re-init with flags: 0x1
[32097001879,3] OPAL: Trying a CPU re-init with flags: 0x2
[ 2.660186] Bad kernel stack pointer 7fffcd511100 at c00000000000b9ec
[ 2.660382] Oops: Bad kernel stack pointer, sig: 6 [#1]
[ 2.660422] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 2.660462] Modules linked in:
[ 2.660494] CPU: 63 PID: 1201 Comm: modprobe Not tainted 4.15.0-22-generic #24-Ubuntu
[ 2.660549] NIP: c00000000000b9ec LR: 0000000000000000 CTR: 0000000000000000
[ 2.660603] REGS: c00000003fd0bd40 TRAP: 0300 Not tainted (4.15.0-22-generic)
[ 2.660657] MSR: 9000000000001031 <SF,HV,ME,IR,DR,LE> CR: 00000000 XER: 00000000
[ 2.660713] CFAR: c00000000000b934 DAR: 00000000000200f0 DSISR: 40000000 SOFTE: -4611686018408771536
[ 2.660713] GPR00: 0000000000000000 00007fffcd511100 0000000000000000 0000000000000000
[ 2.660713] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR12: 000075501b231700 00000000000200f0 0000000000000000 0000000000000000
[ 2.660713] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.660713] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 2.661203] NIP [c00000000000b9ec] fast_exception_return+0x9c/0x184
[ 2.661249] LR [0000000000000000] (null)
[ 2.661285] Call Trace:
[ 2.661303] Instruction dump:
[ 2.661331] e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090
[ 2.661387] e8210078 7db243a6 7db142a6 7c0004ac <e9ad0000> 63ff0000 7db242a6 48000010
[ 2.661445] ---[ end trace b607b09fe6490607 ]---
[…]
```

Please find the full log attached. Selecting Linux 4.4.0-124-generic works.

```
$ uname -a
Linux flughafenberlinbrandenburgwillybrandt 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:02:22 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux
$ dpkg -l linux-image-4.15.0-22-generic
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==================================================-==============================-==============================-=========================================================================================================
ii linux-image-4.15.0-22-generic 4.15.0-22.24 ppc64el Signed kernel image generic
$ more /etc/os-release
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
```
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 23 16:40 seq
 crw-rw---- 1 root audio 116, 33 May 23 16:40 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro rootflags=subvol=@ quiet splash
ProcLoadAvg: 0.01 0.08 0.04 2/1373 28071
ProcSwaps: Filename Type Size Used Priority
ProcVersion: Linux version 4.4.0-124-generic (buildd@bos02-ppc64el-008) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) ) #148-Ubuntu SMP Wed May 2 13:02:22 UTC 2018
ProcVersionSignature: Ubuntu 4.4.0-124.148-generic 4.4.117
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-124-generic N/A
 linux-backports-modules-4.4.0-124-generic N/A
 linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.4.0-124-generic ppc64le
UpgradeStatus: Upgraded to bionic on 2018-05-23 (0 days ago)
UserGroups: adm edv libvirtd lxd sudo
VarLogDump_list: total 0
_MarkForUpload: True
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_smt: SMT=8

Paul Menzel (paulmenzel) wrote :

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1773162

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic

apport information

tags: added: apport-collected
description: updated

Paul Menzel (paulmenzel) wrote :

With `loglevel=7 initcall_debug nosplash` on the command line I got a little further, I believe.

```
[…]
[ 0.000000] Linux version 4.15.0-22-generic (buildd@bos02-ppc64el-009) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #24-Ubuntu SMP Wed May 16 12:12:37 UTC 2018 (Ubuntu 4.15.0-22.24-generic 4.15.17)
[…]
[ 0.000000] Kernel command line: root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro rootflags=subvol=@ loglevel=7 initcall_debug nosplash
[…]
[ 3.290595] Bad kernel stack pointer 7fffc010e3d0 at c00000000000b9ec
[ 3.290639] Oops: Bad kernel stack pointer, sig: 6 [#1]
[ 3.290671] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 3.290705] Modules linked in:
[ 3.290731] CPU: 56 PID: 1200 Comm: modprobe Not tainted 4.15.0-22-generic #24-Ubuntu
[ 3.290778] NIP: c00000000000b9ec LR: 0000000000000000 CTR: 0000000000000000
[ 3.290826] REGS: c00000003fd5fd40 TRAP: 0300 Not tainted (4.15.0-22-generic)
[ 3.290872] MSR: 9000000000001031 <SF,HV,ME,IR,DR,LE> CR: 00000000 XER: 00000000
[ 3.290922] CFAR: c00000000000b934 DAR: 000000000002b200 DSISR: 40000000 SOFTE: -4611686018403131680
[ 3.290922] GPR00: 0000000000000000 00007fffc010e3d0 0000000000000000 0000000000000000
[ 3.290922] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR12: 000074c38e101700 000000000002b200 0000000000000000 0000000000000000
[ 3.290922] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.290922] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 3.291350] NIP [c00000000000b9ec] fast_exception_return+0x9c/0x184
[ 3.291389] LR [0000000000000000] (null)
[ 3.291419] Call Trace:
[ 3.291435] Instruction dump:
[ 3.291460] e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090
[ 3.291509] e8210078 7db243a6 7db142a6 7c0004ac <e9ad0000> 63ff0000 7db242a6 63ff0000
[ 3.291560] ---[ end trace 12eb35aa3ef49d58 ]---
```

Please find the full log attached.
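
As a reading aid for the `MSR:` line in the traces above, here is a minimal sketch that decodes the value into flag names. The bit positions are an assumption taken from the 64-bit Power ISA layout (counted from the least significant bit), only a subset of bits is listed, and `decode_msr` is a made-up helper name, not a kernel or Ubuntu tool.

```python
# Sketch: decode the MSR word printed in the oops into flag names.
# Bit numbers are counted from the least significant bit (LSB = 0) and are
# assumed from the 64-bit Power ISA layout; only a subset is listed here.
MSR_BITS = [
    (63, "SF"),  # 64-bit mode
    (60, "HV"),  # hypervisor state
    (15, "EE"),  # external interrupts enabled
    (14, "PR"),  # problem (user) state
    (12, "ME"),  # machine check enable
    (5, "IR"),   # instruction relocation
    (4, "DR"),   # data relocation
    (1, "RI"),   # recoverable interrupt
    (0, "LE"),   # little-endian
]


def decode_msr(value):
    """Return the names of the listed MSR bits that are set, high bit first."""
    return [name for bit, name in MSR_BITS if value & (1 << bit)]


# MSR value taken from both traces in this report.
flags = decode_msr(0x9000000000001031)
print(",".join(flags))
```

This prints `SF,HV,ME,IR,DR,LE`, which agrees with the list the kernel printed next to the raw value. HV being set is consistent with the `PowerNV` platform named in the trace, where the kernel runs in hypervisor mode.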

Paul Menzel (paulmenzel) wrote :

I was asked to add the firmware details.

```
$ ipmitool … fru print 47
 Product Name : OpenPOWER Firmware
 Product Version : IBM-firestone-ibm-OP8_v1.12_2.85
 Product Extra : op-build-d033e11
 Product Extra : buildroot-81b8d98
 Product Extra : skiboot-5.4.8-7f3e3b0
 Product Extra : hostboot-2eb7706-f28ad92
 Product Extra : linux-4.4.92-openpower1-59284a2
 Product Extra : petitboot-v1.4.4-a6d3938
 Product Extra : firestone-xml-2494a43
 Product Extra : occ-d7efe30-dca9
```

This kernel seems to be missing a change that is present in the upstream version of the patch.

See:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/lib/feature-fixups.c?commit=a048a07d7f4535baa4cbad6bc024f175317ab938#n189

```
--- a/arch/powerpc/lib/feature-fixups.c 2018-05-24 23:31:52.497801437 +1000
+++ b/arch/powerpc/lib/feature-fixups.c 2018-05-24 23:29:18.141606123 +1000
@@ -185,12 +186,21 @@
 
         i = 0;
         if (types & STF_BARRIER_FALLBACK || types & STF_BARRIER_SYNC_ORI) {
-                instrs[i++] = 0x7db243a6; /* mtsprg 2,r13 */
-                instrs[i++] = 0x7db142a6; /* mfsprg r13,1 */
+                if (cpu_has_feature(CPU_FTR_HVMODE)) {
+                        instrs[i++] = 0x7db14ba6; /* mtspr 0x131, r13 (HSPRG1) */
+                        instrs[i++] = 0x7db04aa6; /* mfspr r13, 0x130 (HSPRG0) */
+                } else {
+                        instrs[i++] = 0x7db243a6; /* mtsprg 2,r13 */
+                        instrs[i++] = 0x7db142a6; /* mfsprg r13,1 */
+                }
                 instrs[i++] = 0x7c0004ac; /* hwsync */
                 instrs[i++] = 0xe9ad0000; /* ld r13,0(r13) */
                 instrs[i++] = 0x63ff0000; /* ori 31,31,0 speculation barrier */
-                instrs[i++] = 0x7db242a6; /* mfsprg r13,2 */
+                if (cpu_has_feature(CPU_FTR_HVMODE)) {
+                        instrs[i++] = 0x7db14aa6; /* mfspr r13, 0x131 (HSPRG1) */
+                } else {
+                        instrs[i++] = 0x7db242a6; /* mfsprg r13,2 */
+                }
         } else if (types & STF_BARRIER_EIEIO) {
                 instrs[i++] = 0x7e0006ac; /* eieio + bit 6 hint */
         }
```
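
To cross-check the comments in the patch, the instruction words can be decoded by hand. The following is a rough sketch, assuming the Power ISA XFX instruction form (primary opcode 31, extended opcode 467 for mtspr and 339 for mfspr, SPR number stored with its two 5-bit halves swapped). `decode_spr_move` and the small name tables are illustrative only and do not come from the kernel tree.

```python
# Sketch: decode the mtspr/mfspr instruction words used in the patch.
# Assumed layout (Power ISA XFX form): primary opcode 31 in the top 6 bits,
# a 5-bit GPR field, a 10-bit SPR field with swapped halves, and a 10-bit
# extended opcode (467 = mtspr, 339 = mfspr).
XO_NAMES = {467: "mtspr", 339: "mfspr"}
SPR_NAMES = {273: "SPRG1", 274: "SPRG2", 304: "HSPRG0", 305: "HSPRG1"}


def decode_spr_move(word):
    """Return (mnemonic, gpr, spr_number) for an mtspr/mfspr instruction word."""
    if word >> 26 != 31:
        raise ValueError("not a primary-opcode-31 instruction")
    mnemonic = XO_NAMES.get((word >> 1) & 0x3FF)
    if mnemonic is None:
        raise ValueError("not mtspr/mfspr")
    gpr = (word >> 21) & 0x1F
    field = (word >> 11) & 0x3FF
    # Un-swap the halves: the low 5 bits of the field hold the high half
    # of the SPR number, the high 5 bits hold the low half.
    spr = ((field & 0x1F) << 5) | (field >> 5)
    return mnemonic, gpr, spr


for word in (0x7DB243A6, 0x7DB142A6, 0x7DB242A6,   # SPRG sequence
             0x7DB14BA6, 0x7DB04AA6, 0x7DB14AA6):  # HV-mode replacements
    mnemonic, gpr, spr = decode_spr_move(word)
    print(f"{word:#010x}: {mnemonic} r{gpr}, {SPR_NAMES.get(spr, spr)} ({spr:#x})")
```

Under those assumptions, the words `7db243a6 7db142a6` that appear in the oops instruction dump decode to the SPRG2/SPRG1 pair, i.e. the sequence that the upstream version only emits when `CPU_FTR_HVMODE` is not set.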

Paul Menzel (paulmenzel) wrote :

The system boots with `no_stf_barrier`.
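
To confirm that the workaround flag actually reached the kernel, the command line can be checked token by token. This is only a sketch: `has_boot_flag` is a hypothetical helper, and the sample string is the command line from this report with the flag appended by hand. On a running system the same string could be read from `/proc/cmdline`.

```python
# Sketch: check whether a kernel command line carries a given bare flag.
# `has_boot_flag` is a hypothetical helper, not part of any Ubuntu tool.
def has_boot_flag(cmdline, flag):
    """True if `flag` appears as a whole whitespace-separated token."""
    return flag in cmdline.split()


# Command line from this report, with the workaround flag appended by hand.
cmdline = ("root=UUID=2c3dd738-785a-469b-843e-9f0ba8b47b0d ro "
           "rootflags=subvol=@ quiet splash no_stf_barrier")
print(has_boot_flag(cmdline, "no_stf_barrier"))
```

Matching whole tokens rather than substrings avoids a false positive from a longer parameter that merely contains the flag name.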

Manoj Iyer (manjo) on 2018-05-24
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
importance: Undecided → Critical
status: Incomplete → In Progress
assignee: Canonical Kernel Team (canonical-kernel-team) → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
status: New → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a revert of 06f7e3d39 and a new backport of a048a07d7f4. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1773162

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Paul Menzel (paulmenzel) wrote :

That Linux kernel boots to the login prompt. Thank you.

Changed in ubuntu-power-systems:
status: New → In Progress
tags: added: triage-g
Kalpana S Shetty (kalshett) wrote :

I'm pasting Pavithra's comments on testing with the fixed kernel:

>>>>> Pavithra's update:
(In reply to comment #8)
> ------- Comment From jsalisbury 2018-05-30 01:19:05 UTC-------
> Did this issue start with the 4.15.0-22 kernel? Was there a prior 4.15
> based kernel that did not exhibit this bug?
>
> Could you also see if this bug happens with the Bionic -proposed or mainline
> kernel:
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed.
>
> Mainline can be downloaded from:
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.17-rc7/

Tested with the given kernels; below is the summary.

4.15.0-20-generic -> System boots
4.15.0-22-generic -> System fails to boot with oops
4.15.0-23-generic -> System fails to boot with oops
4.15.0-23-generic #26~lp1773162 -> System boots

Thanks,
Pavithra
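
For anyone scripting against these results, the summary lines can be turned into structured records. This is a minimal sketch: the lines are copied from the summary (lightly normalized), `parse_summary` is a made-up name, and only kernels tested in this thread are covered.

```python
# Sketch: turn the boot summary from this comment into structured records.
SUMMARY = """\
4.15.0-20-generic -> System boots
4.15.0-22-generic -> System fails to boot with oops
4.15.0-23-generic -> System fails to boot with oops
4.15.0-23-generic #26~lp1773162 -> System boots
"""


def parse_summary(text):
    """Return a list of (release, build_tag, boots) tuples."""
    records = []
    for line in text.splitlines():
        kernel, _, outcome = line.partition(" -> ")
        # An optional build tag follows the release string after a space.
        release, _, build_tag = kernel.partition(" ")
        records.append((release, build_tag, outcome == "System boots"))
    return records


for release, build_tag, boots in parse_summary(SUMMARY):
    print(release, build_tag or "(no build tag)", "boots" if boots else "oops")
```

Note that the release string alone does not separate the test build from the failing 4.15.0-23-generic kernel; only the `#26~lp1773162` build tag does.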

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Manoj Iyer (manjo) on 2018-06-11
Changed in ubuntu-power-systems:
importance: Undecided → Critical
Manoj Iyer (manjo) on 2018-06-11
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Hello IBM,

Could you please verify the fix(es) with the Bionic kernel currently in -proposed?

Thank you.

bugproxy (bugproxy) on 2018-06-29
tags: added: architecture-ppc64le bugnameltc-169340 severity-critical targetmilestone-inin---
removed: verification-needed-bionic

We have verified that the kernel in -proposed (4.15.0-24.26) boots correctly on a POWER8 system.

tags: added: verification-done-bionic

------- Comment From <email address hidden> 2018-06-29 07:34 EDT-------
Hi Kalpana/ Pavithra,
I see your comments on the external bugzilla about the verification of bionic kernel for this issue.
Can you please confirm if this can be moved to resolved state from the testing perspective?

Thank you.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-24.26

---------------
linux (4.15.0-24.26) bionic; urgency=medium

  * linux: 4.15.0-24.26 -proposed tracker (LP: #1776338)

  * Bionic update: upstream stable patchset 2018-06-06 (LP: #1775483)
    - drm: bridge: dw-hdmi: Fix overflow workaround for Amlogic Meson GX SoCs
    - i40e: Fix attach VF to VM issue
    - tpm: cmd_ready command can be issued only after granting locality
    - tpm: tpm-interface: fix tpm_transmit/_cmd kdoc
    - tpm: add retry logic
    - Revert "ath10k: send (re)assoc peer command when NSS changed"
    - bonding: do not set slave_dev npinfo before slave_enable_netpoll in
      bond_enslave
    - ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
    - ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
    - KEYS: DNS: limit the length of option strings
    - l2tp: check sockaddr length in pppol2tp_connect()
    - net: validate attribute sizes in neigh_dump_table()
    - llc: delete timers synchronously in llc_sk_free()
    - tcp: don't read out-of-bounds opsize
    - net: af_packet: fix race in PACKET_{R|T}X_RING
    - tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
    - net: fix deadlock while clearing neighbor proxy table
    - team: avoid adding twice the same option to the event list
    - net/smc: fix shutdown in state SMC_LISTEN
    - team: fix netconsole setup over team
    - packet: fix bitfield update race
    - tipc: add policy for TIPC_NLA_NET_ADDR
    - pppoe: check sockaddr length in pppoe_connect()
    - vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
    - amd-xgbe: Add pre/post auto-negotiation phy hooks
    - sctp: do not check port in sctp_inet6_cmp_addr
    - amd-xgbe: Improve KR auto-negotiation and training
    - strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
    - amd-xgbe: Only use the SFP supported transceiver signals
    - strparser: Fix incorrect strp->need_bytes value.
    - net: sched: ife: signal not finding metaid
    - tcp: clear tp->packets_out when purging write queue
    - net: sched: ife: handle malformed tlv length
    - net: sched: ife: check on metadata length
    - llc: hold llc_sap before release_sock()
    - llc: fix NULL pointer deref for SOCK_ZAPPED
    - net: ethernet: ti: cpsw: fix tx vlan priority mapping
    - virtio_net: split out ctrl buffer
    - virtio_net: fix adding vids on big-endian
    - KVM: s390: force bp isolation for VSIE
    - s390: correct module section names for expoline code revert
    - microblaze: Setup dependencies for ASM optimized lib functions
    - commoncap: Handle memory allocation failure.
    - scsi: mptsas: Disable WRITE SAME
    - cdrom: information leak in cdrom_ioctl_media_changed()
    - m68k/mac: Don't remap SWIM MMIO region
    - block/swim: Check drive type
    - block/swim: Don't log an error message for an invalid ioctl
    - block/swim: Remove extra put_disk() call from error path
    - block/swim: Rename macros to avoid inconsistent inverted logic
    - block/swim: Select appropriate drive on device open
    - block/swim: Fix array bounds check
    - block/swim: Fix IO error at end of medium
    -...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Kalpana S Shetty (kalshett) wrote :

We did a fresh install using the bionic "-proposed" path below, and the system still fails to boot.

The issue is still observed after installing from http://ports.ubuntu.com/ubuntu-ports/dists/bionic-proposed/main/installer-ppc64el/current/images/netboot/ubuntu-installer/ppc64el/

Please advise on the right path to pick up the fixed kernel.

Frank Heimes (frank-heimes) wrote :

Did you pass the following boot parameter while installing 18.04 from proposed?
"apt-setup/proposed=true"

https://bugs.launchpad.net/ubuntu/+source/debian-installer/+bug/1773973/comments/10

Manoj Iyer (manjo) on 2018-07-16
Changed in debian-installer (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Changed in debian-installer (Ubuntu Bionic):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Changed in debian-installer (Ubuntu):
importance: Undecided → Critical
Changed in debian-installer (Ubuntu Bionic):
importance: Undecided → Critical
Steve Langasek (vorlon) wrote :

There has been no build of debian-installer in bionic-proposed against the 4.15.0-22 kernel reported to have this issue. The 4.15.0-24 kernel is now released to bionic-updates. So there should be no changes required on debian-installer for this.

Changed in debian-installer (Ubuntu):
status: New → Invalid
Changed in debian-installer (Ubuntu Bionic):
status: New → Invalid
Jeff Lane (bladernr) wrote :

Bionic is still completely uninstallable (via MAAS) because of this. MAAS is currently booting 4.15.0-23 for Bionic deployments, and the oopses still occur, sending the machine into a reboot loop.

https://paste.ubuntu.com/p/wwPFmMn4wt/

tags: added: blocks-hwcert-server
Jeff Lane (bladernr) wrote :

^^ I also note that the fix is in -24, so I'll re-confirm once that is available via MAAS.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-07-18 20:13 EDT-------
LP1773973 tracks the same issue.
Note from Kalpana -
With 'apt-setup/proposed=true', the boot is successful. This bug can be closed.

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released