[Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX

Bug #1853326 reported by Ike Panhc on 2019-11-20
This bug affects 1 person
Affects                Status         Importance   Assigned to   Milestone
linux (Ubuntu)         Invalid        Undecided    Unassigned
  Bionic               Fix Released   High         Ike Panhc

Bug Description

[Impact]
The 4.15.0-71-generic kernel cannot boot on ThunderX.

Console logs:
https://pastebin.ubuntu.com/p/xcRk7VRrzF/
https://pastebin.ubuntu.com/p/DkdBqbBqqD/

[Test Case]
Boot the kernel with earlycon enabled and check whether it oopses during boot.
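The oops check can be automated once the serial console output has been captured to a text file. The sketch below is a hypothetical helper, not part of any Ubuntu tooling; it assumes the boot log is available as lines of text and that the marker strings in `OOPS_PATTERNS` (messages the arm64 kernel prints on an oops or panic) are enough to flag a failed boot, which will miss a silent hang.

```python
import re
from typing import Iterable, List

# Messages the arm64 kernel prints on an oops or panic. This short list is
# an assumption for illustration, not an exhaustive set of failure markers.
OOPS_PATTERNS = [
    re.compile(r"Internal error: Oops"),
    re.compile(r"Unable to handle kernel"),
    re.compile(r"Kernel panic - not syncing"),
]


def find_oops_lines(console_lines: Iterable[str]) -> List[str]:
    """Return every console line that matches one of OOPS_PATTERNS."""
    return [
        line.rstrip("\n")
        for line in console_lines
        if any(p.search(line) for p in OOPS_PATTERNS)
    ]


def boot_looks_clean(console_lines: Iterable[str]) -> bool:
    """True if no oops/panic marker appears in the captured boot log.

    A hang with no output also "looks clean" here, so reaching a login
    prompt still has to be checked separately.
    """
    return not find_oops_lines(console_lines)
```

For example, with a log saved as a hypothetical `console.log`: `with open("console.log", errors="replace") as f: print(find_oops_lines(f))`.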

[Regression Risk]
TBD

Ike Panhc (ikepanhc) on 2019-11-20
Changed in linux (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → Ike Panhc (ikepanhc)
importance: Undecided → High
Ike Panhc (ikepanhc) wrote:

The root cause of the regression is within the following commits:

d0f174e40a6 <email address hidden> 2019-11-12 19:04:49 +0100 arm64: enable generic CPU vulnerabilites support
f94f9d3a3e8b <email address hidden> 2019-11-12 19:04:49 +0100 arm64: add sysfs vulnerability show for meltdown
c288f6b5788d <email address hidden> 2019-11-12 19:04:48 +0100 arm64: Add sysfs vulnerability show for spectre-v1
f2485ae5fd84 <email address hidden> 2019-11-12 19:04:48 +0100 arm64: fix SSBS sanitization
1931a913df7e <email address hidden> 2019-11-12 19:04:48 +0100 KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
fd872fd82e12 <email address hidden> 2019-11-12 19:04:48 +0100 arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
2a3135c3033c <email address hidden> 2019-11-12 19:04:48 +0100 arm64: cpufeature: Detect SSBS and advertise to userspace
78dc3acb34fa <email address hidden> 2019-11-12 19:04:48 +0100 arm64: Get rid of __smccc_workaround_1_hvc_*
5c43fb65359d <email address hidden> 2019-11-12 19:04:48 +0100 arm64: don't zero DIT on signal return
c6c07232325a <email address hidden> 2019-11-12 19:04:48 +0100 arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening
274adba3ccf6 <email address hidden> 2019-11-12 19:04:47 +0100 arm64: capabilities: Add support for checks based on a list of MIDRs
f34e57c35b72 <email address hidden> 2019-11-12 19:04:47 +0100 arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
8d811d39465c <email address hidden> 2019-11-12 19:04:47 +0100 arm64: Add helpers for checking CPU MIDR against a range
b2eddaf65384 <email address hidden> 2019-11-12 19:04:47 +0100 arm64: capabilities: Clean up midr range helpers
628859e8621c <email address hidden> 2019-11-12 19:04:47 +0100 arm64: capabilities: Change scope of VHE to Boot CPU feature
3bf4ffd98cc4 <email address hidden> 2019-11-12 19:04:47 +0100 arm64: capabilities: Add support for features enabled early

Ike Panhc (ikepanhc) wrote:

Narrowed it down to:

1931a913df7e <email address hidden> 2019-11-12 19:04:48 +0100 KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
fd872fd82e12 <email address hidden> 2019-11-12 19:04:48 +0100 arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
2a3135c3033c <email address hidden> 2019-11-12 19:04:48 +0100 arm64: cpufeature: Detect SSBS and advertise to userspace
78dc3acb34fa <email address hidden> 2019-11-12 19:04:48 +0100 arm64: Get rid of __smccc_workaround_1_hvc_*

Ike Panhc (ikepanhc) wrote:

The bisect ended at this patch. I am going to build a kernel with it reverted and test it.

78dc3acb34fa <email address hidden> 2019-11-12 19:04:48 +0100 arm64: Get rid of __smccc_workaround_1_hvc_*
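The report does not say exactly how the range was bisected. As an illustration of the technique only, the sketch below runs a binary search over the sixteen commits from the first list in this report, assuming that list is newest first, as `git log` prints it. `reported_boots_ok` is a hypothetical stand-in that hard-codes the outcome reported here; on real hardware each call means building, installing, and booting a kernel.

```python
from typing import Callable, Sequence

# Commits from the first list in this report, reordered oldest first
# (assumes the report listed them newest first, as `git log` does).
COMMITS = [
    "3bf4ffd98cc4",  # arm64: capabilities: Add support for features enabled early
    "628859e8621c",  # arm64: capabilities: Change scope of VHE to Boot CPU feature
    "b2eddaf65384",  # arm64: capabilities: Clean up midr range helpers
    "8d811d39465c",  # arm64: Add helpers for checking CPU MIDR against a range
    "f34e57c35b72",  # arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
    "274adba3ccf6",  # arm64: capabilities: Add support for checks based on a list of MIDRs
    "c6c07232325a",  # arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening
    "5c43fb65359d",  # arm64: don't zero DIT on signal return
    "78dc3acb34fa",  # arm64: Get rid of __smccc_workaround_1_hvc_*
    "2a3135c3033c",  # arm64: cpufeature: Detect SSBS and advertise to userspace
    "fd872fd82e12",  # arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3
    "1931a913df7e",  # KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe
    "f2485ae5fd84",  # arm64: fix SSBS sanitization
    "c288f6b5788d",  # arm64: Add sysfs vulnerability show for spectre-v1
    "f94f9d3a3e8b",  # arm64: add sysfs vulnerability show for meltdown
    "d0f174e40a6",   # arm64: enable generic CPU vulnerabilites support
]


def reported_boots_ok(commit: str) -> bool:
    """Hypothetical stand-in for 'build, install and boot on ThunderX'.

    Hard-codes the outcome in this report: kernels before 78dc3acb34fa
    boot; 78dc3acb34fa and everything after it do not.
    """
    return COMMITS.index(commit) < COMMITS.index("78dc3acb34fa")


def first_bad(commits: Sequence[str], boots_ok: Callable[[str], bool]) -> str:
    """Binary search for the first commit whose kernel fails to boot.

    Assumes the kernel just before commits[0] boots, the kernel at
    commits[-1] does not, and that once boot breaks it stays broken
    for every later commit (the same assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if boots_ok(commits[mid]):
            lo = mid + 1  # mid is good, so the culprit comes later
        else:
            hi = mid      # mid is bad, so the culprit is mid or earlier
    return commits[lo]


print(first_bad(COMMITS, reported_boots_ok))  # 78dc3acb34fa
```

With 16 candidates the search needs at most four test boots rather than one per commit, which is why bisecting pays off when every step is a full kernel build and boot.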

Ike Panhc (ikepanhc) on 2019-11-25
description: updated
Stefan Bader (smb) on 2019-11-26
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
dann frazier (dannf) wrote:

Tracking the reapplication of the reverted patches in bug 1854207.

Ike Panhc (ikepanhc) wrote:

Tried the 4.15.0-72.81 kernel on 1-socket and 2-socket ThunderX machines; all of them boot OK.

Thanks.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux - 4.15.0-72.81

---------------
linux (4.15.0-72.81) bionic; urgency=medium

  * bionic/linux: 4.15.0-72.81 -proposed tracker (LP: #1854027)

  * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX
    (LP: #1853326)
    - Revert "arm64: Use firmware to detect CPUs that are not affected by
      Spectre-v2"
    - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"

  * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and
    Kunpeng920 (LP: #1852723)
    - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to
      correct place

linux (4.15.0-71.80) bionic; urgency=medium

  * bionic/linux: 4.15.0-71.80 -proposed tracker (LP: #1852289)

  * Bionic update: upstream stable patchset 2019-10-29 (LP: #1850541)
    - panic: ensure preemption is disabled during panic()
    - f2fs: use EINVAL for superblock with invalid magic
    - [Config] updateconfigs for USB_RIO500
    - USB: rio500: Remove Rio 500 kernel driver
    - USB: yurex: Don't retry on unexpected errors
    - USB: yurex: fix NULL-derefs on disconnect
    - USB: usb-skeleton: fix runtime PM after driver unbind
    - USB: usb-skeleton: fix NULL-deref on disconnect
    - xhci: Fix false warning message about wrong bounce buffer write length
    - xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
    - xhci: Check all endpoints for LPM timeout
    - usb: xhci: wait for CNR controller not ready bit in xhci resume
    - USB: adutux: fix use-after-free on disconnect
    - USB: adutux: fix NULL-derefs on disconnect
    - USB: adutux: fix use-after-free on release
    - USB: iowarrior: fix use-after-free on disconnect
    - USB: iowarrior: fix use-after-free on release
    - USB: iowarrior: fix use-after-free after driver unbind
    - USB: usblp: fix runtime PM after driver unbind
    - USB: chaoskey: fix use-after-free on release
    - USB: ldusb: fix NULL-derefs on driver unbind
    - serial: uartlite: fix exit path null pointer
    - USB: serial: keyspan: fix NULL-derefs on open() and write()
    - USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
    - USB: serial: option: add Telit FN980 compositions
    - USB: serial: option: add support for Cinterion CLS8 devices
    - USB: serial: fix runtime PM after driver unbind
    - USB: usblcd: fix I/O after disconnect
    - USB: microtek: fix info-leak at probe
    - USB: dummy-hcd: fix power budget for SuperSpeed mode
    - usb: renesas_usbhs: gadget: Do not discard queues in
      usb_ep_set_{halt,wedge}()
    - usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
    - USB: legousbtower: fix slab info leak at probe
    - USB: legousbtower: fix deadlock on disconnect
    - USB: legousbtower: fix potential NULL-deref on disconnect
    - USB: legousbtower: fix open after failed reset request
    - USB: legousbtower: fix use-after-free on release
    - staging: vt6655: Fix memory leak in vt6655_probe
    - iio: adc: ad799x: fix probe error handling
    - iio: adc: axp288: Override TS pin bias current for some models
    - iio: light: opt3001: fix mutex unlock race
    - efivar/ssdt: Don't iterate over EFI va...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
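The fix shipped in package version 4.15.0-72.81. A minimal sketch for deciding whether a given bionic 4.15 kernel already carries it is shown below; `has_thunderx_boot_fix` is a hypothetical helper, and it assumes the full package version string (for example as listed by `dpkg -l 'linux-image-*'`), since `uname -r` alone only shows the ABI number (e.g. `4.15.0-72-generic`). It compares plain numeric tuples instead of implementing full Debian version ordering, so it is only meaningful within this one package series.

```python
import re
from typing import Tuple

# First bionic linux upload containing the reverts for this bug.
# (4.15.0-71.80 is the upload that fails to boot on ThunderX.)
FIXED = (4, 15, 0, 72, 81)

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse an Ubuntu kernel package version such as '4.15.0-72.81'.

    Simplified: any trailing suffix is ignored and full Debian version
    comparison rules are not implemented.
    """
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognised kernel package version: {version!r}")
    return tuple(int(g) for g in m.groups())


def has_thunderx_boot_fix(version: str) -> bool:
    """True if a bionic 4.15 package version is at or after the fixed upload."""
    return parse_version(version) >= FIXED
```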