Kernel 5.14.X / 5.13.14 fails to boot

Bug #1942684 reported by Bluestang
This bug affects 8 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

I have been testing kernels 5.14.0 and 5.14.1 since their release on https://kernel.ubuntu.com/~kernel-ppa/mainline/ and my machine fails to boot with either one.

However, I am able to boot just fine with 5.14.0-rc7.

The journalctl -b output from 5.14.1 is attached.

Motherboard: MSI X570 Tomahawk
CPU: AMD 5900X
GPU: AsusTek 6800XT
OS: Hirsute 21.04
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu65.1
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 21.04
InstallationDate: Installed on 2021-07-04 (62 days ago)
InstallationMedia: Ubuntu 21.04 "Hirsute Hippo" - Release amd64 (20210420)
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: wayland-session hirsute
Uname: Linux 5.14.0-051400rc6-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True

Bluestang (bluestang) wrote :
tags: added: hirsute kernel-bug
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1942684

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Bluestang (bluestang) wrote : ProcCpuinfoMinimal.txt

apport information

tags: added: apport-collected wayland-session
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Bluestang (bluestang)
description: updated
Gannet (ken20001) wrote : Re: Kernel 5.14.0/1/5.13.14 fails to boot

Now 5.13.14 also fails to boot, while 5.13.13 is OK. Evidently some faulty code has been backported from 5.14 to 5.13.14.

summary: - Kernel 5.14.0/1 fails to boot
+ Kernel 5.14.0/1/5.13.4 fails to boot
summary: - Kernel 5.14.0/1/5.13.4 fails to boot
+ Kernel 5.14.0/1/5.13.14 fails to boot
Bluestang (bluestang) wrote :

It seems to be one of these commits: https://cgit.freedesktop.org/drm/drm/tag/?h=drm-fixes-2021-08-27

I just looked at the changelog for 5.13.14, and those 3 commits were also added there.

Cristiano Rodrigues (microcris) wrote :

It started happening with rc7.

I suppose it is happening because of this:

10/09/21 16:18 CRIS-DELL kernel [ 3.880089] RIP: 0010:nv_drm_format_array_alloc+0xb3/0xb5 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880094] Code: 16 5b 4c 89 c0 41 5c 5d c3 f3 48 0f bc c0 89 c1 83 f8 22 76 b2 eb e2 4c 89 c7 e8 c8 e6 ff ff 45 31 c0 5b 41 5c 4c 89 c0 5d c3 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48
10/09/21 16:18 CRIS-DELL kernel [ 3.880095] RSP: 0018:ffffae2a00f83a58 EFLAGS: 00010212
10/09/21 16:18 CRIS-DELL kernel [ 3.880097] RAX: 0000000000000020 RBX: 00000001ffffefff RCX: 0000000000000020
10/09/21 16:18 CRIS-DELL kernel [ 3.880098] RDX: 0000000032313050 RSI: 0000000000000001 RDI: 0000000000000012
10/09/21 16:18 CRIS-DELL kernel [ 3.880099] RBP: ffffae2a00f83a68 R08: ffff8fbf0770c580 R09: ffff8fbf0770c580
10/09/21 16:18 CRIS-DELL kernel [ 3.880100] R10: ffff8fbf06e90028 R11: ffff8fbf1888e808 R12: ffffae2a00f83a9c
10/09/21 16:18 CRIS-DELL kernel [ 3.880101] R13: ffff8fbf0708a000 R14: ffff8fbf077a6600 R15: 0000000000000000
10/09/21 16:18 CRIS-DELL kernel [ 3.880102] FS: 00007f0c3ba20d00(0000) GS:ffff8fc28bc40000(0000) knlGS:0000000000000000
10/09/21 16:18 CRIS-DELL kernel [ 3.880104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
10/09/21 16:18 CRIS-DELL kernel [ 3.880105] CR2: 0000563d833a3000 CR3: 0000000100cb6001 CR4: 00000000003706e0
10/09/21 16:18 CRIS-DELL kernel [ 3.880106] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
10/09/21 16:18 CRIS-DELL kernel [ 3.880106] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
10/09/21 16:18 CRIS-DELL kernel [ 3.880107] Call Trace:
10/09/21 16:18 CRIS-DELL kernel [ 3.880109] nv_drm_plane_create+0x7d/0x2f0 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880114] nv_drm_enumerate_crtcs_and_planes+0x13f/0x2a0 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880117] nv_drm_load+0x253/0x3b4 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880121] ? __cond_resched+0x1a/0x50
10/09/21 16:18 CRIS-DELL kernel [ 3.880123] ? nv_drm_master_drop+0x60/0x60 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880127] drm_dev_register+0xd6/0x1c0 [drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880142] nv_drm_probe_devices+0x10b/0x1f0 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880145] ? 0xffffffffc0b86000
10/09/21 16:18 CRIS-DELL kernel [ 3.880146] nv_drm_init+0x1e/0x50 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880149] nv_linux_drm_init+0xe/0x1000 [nvidia_drm]
10/09/21 16:18 CRIS-DELL kernel [ 3.880152] do_one_initcall+0x46/0x1d0
10/09/21 16:18 CRIS-DELL kernel [ 3.880154] ? kmem_cache_alloc_trace+0x159/0x2c0
10/09/21 16:18 CRIS-DELL kernel [ 3.880157] do_init_module+0x62/0x290
10/09/21 16:18 CRIS-DELL kernel [ 3.880159] load_module+0xaa3/0xb30
10/09/21 16:18 CRIS-DELL kernel [ 3.880162] __do_sys_finit_module+0xbf/0x120
10/09/21 16:18 CRIS-DELL kernel [ 3.880164] __x64_sys_finit_module+0x18/0x20
10/09/21 16:18 CRIS-DELL kernel [ 3.880166] do_syscall_64+0x59/0xc0
10/09/21 16:18 CRIS-DELL kernel [ 3.880168] ? syscall_exit_to_user_mode+0x...

Bluestang (bluestang)
summary: - Kernel 5.14.0/1/5.13.14 fails to boot
+ Kernel 5.14.X / 5.13.14 fails to boot
Bluestang (bluestang) wrote :

OK, so none of the 5.14.0/1/2/3 kernels will boot. As I mentioned before, one of the 3 patches introduced the regression:

Borislav Petkov (1):
      drm/amdgpu: Fix build with missing pm_suspend_target_state module export

Christian König (1):
      drm/amdgpu: use the preferred pin domain after the check

Michel Dänzer (1):
      drm/amdgpu: Cancel delayed work when GFXOFF is disabled

This is the commit in the linux repo: 77dd11439b86e3f7990e4c0c9e0b67dca82750ba

Here is the error from my uploaded crash log:

Sep 04 09:07:39 bluestang-pc kernel: RIP: 0010:amdgpu_discovery_reg_base_init+0x225/0x260 [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: Code: 0f 85 d4 fe ff ff 48 83 45 c0 01 48 8b 45 c0 39 45 c8 0f 8f 55 fe ff ff 8b 45 b4 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 c7 c7 e4 5a 61 c1 e8 9d 79 10 ff eb de 41 89 d0 48 c7 c7
Sep 04 09:07:39 bluestang-pc kernel: RSP: 0018:ffffb883c1907928 EFLAGS: 00010202
Sep 04 09:07:39 bluestang-pc kernel: RAX: 0000000000000008 RBX: ffff99558b89f128 RCX: 0000000000000006
Sep 04 09:07:39 bluestang-pc kernel: RDX: ffffffffc1615b69 RSI: ffffffffc15c0428 RDI: 0000000000000000
Sep 04 09:07:39 bluestang-pc kernel: RBP: ffffb883c1907978 R08: 0000000000000008 R09: 000000000000000b
Sep 04 09:07:39 bluestang-pc kernel: R10: ffff99558b89f120 R11: 0000000000000000 R12: ffff995587c00000
Sep 04 09:07:39 bluestang-pc kernel: R13: 0000000000000019 R14: 0000000000000019 R15: ffff99558b89f120
Sep 04 09:07:39 bluestang-pc kernel: FS: 00007f3d5b7138c0(0000) GS:ffff995c7ea40000(0000) knlGS:0000000000000000
Sep 04 09:07:39 bluestang-pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 04 09:07:39 bluestang-pc kernel: CR2: 00007fc505b6c420 CR3: 0000000106d90000 CR4: 0000000000750ee0
Sep 04 09:07:39 bluestang-pc kernel: PKRU: 55555554
Sep 04 09:07:39 bluestang-pc kernel: Call Trace:
Sep 04 09:07:39 bluestang-pc kernel: nv_set_ip_blocks+0x8e/0xab0 [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_device_ip_early_init+0x2b1/0x47f [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: ? amdgpu_device_get_job_timeout_settings+0x90/0x1cc [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_device_init.cold+0xc9/0x6d1 [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_driver_load_kms+0x6d/0x310 [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_pci_probe+0x11b/0x1a0 [amdgpu]
Sep 04 09:07:39 bluestang-pc kernel: local_pci_probe+0x48/0x80
Sep 04 09:07:39 bluestang-pc kernel: pci_device_probe+0x105/0x1d0
Sep 04 09:07:39 bluestang-pc kernel: really_probe+0x1fe/0x400
Sep 04 09:07:39 bluestang-pc kernel: __driver_probe_device+0x109/0x180
Sep 04 09:07:39 bluestang-pc kernel: driver_probe_device+0x23/0x90
Sep 04 09:07:39 bluestang-pc kernel: __driver_attach+0xac/0x1b0
Sep 04 09:07:39 bluestang-pc kernel: ? __device_attach_driver+0xe0/0xe0
Sep 04 09:07:39 bluestang-pc kernel: bus_for_each_dev+0x7e/0xc0
Sep 04 09:07:39 bluestang-pc kernel: driver_attach+0x1e/0x20
Sep 04 09:07:39 bluestang-pc kernel: bus_add_driver+0x135/0x1f0
Sep 04 09:07:39 bluestang-pc kernel: driver_register+0x95/0xf0
Sep 04 09:07:39 bluestang-pc kernel: __pci_register_driver+0x68/0x70
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_ini...

Bluestang (bluestang) wrote :

Patch file of commit that caused the regression

tags: added: patch
Ernst Persson (ernstp) wrote :

From my investigation, the issue was not caused by a kernel patch but by the Mainline PPA enabling CONFIG_UBSAN_TRAP.

Cristiano Rodrigues (microcris) wrote :

@ernstp, what can we do to bypass this? Or, what info do we need to provide in order to catch what is making the kernel stop?

Rocko (rockorequin) wrote :

I also can't boot the mainline 5.14.4 or 5.14.5 kernels on a Lenovo S7 (AMD 5800H and NVIDIA GPU). It goes to a blank screen when trying to start graphics and freezes; I can't open a tty and have to hard reset the system.

It looks like I have the laptop in the same mode as the other posters, i.e. hybrid/dynamic graphics mode using the amdgpu driver for the laptop screen.

I can't see any relevant errors in the output of "journalctl -b -1" for the failed boot. I think the hang occurs before it has a chance to log them.

However, when booting the stock Ubuntu 21.10 5.13.0-16 kernel I do see some nvidia_drm RIP errors in the log in nv_drm_master_set(). Presumably these errors in combination with CONFIG_UBSAN_TRAP are causing the hang? The documentation says that in order to save around 5% of the kernel size, CONFIG_UBSAN_TRAP "reduces the kernel size overhead but turns all warnings (including potentially harmless conditions) into full exceptions that abort the running kernel code (regardless of context, locks held, etc), which may destabilize the system." It seems like a pretty aggressive config option to set!
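
When a boot does leave a log behind, the RIP lines can be pulled out of it to see which function tripped. A minimal sketch, assuming the kernel log was first saved to a text file (for example with "journalctl -k -b -1 > boot.log"); the helper name oops_sites is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the distinct fault sites named on "RIP:" lines of a saved kernel log.
# oops_sites is a hypothetical helper, not part of any package.
oops_sites() {
    local logfile="$1"
    # Keep everything after "RIP: <segment>:", e.g.
    # "amdgpu_discovery_reg_base_init+0x225/0x260 [amdgpu]",
    # then drop repeated entries.
    grep -o 'RIP: [0-9a-f]*:.*' "$logfile" \
        | sed 's/^RIP: [0-9a-f]*://' \
        | sort -u
}
```

Calling it as "oops_sites boot.log" prints one line per distinct site. User-space RIP lines (segment 0033) are listed as well, so the output may need a second look.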

Rocko (rockorequin) wrote :

Just to confirm, Ubuntu's mainline 5.14.5 kernel does boot successfully on my machine with CONFIG_UBSAN_TRAP not set.

Rashad Tatum (rmtatum) wrote :

It looks like the following patch from Aug. 20, 2021 enabled CONFIG_UBSAN_TRAP:
https://patchwork.ozlabs.org<email address hidden>/

Rashad Tatum (rmtatum) wrote :

CONFIG_UBSAN_TRAP was added from this issue:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1914685

Bluestang (bluestang) wrote :

The patch "drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10" fixes the kernel oops. I am currently on 5.15-rc2 and have not experienced any issues. I reckon that 5.14.7 will also be fine since the changelog shows that the same patch was backported.

Looks like this bug has always been there but was brought to light once CONFIG_UBSAN_TRAP was enabled.

Cristiano Rodrigues (microcris) wrote :

Linux 5.15-rc2 is booting OK with the latest NVIDIA driver (470.74).
It seems the RIP: 0010:nv_drm_format_array_alloc+0xb3/0xb5 [nvidia_drm] is not CONFIG_UBSAN_TRAP's fault.

Rashad Tatum (rmtatum) wrote :

"Fixed a bug that caused nvidia-drm.ko to crash when loading with DRM-KMS enabled (modeset=1) on Linux v5.14"
Could this be the fix then?
https://www.nvidia.com/Download/driverResults.aspx/180475/en-us

Rashad Tatum (rmtatum) wrote :

I can confirm that 5.14.7 boots, but the USB-C DisplayPort output through my Kensington dock doesn't work. I'll try upgrading the NVIDIA drivers.

Rashad Tatum (rmtatum) wrote :

NVIDIA driver version 470.74 fixes my USB-C DisplayPort issues.

But I still think leaving the CONFIG_UBSAN_TRAP flag in the build is concerning (at least, based on the documentation for the flag). It's probably okay for testing.

vladimir2k9 (vladimir2k9) wrote : apport information

ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu69
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: vladimir 1867 F.... pipewire-media-
 /dev/snd/controlC2: vladimir 1867 F.... pipewire-media-
 /dev/snd/controlC0: vladimir 1867 F.... pipewire-media-
 /dev/snd/seq: vladimir 1866 F.... pipewire
CasperMD5CheckResult: pass
CurrentDesktop: Unity:Unity7:ubuntu
DistroRelease: Ubuntu 21.10
InstallationDate: Installed on 2021-08-25 (33 days ago)
InstallationMedia: Ubuntu 21.10 "Impish Indri" - Alpha amd64 (20210824)
IwConfig:
 lo no wireless extensions.

 enp4s0 no wireless extensions.
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.13.0-16-generic root=UUID=c764e890-a0d7-4902-9e65-26e456346deb ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.13.0-16.16-generic 5.13.13
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.13.0-16-generic N/A
 linux-backports-modules-5.13.0-16-generic N/A
 linux-firmware 1.200
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
Tags: impish
Uname: Linux 5.13.0-16-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 08/04/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: P2.10
dmi.board.name: B550 Steel Legend
dmi.board.vendor: ASRock
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrP2.10:bd08/04/2021:br5.17:svnToBeFilledByO.E.M.:pnToBeFilledByO.E.M.:pvrToBeFilledByO.E.M.:skuToBeFilledByO.E.M.:rvnASRock:rnB550SteelLegend:rvr:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: To Be Filled By O.E.M.
dmi.sys.vendor: To Be Filled By O.E.M.

tags: added: impish
vladimir2k9 (vladimir2k9) wrote : apport information

Attached: AlsaInfo.txt, CRDA.txt, CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, PaInfo.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

vladimir2k9 (vladimir2k9) wrote :

I installed Linux 5.14.8 and my system does not boot.

Gannet (ken20001) wrote :
DCMarkie (mhoffmeyer) wrote :

Adding

i915.fastboot=0

to the kernel params allows booting on these kernels.

(Tested working on a XPS 9575)

vladimir2k9 (vladimir2k9) wrote :

How can I disable "CONFIG_UBSAN_TRAP" and check whether the system boots? Or do I need to recompile the kernel?

DanglingPointer (ferncasado) wrote :

I tried building 5.14.14 from vanilla upstream https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.14.14.tar.xz
using the config from Ubuntu mainline, config-5.14.14-051414-generic,
but with the Debian "pem" entries removed for a custom kernel build.
I used LLVM-13/Clang-13 with a specific -march for each machine, O3, and the thin LTO options in the config.
It fails to boot after GRUB on ivybridge and haswell machines.

I will try the following and report here:

1) I will try disabling "CONFIG_UBSAN_TRAP" to see if that works with LLVM-13/clang-13 with similar parameters above.
2) I will try with GCC-11.1 with 'enabled' "CONFIG_UBSAN_TRAP" with similar parameters above without LTO.

DanglingPointer (ferncasado) wrote :

Done

Point 1) above WORKS on zen3, haswell, and ivybridge.
That is, with 'CONFIG_UBSAN_TRAP' disabled in the kernel config prior to building.
Just open the ".config" in nano or vim and put a '#' in front of the line to disable it before running "$ make olddefconfig".
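
The manual edit described here can also be scripted. A minimal sketch, assuming GNU sed and a config that contains the line CONFIG_UBSAN_TRAP=y; the helper name disable_ubsan_trap is made up for illustration:

```shell
#!/usr/bin/env bash
# Turn off CONFIG_UBSAN_TRAP in a kernel config file, in place.
# disable_ubsan_trap is a hypothetical helper name.
disable_ubsan_trap() {
    local config="${1:-.config}"
    # Rewrite "CONFIG_UBSAN_TRAP=..." into the conventional "is not set"
    # comment; every other option is left untouched.
    sed -i 's/^CONFIG_UBSAN_TRAP=.*/# CONFIG_UBSAN_TRAP is not set/' "$config"
}
```

Run it in the kernel source tree as "disable_ubsan_trap .config", then run "make olddefconfig" as before. The kernel tree's own "scripts/config --disable UBSAN_TRAP" should do the same job.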

I built the kernel using LLVM-13 and LTO thin with O3 optimisation with -march=<cpuType>

Point 2) above FAILED on ivybridge. I didn't bother trying to build it for other march types.
It was built using the Ubuntu mainline config for 5.14.14 with the debian pems removed for a custom kernel build. "CONFIG_UBSAN_TRAP" is enabled by default in the Ubuntu kernel config.

I built the kernel using GCC-11.1 with O3 optimisation with -march=<cpuType>

I highly recommend that the Ubuntu Mainline Kernel team DISABLE "CONFIG_UBSAN_TRAP" in their mainline kernel config. It can make production kernels FAIL, because the option turns what would otherwise be a harmless warning into an error, and possibly an abort or a kernel panic of some sort.

DanglingPointer (ferncasado) wrote :

In another test, vanilla Ubuntu mainline 5.14.14 on a guest VBOX VM runs OK for about half a day and then crashes to a black screen. This has happened 4 times!

Replacing the vanilla Ubuntu mainline 5.14.14 with a custom build of 5.14.14 using point 1 above (see comment 43) has so far worked for 3 days and is still working.

Having that configuration is dangerous in production settings! I can't stress it enough! If you use Ubuntu Mainline kernels in production settings, then...

****DO NOT USE UBUNTU MAINLINE 5.14.X KERNELS.****

Use Ubuntu mainline 5.13.x or rebuild the 5.14.x kernels from kernel.org and disable "CONFIG_UBSAN_TRAP" in the kernel config.

Gannet (ken20001) wrote :

It seems the Ubuntu Mainline Kernel team doesn't hear us: 5.15.1 still fails to boot, but they don't seem to care about it.

DanglingPointer (ferncasado) wrote :

5.15.1 fails for me too, on Penryn, Ivybridge, Haswell and guest VMs on Zen3.

I had to rebuild the kernel from kernel.org using the config from Ubuntu Mainline but disabling the debian pem certs and "CONFIG_UBSAN_TRAP".

After that it worked on all the above architectures.

DanglingPointer (ferncasado) wrote :

Looks like they may have removed "CONFIG_UBSAN_TRAP" from the kernel config of 5.15.3!

Only this is in the config...
```
# CONFIG_UBSAN is not set
```

I'm building it now to see if it is ok and works.

DanglingPointer (ferncasado) wrote :

Kernel works ok.

So as it stands, new 5.15.3 has no "CONFIG_UBSAN_TRAP" in config.

So for new kernels you can probably close this bug off as long as they don't reintroduce it.

vladimir2k9 (vladimir2k9) wrote :

5.15.6 still fails to boot on my desktop.

DanglingPointer (ferncasado) wrote :

Does it have "CONFIG_UBSAN_TRAP" set in the config?

vladimir2k9 (vladimir2k9) wrote :
DanglingPointer (ferncasado) wrote :

1) Run $ ls -lah /boot

2) Look for the config-5.15.6<something> file and open it in nano. That's the config that Ubuntu used to build the kernel.

3) Press ctrl+w, type or paste "CONFIG_UBSAN_TRAP", and hit enter. (Paste is ctrl+shift+v.)

If it doesn't find it, then it is odd that it doesn't work for you. The problem could be elsewhere. That said I don't have that kernel version running.
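
The same check can be done without an editor. A minimal sketch, assuming the config file sits in /boot under the usual config-<version> name; the helper name ubsan_trap_status is made up for illustration:

```shell
#!/usr/bin/env bash
# Report whether a kernel config file enables CONFIG_UBSAN_TRAP.
# ubsan_trap_status is a hypothetical helper name.
ubsan_trap_status() {
    local config="$1"
    # Only an uncommented "CONFIG_UBSAN_TRAP=y" line counts as enabled;
    # "# CONFIG_UBSAN_TRAP is not set" or a missing line does not.
    if grep -q '^CONFIG_UBSAN_TRAP=y' "$config"; then
        echo "enabled"
    else
        echo "not enabled"
    fi
}
```

For the running kernel this would be called as ubsan_trap_status "/boot/config-$(uname -r)"; for a kernel that does not boot, pass the matching config-5.15.6<something> file instead.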

vladimir2k9 (vladimir2k9) wrote :

It does not find it.

Searching for the word "UBSAN" only finds:
CONFIG_ARCH_HAS_UBSAN_SANITIZE_ALL=y
# CONFIG_UBSAN is not set

My system hangs early in startup. It does not respond to any key; Alt+SysRq+b does not work either.
But if I choose Recovery mode and then continue booting, it boots.

DanglingPointer (ferncasado) wrote :

Is it a new or old system?
What's the last working kernel version for you?

Have you considered going back to the official kernel version for your Ubuntu (if it works?)?

Another option would be to try disabling that "...SANITIZE_ALL" config and rebuilding the kernel from kernel.org using the updated Ubuntu config. See if it works.

vladimir2k9 (vladimir2k9) wrote :

New system. Ubuntu 21.10.
I am on the official kernel, but wanted to use the latest.

Gannet (ken20001) wrote :

Still the same with v5.16-rc8.

Claus Lensbøl (cmol) wrote :

Had a similar issue on a 20.04 install today after updating to 5.14.

Woke it up from suspend on 5.10.0-1057-oem. Rebooted once (some screen issues that turned out to be the screen), installed updates via update-manager, rebooted using the new 5.14.0-1024-oem kernel and it just hung. Black screen, blinking cursor.

I noticed that the Lenovo splash screen did not come up after selecting the 5.14 kernel in GRUB. It does with 5.10.

Going back to 5.10 works as before. I have a kern.log saved from before and after the update, but do I need to make sure it is sanitized for secure boot keys and such?

Also, it should be noted that it is a dual-GPU system with an Intel and an NVIDIA GPU.

Andrew Purtell (apurtell) wrote :

Running 5.13.0-xx on Parallels Desktop on an M1 MacBook. 5.13.0-12-generic is the last kernel that will boot. -13 and -14 have been withdrawn, apparently. -15 and -16 just hang as soon as the EFI loader hands control to the kernel. It seems some faulty code was backported from more recent kernels after -12.

vladimir2k9 (vladimir2k9) wrote :

I found out why my system does not boot.
After some googling and testing, I used the kernel parameter "amdgpu.aspm=0" and the system booted. It seems it is now set to auto by default.

aspm
ASPM support (1 = enable, 0 = disable, -1 = auto)

Older kernels from 5.14 onward now boot as well. I am now on 5.17:
$ uname -a
Linux vladimir-desktop 5.17.0-051700-generic #202203202130 SMP PREEMPT Sun Mar 20 21:33:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
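
To make a kernel parameter such as "amdgpu.aspm=0" (or the "i915.fastboot=0" mentioned earlier) persist across reboots, it can be added to the GRUB defaults. A minimal sketch, assuming an Ubuntu-style /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and GNU sed; the helper name add_kernel_param is made up for illustration:

```shell
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already there. add_kernel_param is a hypothetical helper name.
# Assumption: the parameter contains no "/" or double-quote characters.
add_kernel_param() {
    local param="$1" grubfile="${2:-/etc/default/grub}"
    # Skip the edit if the parameter already appears on the line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$grubfile"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$grubfile"
}
```

On a real system this would be run with root privileges, e.g. add_kernel_param "amdgpu.aspm=0", followed by "sudo update-grub" and a reboot. For a one-off test, the parameter can instead be typed into the kernel line from the GRUB menu editor.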
