Kernel 5.14.X / 5.13.14 fails to boot
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
I have been testing kernels 5.14.0 and 5.14.1 since their release on https:/
However, I am able to boot just fine with 5.14.0-rc7.
journalctl -b output attached with 5.14.1
Motherboard: MSI X570 Tomahawk
CPU: AMD 5900X
GPU: AsusTek 6800XT
OS: Hirsute 21.04
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu65.1
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 21.04
InstallationDate: Installed on 2021-07-04 (62 days ago)
InstallationMedia: Ubuntu 21.04 "Hirsute Hippo" - Release amd64 (20210420)
Package: linux (not installed)
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
Tags: wayland-session hirsute
Uname: Linux 5.14.0-
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
Bluestang (bluestang) wrote : | #1 |
tags: | added: hirsute kernel-bug |
description: | updated |
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #2 |
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Bluestang (bluestang) wrote : ProcCpuinfoMinimal.txt | #3 |
tags: | added: apport-collected wayland-session |
description: | updated |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
description: | updated |
Gannet (ken20001) wrote (last edit ): Re: Kernel 5.14.0/1/5.13.14 fails to boot | #5 |
Now 5.13.14 also fails to boot, while 5.13.13 is OK. Evidently some faulty code has been backported from 5.14 to 5.13.14.
summary: |
- Kernel 5.14.0/1 fails to boot + Kernel 5.14.0/1/5.13.4 fails to boot |
summary: |
- Kernel 5.14.0/1/5.13.4 fails to boot + Kernel 5.14.0/1/5.13.14 fails to boot |
Bluestang (bluestang) wrote (last edit ): | #6 |
Seems like it is one of these commits - https:/
I just looked at changelog for 5.13.14 and those 3 commits were also added.
Cristiano Rodrigues (microcris) wrote : | #7 |
- Failed Boot System Log Edit (235.4 KiB, text/plain)
It started happening with rc7.
I suspect it is happening because of this:
10/09/21 16:18 CRIS-DELL kernel [ 3.880089] RIP: 0010:nv_
10/09/21 16:18 CRIS-DELL kernel [ 3.880094] Code: 16 5b 4c 89 c0 41 5c 5d c3 f3 48 0f bc c0 89 c1 83 f8 22 76 b2 eb e2 4c 89 c7 e8 c8 e6 ff ff 45 31 c0 5b 41 5c 4c 89 c0 5d c3 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48
10/09/21 16:18 CRIS-DELL kernel [ 3.880095] RSP: 0018:ffffae2a00
10/09/21 16:18 CRIS-DELL kernel [ 3.880097] RAX: 0000000000000020 RBX: 00000001ffffefff RCX: 0000000000000020
10/09/21 16:18 CRIS-DELL kernel [ 3.880098] RDX: 0000000032313050 RSI: 0000000000000001 RDI: 0000000000000012
10/09/21 16:18 CRIS-DELL kernel [ 3.880099] RBP: ffffae2a00f83a68 R08: ffff8fbf0770c580 R09: ffff8fbf0770c580
10/09/21 16:18 CRIS-DELL kernel [ 3.880100] R10: ffff8fbf06e90028 R11: ffff8fbf1888e808 R12: ffffae2a00f83a9c
10/09/21 16:18 CRIS-DELL kernel [ 3.880101] R13: ffff8fbf0708a000 R14: ffff8fbf077a6600 R15: 0000000000000000
10/09/21 16:18 CRIS-DELL kernel [ 3.880102] FS: 00007f0c3ba20d0
10/09/21 16:18 CRIS-DELL kernel [ 3.880104] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
10/09/21 16:18 CRIS-DELL kernel [ 3.880105] CR2: 0000563d833a3000 CR3: 0000000100cb6001 CR4: 00000000003706e0
10/09/21 16:18 CRIS-DELL kernel [ 3.880106] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
10/09/21 16:18 CRIS-DELL kernel [ 3.880106] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
10/09/21 16:18 CRIS-DELL kernel [ 3.880107] Call Trace:
10/09/21 16:18 CRIS-DELL kernel [ 3.880109] nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880114] nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880117] nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880121] ? __cond_
10/09/21 16:18 CRIS-DELL kernel [ 3.880123] ? nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880127] drm_dev_
10/09/21 16:18 CRIS-DELL kernel [ 3.880142] nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880145] ? 0xffffffffc0b86000
10/09/21 16:18 CRIS-DELL kernel [ 3.880146] nv_drm_
10/09/21 16:18 CRIS-DELL kernel [ 3.880149] nv_linux_
10/09/21 16:18 CRIS-DELL kernel [ 3.880152] do_one_
10/09/21 16:18 CRIS-DELL kernel [ 3.880154] ? kmem_cache_
10/09/21 16:18 CRIS-DELL kernel [ 3.880157] do_init_
10/09/21 16:18 CRIS-DELL kernel [ 3.880159] load_module+
10/09/21 16:18 CRIS-DELL kernel [ 3.880162] __do_sys_
10/09/21 16:18 CRIS-DELL kernel [ 3.880164] __x64_sys_
10/09/21 16:18 CRIS-DELL kernel [ 3.880166] do_syscall_
10/09/21 16:18 CRIS-DELL kernel [ 3.880168] ? syscall_
summary: |
- Kernel 5.14.0/1/5.13.14 fails to boot + Kernel 5.14.X / 5.13.14 fails to boot |
Bluestang (bluestang) wrote : | #8 |
OK, so none of the 5.14.0/1/2/3 kernels will boot. As I mentioned before, one of these three patches introduced the regression:
Borislav Petkov (1):
drm/amdgpu: Fix build with missing pm_suspend_
Christian König (1):
drm/amdgpu: use the preferred pin domain after the check
Michel Dänzer (1):
drm/amdgpu: Cancel delayed work when GFXOFF is disabled
This is the commit in the linux repo - 77dd11439b86e3f
Here is the error from my uploaded crash log:
Sep 04 09:07:39 bluestang-pc kernel: RIP: 0010:amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: Code: 0f 85 d4 fe ff ff 48 83 45 c0 01 48 8b 45 c0 39 45 c8 0f 8f 55 fe ff ff 8b 45 b4 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b 48 c7 c7 e4 5a 61 c1 e8 9d 79 10 ff eb de 41 89 d0 48 c7 c7
Sep 04 09:07:39 bluestang-pc kernel: RSP: 0018:ffffb883c1
Sep 04 09:07:39 bluestang-pc kernel: RAX: 0000000000000008 RBX: ffff99558b89f128 RCX: 0000000000000006
Sep 04 09:07:39 bluestang-pc kernel: RDX: ffffffffc1615b69 RSI: ffffffffc15c0428 RDI: 0000000000000000
Sep 04 09:07:39 bluestang-pc kernel: RBP: ffffb883c1907978 R08: 0000000000000008 R09: 000000000000000b
Sep 04 09:07:39 bluestang-pc kernel: R10: ffff99558b89f120 R11: 0000000000000000 R12: ffff995587c00000
Sep 04 09:07:39 bluestang-pc kernel: R13: 0000000000000019 R14: 0000000000000019 R15: ffff99558b89f120
Sep 04 09:07:39 bluestang-pc kernel: FS: 00007f3d5b7138c
Sep 04 09:07:39 bluestang-pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 04 09:07:39 bluestang-pc kernel: CR2: 00007fc505b6c420 CR3: 0000000106d90000 CR4: 0000000000750ee0
Sep 04 09:07:39 bluestang-pc kernel: PKRU: 55555554
Sep 04 09:07:39 bluestang-pc kernel: Call Trace:
Sep 04 09:07:39 bluestang-pc kernel: nv_set_
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: ? amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_
Sep 04 09:07:39 bluestang-pc kernel: local_pci_
Sep 04 09:07:39 bluestang-pc kernel: pci_device_
Sep 04 09:07:39 bluestang-pc kernel: really_
Sep 04 09:07:39 bluestang-pc kernel: __driver_
Sep 04 09:07:39 bluestang-pc kernel: driver_
Sep 04 09:07:39 bluestang-pc kernel: __driver_
Sep 04 09:07:39 bluestang-pc kernel: ? __device_
Sep 04 09:07:39 bluestang-pc kernel: bus_for_
Sep 04 09:07:39 bluestang-pc kernel: driver_
Sep 04 09:07:39 bluestang-pc kernel: bus_add_
Sep 04 09:07:39 bluestang-pc kernel: driver_
Sep 04 09:07:39 bluestang-pc kernel: __pci_register_
Sep 04 09:07:39 bluestang-pc kernel: amdgpu_ini...
Bluestang (bluestang) wrote : | #9 |
tags: | added: patch |
Ernst Persson (ernstp) wrote : | #10 |
From my investigation, the issue was not caused by a kernel patch but by the Mainline PPA enabling CONFIG_UBSAN_TRAP.
Cristiano Rodrigues (microcris) wrote (last edit ): | #11 |
@ernstp, what can we do to bypass this? Or, what info do we need to provide in order to catch what is making the kernel stop?
Rocko (rockorequin) wrote : | #12 |
I also can't boot the mainline 5.14-4 or 5.14-5 kernels on a Lenovo S7 (AMD 5800H and NVIDIA GPU) - it goes to a blank screen when trying to boot graphics and freezes - I can't open a tty and have to hard reset the system.
It looks like I have the laptop in the same mode as the other posters, ie hybrid/dynamic graphics mode using the amdgpu driver for the laptop screen.
I can't see any relevant errors in the output "journalctl -b -1" for the failed boot - I think the hang occurs before it has a chance to log them.
However, when booting the stock Ubuntu 21.10 5.13.0-16 kernel I do see some nvidia_drm RIP errors in the log in nv_drm_
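When a failed boot does get far enough to log, the relevant lines can be pulled out of the previous boot's kernel messages. This is a minimal sketch: the helper name `oops_lines` is hypothetical, and it only filters for the two markers that appear in the traces posted in this report ("RIP:" and "Call Trace:").

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines that mark a kernel oops/trap
# (the "RIP:" line and the "Call Trace:" header) from text on stdin.
oops_lines() {
    grep -E 'RIP: |Call Trace:'
}

# Example: kernel messages (-k) from the previous boot (-b -1).
#   journalctl -k -b -1 --no-pager | oops_lines
```

If the hang occurs before the journal is flushed to disk, this prints nothing, which matches the situation described above.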
Rocko (rockorequin) wrote : | #13 |
Just to confirm, Ubuntu's mainline 5.14.5 kernel does boot successfully on my machine with CONFIG_UBSAN_TRAP not set.
Rashad Tatum (rmtatum) wrote : | #14 |
It looks like the following patch from Aug. 20, 2021 enabled CONFIG_UBSAN_TRAP:
https:/
Rashad Tatum (rmtatum) wrote : | #15 |
CONFIG_UBSAN_TRAP was added from this issue:
https:/
Bluestang (bluestang) wrote : | #16 |
The patch "drm/amd/amdgpu: Increase HWIP_MAX_INSTANCE to 10" fixes the KOOPS. I am currently on 5.15-rc2 and have not experienced any issues. I reckon that 5.14.7 will also be fine since the changelog shows that the same patch was backported.
Looks like this bug has always been there but was brought to light once CONFIG_UBSAN_TRAP was enabled.
Cristiano Rodrigues (microcris) wrote : | #17 |
Linux 5.15-rc2 is booting ok with the latest nvidia driver (470.74)
It seems the RIP: 0010:nv_
Rashad Tatum (rmtatum) wrote : | #18 |
"Fixed a bug that caused nvidia-drm.ko to crash when loading with DRM-KMS enabled (modeset=1) on Linux v5.14"
Could this be the fix then?
https:/
Rashad Tatum (rmtatum) wrote : | #19 |
I can confirm that 5.14.7 boots, but my usb-c displayport out using my Kensington dock doesn't work. I'll try upgrading the NVIDIA drivers.
Rashad Tatum (rmtatum) wrote : | #20 |
NVIDIA driver version 470.74 fixes my usb-c displayport issues.
But I still think leaving the CONFIG_UBSAN_TRAP flag in the build is concerning (at least, based on the documentation for the flag). It's probably okay for testing
vladimir2k9 (vladimir2k9) wrote : apport information | #21 |
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu69
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/
/dev/snd/
/dev/snd/seq: vladimir 1866 F.... pipewire
CasperMD5CheckR
CurrentDesktop: Unity:Unity7:ubuntu
DistroRelease: Ubuntu 21.10
InstallationDate: Installed on 2021-08-25 (33 days ago)
InstallationMedia: Ubuntu 21.10 "Impish Indri" - Alpha amd64 (20210824)
IwConfig:
lo no wireless extensions.
enp4s0 no wireless extensions.
MachineType: To Be Filled By O.E.M. To Be Filled By O.E.M.
Package: linux (not installed)
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageV
linux-
linux-
linux-firmware 1.200
RfKill:
0: hci0: Bluetooth
Soft blocked: no
Hard blocked: no
Tags: impish
Uname: Linux 5.13.0-16-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 08/04/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: P2.10
dmi.board.name: B550 Steel Legend
dmi.board.vendor: ASRock
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: To Be Filled By O.E.M.
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.
dmi.sys.vendor: To Be Filled By O.E.M.
tags: | added: impish |
vladimir2k9 (vladimir2k9) wrote : AlsaInfo.txt | #22 |
vladimir2k9 (vladimir2k9) wrote : CRDA.txt | #23 |
vladimir2k9 (vladimir2k9) wrote : CurrentDmesg.txt | #24 |
vladimir2k9 (vladimir2k9) wrote : Lspci.txt | #25 |
vladimir2k9 (vladimir2k9) wrote : Lspci-vt.txt | #26 |
vladimir2k9 (vladimir2k9) wrote : Lsusb.txt | #27 |
vladimir2k9 (vladimir2k9) wrote : Lsusb-t.txt | #28 |
vladimir2k9 (vladimir2k9) wrote : Lsusb-v.txt | #29 |
vladimir2k9 (vladimir2k9) wrote : PaInfo.txt | #30 |
vladimir2k9 (vladimir2k9) wrote : ProcCpuinfo.txt | #31 |
vladimir2k9 (vladimir2k9) wrote : ProcCpuinfoMinimal.txt | #32 |
vladimir2k9 (vladimir2k9) wrote : ProcEnviron.txt | #33 |
vladimir2k9 (vladimir2k9) wrote : ProcInterrupts.txt | #34 |
vladimir2k9 (vladimir2k9) wrote : ProcModules.txt | #35 |
vladimir2k9 (vladimir2k9) wrote : UdevDb.txt | #36 |
vladimir2k9 (vladimir2k9) wrote : WifiSyslog.txt | #37 |
vladimir2k9 (vladimir2k9) wrote : acpidump.txt | #38 |
vladimir2k9 (vladimir2k9) wrote : | #39 |
I installed Linux 5.14.8 and my system does not boot.
Gannet (ken20001) wrote : | #40 |
DCMarkie (mhoffmeyer) wrote : | #41 |
Adding
i915.fastboot=0
to the kernel params allows booting on these kernels.
(Tested working on a XPS 9575)
vladimir2k9 (vladimir2k9) wrote : | #42 |
How can I disable "CONFIG_UBSAN_TRAP" and test booting? Or does the kernel need to be recompiled?
DanglingPointer (ferncasado) wrote (last edit ): | #43 |
Tried building 5.14.14 from vanilla upstream https:/
using the config from ubuntu mainline config-
but removing the debian "pem" for a custom kernel build; and...
I used LLVM-13/Clang-13 with specific march for both, O3, and the LTO thin parameters for config.
It fails to boot after grub for ivybridge and haswell machines.
I will try the following and report here:
1) I will try disabling "CONFIG_UBSAN_TRAP" to see if that works with LLVM-13/clang-13 with similar parameters above.
2) I will try with GCC-11.1 with 'enabled' "CONFIG_UBSAN_TRAP" with similar parameters above without LTO.
DanglingPointer (ferncasado) wrote (last edit ): | #44 |
Done
Point 1) above WORKS on zen3, haswell, and ivybridge.
That means disabling 'CONFIG_UBSAN_TRAP' in the kernel config prior to building.
Just open the ".config" in nano or vim and put a '#' in front of the line to disable it before running "$ make olddefconfig".
I built the kernel using LLVM-13 and LTO thin with O3 optimisation with -march=<cpuType>
Point 2) above FAILED on ivybridge. I didn't bother trying to build it for other march types.
It was built using the Ubuntu mainline config for 5.14.14 with the debian pems removed for a custom kernel build. "CONFIG_UBSAN_TRAP" is enabled by default in the Ubuntu kernel config.
I built the kernel using GCC-11.1 with O3 optimisation with -march=<cpuType>
I highly recommend that the Ubuntu Mainline Kernel team DISABLE "CONFIG_UBSAN_TRAP" in their mainline kernel config, as it can make production kernels FAIL: that option turns what would otherwise have been a harmless warning into an error, and possibly an abort or kernel panic of some sort.
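The manual ".config" edit described in this comment can also be scripted. This is a minimal sketch, assuming GNU sed (as shipped on Ubuntu) and a kernel source tree that already contains a ".config". The helper name `disable_kconfig_option` is hypothetical. It rewrites an enabled option into the `# CONFIG_... is not set` form that Kconfig itself writes.

```shell
#!/bin/sh
# Hypothetical helper: mark a Kconfig option as "not set" in a config file.
# Usage: disable_kconfig_option CONFIG_UBSAN_TRAP /path/to/.config
disable_kconfig_option() {
    opt="$1"
    cfg="$2"
    # Turn "CONFIG_FOO=<value>" into "# CONFIG_FOO is not set" in place.
    # The "=" in the pattern keeps longer names such as CONFIG_FOO_BAR intact.
    sed -i "s/^${opt}=.*/# ${opt} is not set/" "$cfg"
}

# Example (run from the top of the kernel source tree):
#   disable_kconfig_option CONFIG_UBSAN_TRAP .config
#   make olddefconfig
```

The kernel tree also ships a `scripts/config` tool whose `--disable` option makes the same kind of change, which may be preferable to a hand-rolled sed call.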
DanglingPointer (ferncasado) wrote : | #45 |
In another test, vanilla Ubuntu mainline 5.14.14 on a guest VBOX VM runs OK for about half a day, then crashes to a black screen. This has happened 4x!
Replacing the vanilla Ubuntu mainline 5.14.14 with a custom build of 5.14.14 using point 1 above (see comment 43) has so far worked for 3 days and is still working.
Having that configuration is dangerous in production settings! I can't stress it enough! If you use Ubuntu Mainline kernels in production settings, then...
****DO NOT USE UBUNTU MAINLINE 5.14.X KERNELS.****
Use Ubuntu mainline 5.13.x or rebuild the 5.14.x kernels from kernel.org and disable "CONFIG_UBSAN_TRAP" in the kernel config.
Gannet (ken20001) wrote : | #46 |
It seems the Ubuntu Mainline Kernel team doesn't hear us: 5.15.1 still fails to boot, but they don't care about it.
DanglingPointer (ferncasado) wrote : | #47 |
5.15.1 fails for me too, on Penryn, Ivybridge, Haswell and guest VMs on Zen3.
I had to rebuild the kernel from kernel.org using the config from Ubuntu Mainline but disabling the debian pem certs and "CONFIG_
After that it worked on all the above architectures.
DanglingPointer (ferncasado) wrote : | #48 |
Looks like they may have removed "CONFIG_UBSAN_TRAP" from the kernel config of 5.15.3!
Only this is in the config...
```
# CONFIG_UBSAN is not set
```
I'm building it now to see if it is ok and works.
DanglingPointer (ferncasado) wrote : | #49 |
Kernel works ok.
So as it stands, new 5.15.3 has no "CONFIG_UBSAN_TRAP" in config.
So for new kernels you can probably close this bug off as long as they don't reintroduce it.
vladimir2k9 (vladimir2k9) wrote : | #50 |
5.15.6 still fails to boot on my desktop.
DanglingPointer (ferncasado) wrote (last edit ): | #51 |
Does it have "CONFIG_UBSAN_TRAP" set in the config?
vladimir2k9 (vladimir2k9) wrote : | #52 |
I'm trying kernel from
https:/
DanglingPointer (ferncasado) wrote : | #53 |
1) Do an $ ls -lah /boot
2) look for config-
3) ctrl+w and type or paste "CONFIG_UBSAN_TRAP" and hit enter. (paste is ctrl+shift+v)
If it doesn't find it, then it is odd that it doesn't work for you. The problem could be elsewhere. That said I don't have that kernel version running.
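The same check can be done non-interactively with grep. This is a minimal sketch: the helper name `kconfig_state` is hypothetical, and the config path in the example assumes the usual Ubuntu layout of `/boot/config-<version>`. Substitute the version of the kernel that fails to boot.

```shell
#!/bin/sh
# Hypothetical helper: report how a Kconfig option appears in a config file.
# Prints "set", "not set", or "absent".
kconfig_state() {
    opt="$1"
    cfg="$2"
    if grep -q "^${opt}=" "$cfg"; then
        echo "set"
    elif grep -q "^# ${opt} is not set" "$cfg"; then
        echo "not set"
    else
        echo "absent"
    fi
}

# Example: inspect the config of the currently running kernel.
#   kconfig_state CONFIG_UBSAN_TRAP "/boot/config-$(uname -r)"
```

A result of "absent" usually means the option's parent (here CONFIG_UBSAN) is itself disabled, so the option was never written to the file.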
vladimir2k9 (vladimir2k9) wrote : | #54 |
It does not find it.
Searching for just "UBSAN" finds only:
CONFIG_
# CONFIG_UBSAN is not set
My system hangs early in startup. It does not respond to any key, and Alt+SysRq+b does not work either.
But if I choose Recovery mode and then continue booting, it boots.
DanglingPointer (ferncasado) wrote : | #55 |
Is it a new or old system?
What's the last working kernel version for you?
Have you considered going back to the official kernel version for your Ubuntu (if it works?)?
Another option you may have would be to try disabling that "...SANITIZE_ALL" config and rebuild the kernel from kernel.org using the updated Ubuntu config. See if it works.
vladimir2k9 (vladimir2k9) wrote : | #56 |
New system. Ubuntu 21.10
I am on the official kernel, but wanted to use the latest.
Gannet (ken20001) wrote : | #57 |
Still the same with v5.16-rc8.
Claus Lensbøl (cmol) wrote : | #58 |
Had a similar issue on a 20.04 install today after updating to 5.14.
Woke it up from suspend on 5.10.0-1057-oem. Rebooted once (some screen issues that turned out to be the screen), installed updates via update-manager, rebooted using the new 5.14.0-1024-oem kernel and it just hung. Black screen, blinking cursor.
I noticed that the lenovo splash screen did not come up after selecting the 5.14 kernel in grub. It does with 5.10.
Going back to 5.10 works as before. I have a kern.log saved from the before and after the update, but do I need to make sure it is sanitized for secure boot keys and such?
Also, it should be noted that it is a dual GPU system with an intel and nvidia.
Andrew Purtell (apurtell) wrote : | #59 |
Running 5.13.0-xx on Parallels Desktop on M1 Macbook. 5.13.0-12-generic is the last kernel that will boot. -13 and -14 have been withdrawn, apparently. -15 and -16 just hang as soon as control is given to the kernel back from the EFI loader. Some fail code was backported post -12 from more recent kernels it seems.
vladimir2k9 (vladimir2k9) wrote (last edit ): | #60 |
I found why my system does not boot.
After some googling and testing, I used the kernel parameter "amdgpu.aspm=0" and the system booted. It seems it is now set to auto by default.
aspm
ASPM support (1 = enable, 0 = disable, -1 = auto)
With it, the older kernels from 5.14 onward also boot. Now I am on 5.17.
$ uname -a
Linux vladimir-desktop 5.17.0-
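To make a kernel parameter such as "amdgpu.aspm=0" persistent, it can be appended to the GRUB default command line. This is a minimal sketch, assuming GNU sed and the standard Ubuntu file `/etc/default/grub` with a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line. The helper name `add_kernel_param` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in a GRUB defaults file, unless that line already contains it.
# Usage: add_kernel_param amdgpu.aspm=0 /etc/default/grub
add_kernel_param() {
    param="$1"
    file="$2"
    # Nothing to do if the parameter is already on the default command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing quote of the value.
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" "$file"
}

# Example (needs root; back up the file first):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param amdgpu.aspm=0 /etc/default/grub   # run as root
#   sudo update-grub
#   cat /proc/cmdline    # after reboot, confirms the parameter is active
```

For a one-off test, the parameter can instead be added by editing the boot entry from the GRUB menu, which leaves the file untouched.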
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1942684
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.