Cannot boot after updating kernel to version 5.4.0-45

Bug #1894378 reported by Ami
This bug affects 10 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

After the kernel update from version 5.4.0-42 to version 5.4.0-45 a few days ago, the boot process ends in a BusyBox shell. Apparently the kernel cannot properly identify my M.2 NVMe SSD, where my root partition is. Indeed, the device is not listed in /dev (precisely: nvme0 is listed in /dev, but nvme0n1 and its partitions are not). For now, I'm working around this problem by booting the previous kernel, 5.4.0-42, which works fine.
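
One way to confirm this symptom from a shell is to count which kinds of NVMe nodes exist in /dev. The following is only a minimal POSIX shell sketch: the function names `classify_nvme_node` and `summarize_nvme_nodes` are made up for illustration, and the patterns assume the usual nvmeX / nvmeXnY / nvmeXnYpZ naming.

```shell
# Classify one device name (for example "nvme0" or "/dev/nvme0n1p2").
classify_nvme_node() {
    name=${1##*/}                       # drop a leading directory such as /dev/
    case "$name" in
        nvme[0-9]*n[0-9]*p[0-9]*) echo partition ;;
        nvme[0-9]*n[0-9]*)        echo namespace ;;
        nvme[0-9]*)               echo controller ;;
        *)                        echo other ;;
    esac
}

# Read device names on stdin, one per line, and print a one-line summary.
summarize_nvme_nodes() {
    controllers=0 namespaces=0 partitions=0
    while IFS= read -r node; do
        case "$(classify_nvme_node "$node")" in
            controller) controllers=$((controllers + 1)) ;;
            namespace)  namespaces=$((namespaces + 1)) ;;
            partition)  partitions=$((partitions + 1)) ;;
        esac
    done
    echo "controllers=$controllers namespaces=$namespaces partitions=$partitions"
}
```

It could be used as `ls /dev | summarize_nvme_nodes`. On a boot showing the symptom described above, one would expect a controller to be counted but no namespaces or partitions.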

Also, an earlier attempt ended in a different shell that presented the error message "Gave up on waiting for root file system device". The UUID it reported as missing is the correct UUID of my root partition, which also appears in /etc/fstab.

I tried to re-build initrd.img-5.4.0-45-generic, but to no avail.

Let me also mention that on another computer with a non-M.2 SSD, kernel 5.4.0-45 boots properly.

This bug seems to be the one mentioned in the following "Unix & Linux" forum question:
https://unix.stackexchange.com/questions/607694/ubuntu-20-04-01-not-booting-after-kernel-update

Many thanks in advance!

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-42-generic 5.4.0-42.46
ProcVersionSignature: Ubuntu 5.4.0-42.46-generic 5.4.44
Uname: Linux 5.4.0-42-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ami 3801 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: KDE
Date: Sat Sep 5 17:42:26 2020
HibernationDevice: RESUME=UUID=dd70dd06-717a-4acf-a85e-a9ad37cb92ea
InstallationDate: Installed on 2017-10-04 (1066 days ago)
InstallationMedia: Kubuntu 16.04.3 LTS "Xenial Xerus" - Release amd64 (20170801)
IwConfig:
 enp0s31f6 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 002: ID 1532:022a Razer USA, Ltd Razer Cynosa Chroma
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Lsusb-t:
 /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 5000M
 /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 480M
     |__ Port 8: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
     |__ Port 8: Dev 2, If 1, Class=Human Interface Device, Driver=usbhid, 12M
     |__ Port 8: Dev 2, If 2, Class=Human Interface Device, Driver=usbhid, 12M
MachineType: System manufacturer System Product Name
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-42-generic root=UUID=85262f0d-cb92-475a-8174-2d6316e199a9 ro quiet splash
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-42-generic N/A
 linux-backports-modules-5.4.0-42-generic N/A
 linux-firmware 1.187.3
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-05-03 (125 days ago)
dmi.bios.date: 06/20/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3807
dmi.board.asset.tag: Default string
dmi.board.name: H110M-A/M.2
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3807:bd06/20/2018:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH110M-A/M.2:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Ami (amiamiami)
description: updated
Ami (amiamiami)
information type: Public → Private Security
information type: Private Security → Public
pingchou (pingchou) wrote :

Kernels 5.4.0-45 and 5.4.0-47 hit the same problem; you must use 5.4.0-42 to boot.

Ami (amiamiami)
description: updated
Achille Fouilleul (achille-fouilleul) wrote :

Same problem for me: the 5.4.0-45 and -47 kernels fail to detect my NVMe drive, while -42 boots fine.
According to git bisect, the issue was introduced by commit e958cbb11ed6f58dac5bbfba481631ff0f567c7d (upstream ea43d9709f727e728e933a8157a7a7ca1a868281).
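
For readers unfamiliar with `git bisect`: it binary-searches the history between a known-good and a known-bad commit. A real kernel bisect means building and booting a kernel at every step, which cannot be shown compactly, so the sketch below runs the same commands on a throwaway repository. The repository, the file `driver.txt`, the `strict-check` marker and the function name `bisect_demo` are all invented for this demonstration; none of them come from the kernel tree.

```shell
# Create a small repository in which the 4th of 6 commits introduces a
# "regression", then let `git bisect run` locate it.
# Prints the subject line of the first bad commit.
bisect_demo() (
    repo=$1
    mkdir -p "$repo" && cd "$repo" || exit 1
    git init -q .
    git config user.name "Bisect Demo"
    git config user.email "demo@example.invalid"

    i=1
    while [ "$i" -le 6 ]; do
        echo "change $i" >> driver.txt
        if [ "$i" -eq 4 ]; then
            echo "strict-check" >> driver.txt   # the simulated regression
        fi
        git add driver.txt
        git commit -q -m "commit $i"
        i=$((i + 1))
    done

    first=$(git rev-list --max-parents=0 HEAD)  # oldest commit: known good
    git bisect start HEAD "$first" >/dev/null   # HEAD: known bad

    # The test command exits 0 for "good" and non-zero for "bad".
    git bisect run sh -c '! grep -q strict-check driver.txt' >/dev/null 2>&1

    git log -1 --format=%s refs/bisect/bad      # first bad commit
    git bisect reset >/dev/null 2>&1
)
```

Calling `bisect_demo /tmp/bisect-demo` prints the subject of the commit that added the marker. For the kernel, the "test command" step is replaced by building the checked-out revision, booting it, and marking the result by hand with `git bisect good` or `git bisect bad`.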

crysman (crysman) wrote :

At work, one of our machines recently started hanging on "ALERT! ... UUID does not exist ..." with kernels 5.4.0-45 and 5.4.0-47. Only 5.4.0-42 boots now.

Adding `rootdelay=5` does not help.
I've already updated the BIOS to the latest version (from 2020-07); that does not help either.

The machine also uses an NVMe M.2 SSD. It is an `Intel NUC8BEH` mini PC.

Please:
1) How can I work around this so that the user can actually use the machine without manually selecting "Advanced boot options"?
2) How can this be fixed properly?

Thanks

crysman (crysman) wrote :

PS: I've noticed several NVMe-related messages on the boot screen:

`nvme nvme0: missing or invalid SUBNQN field.`
...
and three lines later there is:

`nvme nvme0: Identify descriptors failed (2)`

Could it be related?

crysman (crysman) wrote :

No, this is unlikely to be related, because these messages also appear in `dmesg` when 5.4.0-42 is booted.

I've made this workaround so far:
1) sudo nano /etc/default/grub
* GRUB_TIMEOUT=5
* GRUB_DEFAULT=saved
* GRUB_SAVEDEFAULT=true
2) sudo update-grub

So I'm booting 5.4.0-42 for now and waiting for further info... Thanks
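
With `GRUB_DEFAULT=saved` and `GRUB_SAVEDEFAULT=true`, GRUB remembers the entry last chosen in the menu, so selecting 5.4.0-42 once should be enough for later boots. As a sketch of how the same edit could be scripted rather than made in nano: `set_grub_option` and `apply_saved_default_workaround` below are hypothetical helper names, and the code edits whichever file it is given, so it can be tried on a copy first.

```shell
# Set KEY=VALUE in a shell-style config file: replace an existing
# "KEY=..." line if there is one, otherwise append a new line.
set_grub_option() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Apply the three settings from the workaround above to the given file.
apply_saved_default_workaround() {
    set_grub_option "$1" GRUB_TIMEOUT 5
    set_grub_option "$1" GRUB_DEFAULT saved
    set_grub_option "$1" GRUB_SAVEDEFAULT true
}
```

A cautious way to use it would be to copy /etc/default/grub to a scratch location, run `apply_saved_default_workaround` on the copy, compare the two with `diff`, and only then install the result and run `sudo update-grub`.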

Achille Fouilleul (achille-fouilleul) wrote :

@crysman I have a NUC too. Is your nvme controller a Silicon Motion, Inc. Device 2263 (rev 03) (126f:2263)?
It seems this controller somehow deviates from the specification.
The commit I mentioned above (ea43d9709f) made the driver stricter, causing it to reject the device.
A fix is available upstream: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5bedd3afee8eb01ccd256f0cd2cc0fa6f841417a
I can see it in Ubuntu's master-next branch: https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?h=master-next&id=5ba7812ddae2589c0a4c1ded468ce49de35774d4
Hopefully next time an Ubuntu kernel is released the problem will go away.
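
To check for this controller by its numeric ID rather than by vendor name, `lspci -nn` prints the vendor:device pair in square brackets, and `lspci -d 126f:2263` filters on it directly. The helper below is a small sketch for scripting that check; `has_pci_id` is a hypothetical name, and it simply searches text in `lspci -nn` format for the bracketed ID.

```shell
# Succeeds if the `lspci -nn`-style text on stdin contains the given
# vendor:device ID (for example 126f:2263). Matching is case-insensitive
# and uses a fixed string, so the brackets are taken literally.
has_pci_id() {
    grep -qiF "[$1]"
}
```

For example: `lspci -nn | has_pci_id 126f:2263 && echo "Silicon Motion 2263 controller present"`.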

Ami (amiamiami) wrote :

Thanks for your comments, @crysman and @achille-fouilleul.
I'm glad there is an upcoming fix, and hope it is released before long.

crysman (crysman) wrote :

@Achille
```
$ sudo lspci | grep Motion
6d:00.0 Non-Volatile memory controller: Silicon Motion, Inc. Device 2263 (rev 03)

```

According to the `sudo nvme list` output, it is /dev/nvme0n1, model TS256GMTE110S.

If I understood correctly, the next update will fix the issue (?)

Ami (amiamiami) wrote :

The new kernel, 5.4.0-48, indeed solves the problem.
Thanks for all the comments!
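
Going by this thread, 5.4.0-45 and 5.4.0-47 are affected and 5.4.0-48 carries the fix. A rough way to test a kernel release string against that boundary is sketched below. The function names `kernel_abi` and `has_nvme_fix` are hypothetical, and the check only makes sense for Ubuntu 5.4.0-series kernels; it says nothing about other series.

```shell
# Extract the ABI number from an Ubuntu kernel release string,
# e.g. "5.4.0-48-generic" -> "48".
kernel_abi() {
    rest=${1#*-}          # strip up to the first dash: "48-generic"
    echo "${rest%%-*}"    # strip from the next dash on:  "48"
}

# Exit status 0: 5.4.0-series kernel at ABI 48 or later.
# Exit status 1: 5.4.0-series kernel older than ABI 48.
# Exit status 2: not a 5.4.0-series release string (unknown).
has_nvme_fix() {
    case "$1" in
        5.4.0-*) [ "$(kernel_abi "$1")" -ge 48 ] ;;
        *)       return 2 ;;
    esac
}
```

For the running kernel: `has_nvme_fix "$(uname -r)" && echo "kernel includes the fix discussed here"`.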

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Disctanger (disctanger) wrote :

My PC also could not detect my NVMe device after an Ubuntu update. As a result, it could not boot into the system.

This is the NVMe storage:

```
~$ sudo nvme list

Node Model
---------------- --------------------
/dev/nvme0n1 INTEL SSDPEKKW512G8
```

The problem has been resolved by the 5.4.0-48-generic kernel update.
Thanks to the author for creating this bug report and others for contributing!!!

Idan Shinberg (ishinberg0) wrote :

Sorry to be a buzzkill, but this bug started affecting me **after** upgrading to 5.4.0-48.
5.4.0-47 works fine. I'm running on a Dell XPS 9700.

Any ideas or help would be much appreciated.

Achille Fouilleul (achille-fouilleul) wrote :

@ishinberg0

The issue ami, crysman and I had was related to a specific NVMe controller (PCI ID 126f:2263). 5.4.0-48 fixed it for ami and me at least. I guess your issue is different; I suggest you file a new bug report with details about your system (dmesg, lspci output, etc.).

Ami (amiamiami) wrote :

@ishinberg0, as achille-fouilleul wrote, this is probably a different bug.
A few questions you can ask to make sure of that:

- Do you get any error message at all?
- Which nvme items do you see in /dev when in BusyBox, and which ones when you boot normally using kernels that work for you?
- Do you see the problematic controller when running lspci (see crysman's comment)?

Good luck!

Brendan DeBeasi (b-9) wrote :

@ishinberg0 I am having the same issue on an Alienware Aurora R6 after the upgrade to 5.4.0-48.

I would like to file a new bug report for this. How do I go about generating any debug info when I have a blank screen after the GRUB menu?

Achille Fouilleul (achille-fouilleul) wrote :

@b-9

There may be various options, depending on how comfortable you are with the command line.
You could try to build a custom initramfs image with diagnostic tools, boot from it, and dump the necessary info to a USB drive, for example. Or clone the Ubuntu kernel git repository and use git bisect to pinpoint the commit that introduced the issue.

Definitely easier: boot your most recent working kernel normally and dump the requested information (lspci, dmesg, etc.). Though the kernel version will not match exactly, the information may help narrow down the issue.
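
As a sketch of the "definitely easier" option, the function below gathers that information into one directory while running a working kernel. The name `collect_boot_diagnostics`, the default directory `nvme-bug-info` and the output file names are made up for this example. `dmesg` may need root on some systems and `lspci` may not be installed, so failures are recorded in the output files instead of aborting.

```shell
# Collect basic hardware and kernel information for a bug report.
# Usage: collect_boot_diagnostics [output-directory]
collect_boot_diagnostics() {
    out=${1:-nvme-bug-info}
    mkdir -p "$out" || return 1

    # Running kernel version.
    uname -a > "$out/uname.txt"

    # Kernel log of the current (working) boot.
    dmesg > "$out/dmesg.txt" 2>&1 ||
        echo "dmesg failed (try again with sudo)" >> "$out/dmesg.txt"

    # PCI devices with numeric IDs (-nn) and kernel drivers in use (-k).
    if command -v lspci >/dev/null 2>&1; then
        lspci -nnk > "$out/lspci.txt" 2>&1 || true
    else
        echo "lspci not installed" > "$out/lspci.txt"
    fi

    # NVMe nodes visible in /dev (the file stays empty if there are none).
    ls /dev | grep '^nvme' > "$out/dev-nvme.txt" || true
}
```

The resulting files can then be attached to the new bug report, with a note stating which kernel they were collected under.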
