NVME devices are not enumerated on Raspberry PI 5 with Ubuntu 23.10

Bug #2052861 reported by Dmytro
This bug affects 1 person
Affects                 Status        Importance  Assigned to  Milestone
linux-raspi (Ubuntu)    Fix Released  Undecided   Unassigned
  Mantic                Fix Released  Undecided   Unassigned

Bug Description

[Impact]

NVMe devices connected via PCIe are not enumerated on Raspberry Pi 5.

[Fix]

https://github.com/raspberrypi/linux/commit/2677529a4f8a50c7567f50d67f368f1d138fb4d2

[Test Case]

Check whether NVMe devices connected via PCIe are detected with lspci/lsblk.
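
A minimal sketch of how this check could be automated, assuming the `lspci -nnk` output format shown in this report. The function name and the sample strings are illustrative only; the PCI class code 0108 (mass storage, non-volatile memory controller) is what identifies an NVMe controller in the listings.

```python
import re

# Matches top-level device lines of `lspci -nnk`, for example:
# 0000:01:00.0 Non-Volatile memory controller [0108]: Samsung ... [144d:a804]
DEVICE_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<class_name>.+?)\s+\[(?P<class_code>[0-9a-f]{4})\]:\s+(?P<desc>.*)$"
)

# PCI class 01 (mass storage), subclass 08 (non-volatile memory controller).
NVME_CLASS = "0108"


def find_nvme_controllers(lspci_output: str) -> list[str]:
    """Return the slot addresses of all NVMe controllers in `lspci -nnk` text."""
    slots = []
    for line in lspci_output.splitlines():
        # Indented lines ("Kernel driver in use: ...") describe the previous device.
        if line[:1].isspace():
            continue
        match = DEVICE_LINE.match(line)
        if match and match.group("class_code") == NVME_CLASS:
            slots.append(match.group("slot"))
    return slots


# Sample text copied from the listings in this report (shortened).
BROKEN_KERNEL = """\
00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
\tKernel driver in use: pcieport
01:00.0 Ethernet controller [0200]: Device [1de4:0001]
\tKernel driver in use: rp1
"""

FIXED_KERNEL = """\
0000:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
\tKernel driver in use: pcieport
0000:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 [144d:a804]
\tKernel driver in use: nvme
"""

print(find_nvme_controllers(BROKEN_KERNEL))
print(find_nvme_controllers(FIXED_KERNEL))
```

On a real system the text would come from running `lspci -nnk`; an empty result corresponds to the failing case.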

[Where Problems Could Occur]

PCIe devices might misbehave.

[Original Description]

NVME devices are not enumerated on Raspberry PI 5 with Ubuntu 23.10

I've got one of the PCIe NVMe extension HATs for the Raspberry Pi 5 [1] and installed a known-working NVMe drive, a Samsung SM961, in it.
I first booted it with a Raspberry Pi OS image from an SD card. Upon boot, the NVMe drive was detected, and I could mount it and perform some I/O:

> uname -a
Linux raspberrypi 6.1.0-rpi8-rpi-2712 #1 SMP PREEMPT Debian 1:6.1.73-1+rpt1 (2024-01-25) aarch64 GNU/Linux
> lspci -nnk
0000:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0000:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 [144d:a804]
 Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD [144d:a801]
 Kernel driver in use: nvme
0001:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0001:01:00.0 Ethernet controller [0200]: Device [1de4:0001]
 Kernel driver in use: rp1

The dmesg log from Raspberry Pi OS will be attached, but there is nothing suspicious in it.

Then I tried Ubuntu 23.10 and reimaged the SD card with "Ubuntu Server 23.10 (64-bit)" (released 2023-10-12).
Upon boot, the drive did not show up:
> lspci -nnk
00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
01:00.0 Ethernet controller [0200]: Device [1de4:0001]
 Kernel driver in use: rp1

I checked for updates and a newer kernel was available, so I upgraded:
> apt update
> apt list --upgradable
...
linux-image-raspi/mantic-updates,mantic-security 6.5.0.1010.11 arm64 [upgradable from: 6.5.0.1005.6]

After a reboot, the NVMe drive still did not show up. The full dmesg log is attached, but in essence the PCIe driver has trouble enumerating one of the devices on the bus:
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x414fd0b1]
[ 0.000000] Linux version 6.5.0-1010-raspi (buildd@bos03-arm64-017) (aarch64-linux-gnu-gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0, GNU ld (GNU Binutils for Ubuntu) 2.41) #13-Ubuntu SMP PREEMPT_DYNAMIC Thu Jan 18 09:08:04 UTC 2024 (Ubuntu 6.5.0-1010.13-raspi 6.5.8)
...
[ 2.612956] brcm-pcie 1000110000.pcie: link down
[ 2.617719] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 2.624391] pci 0000:00:00.0: PCI bridge to [bus 01]
...
[ 2.876967] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
[ 2.883637] pci 0000:01:00.0: [1de4:0001] type 00 class 0x020000
...
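
A small sketch for pulling the per-controller link state out of such a log, assuming the `brcm-pcie <controller>: link up/down` message format seen in the excerpt above. The function name is illustrative.

```python
import re

# Matches kernel messages such as:
# [ 2.612956] brcm-pcie 1000110000.pcie: link down
# [ 2.876967] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
LINK_LINE = re.compile(
    r"brcm-pcie (?P<ctrl>\S+): link (?P<state>up|down)(?:, (?P<detail>.*))?$"
)


def pcie_link_states(dmesg_text: str) -> dict[str, str]:
    """Map each brcm-pcie controller name to its last reported link state."""
    states = {}
    for line in dmesg_text.splitlines():
        match = LINK_LINE.search(line)
        if match:
            states[match.group("ctrl")] = match.group("state")
    return states


# Sample lines copied from the failing 6.5.0-1010 boot in this report.
FAILING_BOOT = """\
[ 2.612956] brcm-pcie 1000110000.pcie: link down
[ 2.617719] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 2.876967] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
"""

print(pcie_link_states(FAILING_BOOT))
```

In the failing boot, `1000110000.pcie` (the controller the NVMe drive sits behind in the working logs) reports "down", while `1000120000.pcie` comes up normally.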

After digging a bit, I found a discussion on the Raspberry Pi kernel GitHub repository [2]:
 - a bug was introduced sometime in 6.2.x [3]
 - a fix landed in 6.6.x [4]
 - it boils down to `readw_` being used where `readl_` was needed in the pcie-brcmstb driver
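
To illustrate why a 16-bit read in place of a 32-bit read can break a polling loop, here is a toy simulation. It is not the driver code: the position of the busy flag (bit 31) and its polarity are assumptions made for the example, and `read16`, `read32`, and `write_done` are hypothetical helpers.

```python
# Assumption for this example: the register is 32 bits wide and the hardware
# keeps bit 31 set while a write is still in progress.
BUSY_MASK = 0x8000_0000


def read32(register_value: int) -> int:
    """Model a 32-bit register read: all 32 bits are visible."""
    return register_value & 0xFFFF_FFFF


def read16(register_value: int) -> int:
    """Model a 16-bit register read: only the low 16 bits are visible."""
    return register_value & 0xFFFF


def write_done(observed: int) -> bool:
    """The poll condition: the write is done once the busy bit is clear."""
    return (observed & BUSY_MASK) == 0


# Register contents while the hardware is still busy (bit 31 set).
BUSY = 0x8000_1234

print(write_done(read32(BUSY)))  # False: the poll keeps waiting, as intended
print(write_done(read16(BUSY)))  # True: bit 31 is never seen, so the poll exits at once
```

Under these assumptions, the 16-bit read makes the loop report completion immediately, regardless of what the hardware is doing.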

This made me think that, since the upcoming 24.04 is going to ship a 6.7 or 6.8 kernel, the fix might already be there.
So I grabbed a build of the "6.7.0-1001.1" [5] kernel from "Noble Proposed" and installed it on the Mantic system on my SD card.
After a reboot, the NVMe drive was discovered and the proper driver was loaded:

> uname -a
Linux pi 6.7.0-1001-raspi #1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jan 25 12:28:01 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

> lspci -nnk
0000:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0000:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 [144d:a804]
 Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD [144d:a801]
 Kernel driver in use: nvme
 Kernel modules: nvme
0001:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0001:01:00.0 Ethernet controller [0200]: Device [1de4:0001]
 Kernel driver in use: rp1

> dmesg
...
[ 2.221266] brcm-pcie 1000110000.pcie: link up, 5.0 GT/s PCIe x1 (!SSC)
[ 2.227947] pci 0000:01:00.0: [144d:a804] type 00 class 0x010802
[ 2.234031] pci 0000:01:00.0: reg 0x10: [mem 0x1b00000000-0x1b00003fff 64bit]
[ 2.241396] pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
...
[ 2.334202] nvme nvme0: pci function 0000:01:00.0
[ 2.338940] nvme 0000:01:00.0: enabling device (0000 -> 0002)
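
The bandwidth figures in the log above can be reproduced with a short calculation. PCIe links at 2.5 and 5.0 GT/s use 8b/10b encoding, and links at 8.0 GT/s and above use 128b/130b. The sketch below assumes the per-lane rate is truncated to whole Mb/s before multiplying by the lane count, which is what matches the logged values; the function name is illustrative.

```python
def usable_mbps(mega_transfers_per_s: int, lanes: int) -> int:
    """Usable link bandwidth in Mb/s after line-encoding overhead.

    Assumption: the per-lane figure is truncated to an integer before it is
    multiplied by the number of lanes.
    """
    if mega_transfers_per_s >= 8000:
        # 128b/130b encoding: 128 payload bits per 130 bits on the wire.
        per_lane = mega_transfers_per_s * 128 // 130
    else:
        # 8b/10b encoding: 8 payload bits per 10 bits on the wire.
        per_lane = mega_transfers_per_s * 8 // 10
    return per_lane * lanes


# Link as negotiated on the Raspberry Pi 5: 5.0 GT/s, x1.
print(usable_mbps(5000, 1) / 1000)   # 4.0    -> "4.000 Gb/s available"
# What the SSD itself supports: 8.0 GT/s, x4.
print(usable_mbps(8000, 4) / 1000)   # 31.504 -> "capable of 31.504 Gb/s"
```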

To summarize:
 - Raspberry Pi OS, kernel 6.1: OK
 - Ubuntu 23.10, kernel 6.5.0-1010: FAILS
 - Ubuntu 23.10, kernel 6.7.0-1001 from noble: OK

It would be great if the mdio fix [4] could land in the 23.10 kernel updates, so that NVMe drives could be used without waiting for the Noble release later this year.

[1] https://geekworm.com/products/x1002
[2] https://github.com/raspberrypi/linux/issues/5873
[3] https://github.com/raspberrypi/linux/commit/ca5dcc76314d1fa6d7307fd3b95039b08d2f2b97
[4] https://github.com/raspberrypi/linux/commit/2677529a4f8a50c7567f50d67f368f1d138fb4d2
[5]
 - https://launchpad.net/ubuntu/+source/linux-raspi/6.7.0-1001.1
 - http://launchpadlibrarian.net/711099177/linux-image-6.7.0-1001-raspi_6.7.0-1001.1_arm64.deb
 - http://launchpadlibrarian.net/711099172/linux-modules-6.7.0-1001-raspi_6.7.0-1001.1_arm64.deb

Dmytro (alkersan) wrote :

dmesg of the vanilla Raspberry Pi OS with its 6.1 kernel

Dmytro (alkersan) wrote :
description: updated
Dmytro (alkersan)
description: updated
Manuel Diewald (diewald)
tags: added: kern-9212
Manuel Diewald (diewald)
Changed in linux-raspi (Ubuntu Mantic):
status: New → In Progress
Manuel Diewald (diewald)
description: updated
Juerg Haefliger (juergh)
Changed in linux-raspi (Ubuntu Mantic):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-raspi/6.5.0-1012.15 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-done-mantic-linux-raspi'. If the problem still exists, change the tag 'verification-needed-mantic-linux-raspi' to 'verification-failed-mantic-linux-raspi'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-raspi-v2 verification-needed-mantic-linux-raspi
Dmytro (alkersan) wrote :

Thank you!
I've just tested the proposed kernel 6.5.0-1012.15 on a Raspberry Pi 5 and can confirm that it works:

> uname -a
Linux pi 6.5.0-1012-raspi #15-Ubuntu SMP PREEMPT_DYNAMIC Thu Feb 22 15:23:50 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

> lspci -nnk
0000:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0000:01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963 [144d:a804]
 Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD [144d:a801]
 Kernel driver in use: nvme
 Kernel modules: nvme
0001:00:00.0 PCI bridge [0604]: Broadcom Inc. and subsidiaries Device [14e4:2712] (rev 21)
 Kernel driver in use: pcieport
0001:01:00.0 Ethernet controller [0200]: Device [1de4:0001]
 Kernel driver in use: rp1

> dmesg
...
Feb 23 07:47:54 pi kernel: brcm-pcie 1000110000.pcie: link up, 5.0 GT/s PCIe x1 (!SSC)
Feb 23 07:47:54 pi kernel: pci 0000:01:00.0: [144d:a804] type 00 class 0x010802
Feb 23 07:47:54 pi kernel: pci 0000:01:00.0: reg 0x10: [mem 0x1b00000000-0x1b00003fff 64bit]
Feb 23 07:47:54 pi kernel: pci 0000:01:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:00.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
...

Juerg Haefliger (juergh)
tags: added: verification-done-mantic-linux-raspi
removed: verification-needed-mantic-linux-raspi
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-raspi - 6.5.0-1012.15

---------------
linux-raspi (6.5.0-1012.15) mantic; urgency=medium

  * mantic/linux-raspi: 6.5.0-1012.15 -proposed tracker (LP: #2052031)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2024.02.05)

  * NVME devices are not enumerated on Raspberry PI 5 with Ubuntu 23.10
    (LP: #2052861)
    - PCI: brcmstb: fix broken brcm_pcie_mdio_write() polling

  [ Ubuntu: 6.5.0-25.25 ]

  * mantic/linux: 6.5.0-25.25 -proposed tracker (LP: #2052615)
  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2024.02.05)
  * [SRU][22.04.04]: mpi3mr driver update (LP: #2045233)
    - scsi: mpi3mr: Invoke soft reset upon TSU or event ack time out
    - scsi: mpi3mr: Update MPI Headers to version 3.00.28
    - scsi: mpi3mr: Add support for more than 1MB I/O
    - scsi: mpi3mr: WRITE SAME implementation
    - scsi: mpi3mr: Enhance handling of devices removed after controller reset
    - scsi: mpi3mr: Update driver version to 8.5.0.0.0
    - scsi: mpi3mr: Split off bus_reset function from host_reset
    - scsi: mpi3mr: Add support for SAS5116 PCI IDs
    - scsi: mpi3mr: Add PCI checks where SAS5116 diverges from SAS4116
    - scsi: mpi3mr: Increase maximum number of PHYs to 64 from 32
    - scsi: mpi3mr: Add support for status reply descriptor
    - scsi: mpi3mr: driver version upgrade to 8.5.0.0.50
    - scsi: mpi3mr: Refresh sdev queue depth after controller reset
    - scsi: mpi3mr: Clean up block devices post controller reset
    - scsi: mpi3mr: Block PEL Enable Command on Controller Reset and Unrecoverable
      State
    - scsi: mpi3mr: Fetch correct device dev handle for status reply descriptor
    - scsi: mpi3mr: Support for preallocation of SGL BSG data buffers part-1
    - scsi: mpi3mr: Support for preallocation of SGL BSG data buffers part-2
    - scsi: mpi3mr: Support for preallocation of SGL BSG data buffers part-3
    - scsi: mpi3mr: Update driver version to 8.5.1.0.0
  * The display becomes frozen after some time when a HDMI device is connected.
    (LP: #2049027)
    - drm/i915/dmc: Don't enable any pipe DMC events
  * Audio balancing setting doesn't work with the cirrus codec (LP: #2051050)
    - ALSA: hda/cs8409: Suppress vmaster control for Dolphin models
  * partproke is broken on empty loopback device (LP: #2049689)
    - block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
  * CVE-2023-51780
    - atm: Fix Use-After-Free in do_vcc_ioctl
  * CVE-2023-6915
    - ida: Fix crash in ida_free when the bitmap is empty
  * Update Ubuntu.md (LP: #2051176)
    - [Packaging] update Ubuntu.md
  * test_021_aslr_dapper_libs from ubuntu_qrt_kernel_security failed on K-5.19 /
    J-OEM-6.1 / J-6.2 AMD64 (LP: #1983357)
    - [Config]: set ARCH_MMAP_RND_{COMPAT_, }BITS to the maximum
  * Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out
    (LP: #2036239)
    - ice: Add driver support for firmware changes for LAG
    - ice: alter feature support check for SRIOV and LAG
  * Mantic update: upstream stable patchset 2024-01-29 (LP: #2051584)
    - Upstream stable to v6.1.67, v6.6....

Changed in linux-raspi (Ubuntu Mantic):
status: Fix Committed → Fix Released
Dave Jones (waveform) wrote :

I believe this is also "fixed" (or rather wasn't an issue) with the noble kernel; at least testing the current noble desktop beta image from an SD card shows an NVMe drive attached to a pimoroni base. I'll mark this as Fix Released; please feel free to re-open if this re-appears on the noble images.

Changed in linux-raspi (Ubuntu):
status: New → Fix Released