Creating bcache backing device using a Dell Ent NVMe CM6 MU 6.4TB storage fails with "cannot allocate memory" error

Bug #2016040 reported by Gokhan Cetinkaya
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Rafael Lopez

Bug Description

# make-bcache -B /dev/nvme4n1
UUID: 569e90b3-cc2a-4a0e-a201-476c426a2141
Set UUID: d8d8458a-01df-481e-a7dc-e75843a9608f
version: 1
block_size: 1
data_offset: 16

kern.log:
Apr 12 18:21:56 ... kernel: [ 723.854659] bcache: register_bdev() error nvme4n1: cannot allocate memory
Apr 12 18:21:56 ... kernel: [ 723.854662] bcache: register_bdev_worker() error /dev/nvme4n1: fail to register backing device

The same error message is reported in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1909518, but the configuration change suggested there does not fix the issue:

# nvme id-ns /dev/nvme4n1 -n 1 -H |grep "LBA Format"
  [3:0] : 0x3 Current LBA Format Selected
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0 Best
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 1 bytes - Relative Performance: 0 Best
LBA Format 3 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format 4 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 5 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best

Using a Dell Ent NVMe P5800x WI U.2 400GB storage works as expected:

# make-bcache -B /dev/nvme2n1
UUID: b1405a6d-8732-4175-8aba-d67be23b3ff0
Set UUID: af6b190f-a57e-4f2b-926e-904a95268390
version: 1
block_size: 8
data_offset: 16

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
...
nvme2n1 259:1 0 372.6G 0 disk
└─bcache0 252:0 0 372.6G 0 disk
...

kern.log:
Apr 12 18:47:33 ... kernel: [ 2261.135332] bcache: register_bdev() registered backing device nvme2n1

Kernel versions tested:
5.19.0-38-generic #39~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
5.15.0-69-generic #76-Ubuntu SMP
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Apr 12 18:10 seq
 crw-rw---- 1 root audio 116, 33 Apr 12 18:10 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 22.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge R7525
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.19.0-38-generic root=UUID=3c0a9c0d-3f5d-4d4d-a571-37298c70a1a3 ro
ProcVersionSignature: Ubuntu 5.19.0-38.39~22.04.1-generic 5.19.17
RelatedPackageVersions:
 linux-restricted-modules-5.19.0-38-generic N/A
 linux-backports-modules-5.19.0-38-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.12
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.19.0-38-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 10/25/2022
dmi.bios.release: 2.10
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.10.2
dmi.board.name: 03WYW4
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.10.2:bd10/25/2022:br2.10:svnDellInc.:pnPowerEdgeR7525:pvr:rvnDellInc.:rn03WYW4:rvrA01:cvnDellInc.:ct23:cvr:skuSKU=08FF;ModelName=PowerEdgeR7525:
dmi.product.family: PowerEdge
dmi.product.name: PowerEdge R7525
dmi.product.sku: SKU=08FF;ModelName=PowerEdge R7525
dmi.sys.vendor: Dell Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2016040

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Gokhan Cetinkaya (gokhancetinkaya) wrote : apport information

Attachments (apport information): CurrentDmesg.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

tags: added: apport-collected jammy uec-images
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Gokhan Cetinkaya (gokhancetinkaya) wrote :

# lspci -vv |grep NVMe
c1:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Dell Dell Ent NVMe CM6 MU 6.4TB
                        [PN] Part number: Dell Ent NVMe CM6 MU 6.4TB
c2:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Dell Dell Ent NVMe CM6 MU 6.4TB
                        [PN] Part number: Dell Ent NVMe CM6 MU 6.4TB
c3:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane] (prog-if 02 [NVM Express])
        Subsystem: Dell NVMe Datacenter SSD [Optane] 400GB 2.5" U.2 (P5800X)
c4:00.0 Non-Volatile memory controller: Intel Corporation NVMe Datacenter SSD [Optane] (prog-if 02 [NVM Express])
        Subsystem: Dell NVMe Datacenter SSD [Optane] 400GB 2.5" U.2 (P5800X)
c7:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Dell Dell Ent NVMe CM6 MU 6.4TB
                        [PN] Part number: Dell Ent NVMe CM6 MU 6.4TB
c8:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller Cx6 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Dell Dell Ent NVMe CM6 MU 6.4TB
                        [PN] Part number: Dell Ent NVMe CM6 MU 6.4TB

Gokhan Cetinkaya (gokhancetinkaya) wrote :

Works as expected with Ubuntu 20.04, kernel 5.4.0-146-generic

# make-bcache -B /dev/nvme4n1
UUID: c2422b08-d969-4d6c-8922-f35509719146
Set UUID: 035df7a9-13b8-4309-83d9-65263e95b338
version: 1
block_size: 1
data_offset: 16

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
...
nvme4n1 259:6 0 5.8T 0 disk
└─bcache0 252:0 0 5.8T 0 disk
...

kern.log:
Apr 13 00:39:56 ... kernel: [ 718.192424] bcache: register_bdev() registered backing device nvme4n1

# lsb_release -r
Release: 20.04

# uname -r
5.4.0-146-generic

description: updated
Gokhan Cetinkaya (gokhancetinkaya) wrote :

Same error with Ubuntu 20.04, kernel 5.15.0-69-generic

Changed in linux (Ubuntu):
assignee: nobody → Rafael Lopez (rafael.lopez)
importance: Undecided → Medium
Rafael Lopez (rafael.lopez) wrote :

The failure occurs as a result of a new allocation check (for kvmalloc_node) that was added between 5.9 and 5.15+
https://github.com/torvalds/linux/commit/7661809d493b426e979f39ab512e3adf41fbcc69

It also requires two conditions:
1. The drive reports an optimal I/O size > 0 (check /sys/block/{drive}/queue/optimal_io_size).
2. The drive is large. The threshold depends on the sector count and the optimal I/O size, but assuming 512-byte sectors and an optimal I/O size of 4096 bytes, drives larger than 2TiB would fail.
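
As a rough illustration of the two conditions above, the arithmetic can be sketched in shell. This is a sketch under assumptions, not the kernel's code: it assumes bcache allocates one 4-byte counter per stripe, that the stripe size is the optimal I/O size converted to 512-byte sectors, and that the allocation check rejects requests larger than INT_MAX bytes. The function names (stripe_alloc_bytes, would_exceed_limit, check_device) are hypothetical helpers introduced here.

```shell
# Assumed limit enforced by the allocation check (INT_MAX bytes).
INT_MAX=2147483647

# stripe_alloc_bytes <device_size_bytes> <optimal_io_size_bytes>
# Estimates the size of the per-stripe counter array, assuming one
# 4-byte counter per stripe. Requires optimal_io_size >= 512.
stripe_alloc_bytes() {
    local size_bytes=$1 io_opt=$2
    local sectors=$(( size_bytes / 512 ))
    local stripe_sectors=$(( io_opt / 512 ))
    # Round up: a partial trailing stripe still needs a counter.
    local nr_stripes=$(( (sectors + stripe_sectors - 1) / stripe_sectors ))
    echo $(( nr_stripes * 4 ))
}

# would_exceed_limit <device_size_bytes> <optimal_io_size_bytes>
# Prints "yes" if the estimated allocation is larger than INT_MAX.
would_exceed_limit() {
    local size_bytes=$1 io_opt=$2 bytes
    # Condition 1 not met: no usable optimal I/O size is reported.
    if (( io_opt < 512 )); then
        echo no
        return 0
    fi
    bytes=$(stripe_alloc_bytes "$size_bytes" "$io_opt")
    if (( bytes > INT_MAX )); then echo yes; else echo no; fi
}

# check_device <name> [sysfs_root]
# Reads the device size (in 512-byte units) and optimal_io_size (in
# bytes) from sysfs and applies the estimate above.
check_device() {
    local dev=$1 root=${2:-/sys/block}
    local sectors io_opt
    sectors=$(cat "$root/$dev/size") || return 1
    io_opt=$(cat "$root/$dev/queue/optimal_io_size") || return 1
    echo "$dev: $(would_exceed_limit $(( sectors * 512 )) "$io_opt")"
}
```

For example, `would_exceed_limit $((2 * 1024**4)) 4096` prints yes, while `would_exceed_limit $((1024**4)) 4096` prints no, which agrees with the 2TiB threshold stated above. Running `check_device nvme4n1` applies the same estimate to the values a live system reports.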

Should be fixed as part of this upstream patchset:
https://<email address hidden>/T/#u

Andre Ruiz (andre-ruiz) wrote :

Are we going to backport this to the LTS kernel for jammy? Any ETA?

Thank you

Rafael Lopez (rafael.lopez) wrote :

Yes, will look to backport after it is merged upstream. No ETA for now, but hoping the upstream merge will happen in the next week or two based on response from Coly Li [1].

[1]https://lore<email address hidden>/

Noah Mehl (noahmehl) wrote :

Forgive my ignorance on this, but how can I help move this forward?

Rafael Lopez (rafael.lopez) wrote :

It seems the patches were never merged upstream. However, there is a recent patch that is much simpler and appears to address this directly:

https://<email address hidden>/

It is already in master and the latest 6.7 release candidate (RC5) so we should be able to SRU it.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.7-rc5&id=baf8fb7e0e5ec54ea0839f0c534f2cdcd79bea9c

Rafael Lopez (rafael.lopez) wrote :

@noahmehl do you have an environment you can test with? If you can confirm that the patch I linked works, I can work towards getting it into the GA Ubuntu kernels.

I built test packages including the patch, based on the latest Ubuntu 22.04 kernels (regular and HWE), here:
https://launchpad.net/~rafael.lopez/+archive/ubuntu/bcache-lp2016040

You can try these by adding the repo:
sudo add-apt-repository ppa:rafael.lopez/bcache-lp2016040

Regular:
sudo apt install linux-image-unsigned-5.15.0-91-generic

HWE:
sudo apt install linux-image-unsigned-6.2.0-39-generic

NOTE - since these are unsigned test kernels, you may need to disable secure boot to be able to boot them. I also strongly discourage using these packages in a production setting or critical environment; they are strictly for test purposes.

Noah Mehl (noahmehl) wrote :

@Rafael.lopez,

Many apologies, I did not see this reply. I will test immediately and let you know.

Thanks for doing this!

~Noah

Noah Mehl (noahmehl) wrote :

@rafael.lopez,

Turns out I cannot test this. The system I have is Debian 11.9 and dpkg doesn't support zstd :(

Noah Mehl (noahmehl) wrote :

@rafael.lopez,

Sorry, hopefully the last thing: can you actually backport this to 6.5? I should be able to test that on another system....
