NVDIMM-N doesn't work properly on Dell EMC PowerEdge R840

Bug #1811785 reported by JulietDeltaGolf
This bug affects 2 people
Affects            Status         Importance   Assigned to   Milestone
ndctl              Fix Released   Unknown
linux (Ubuntu)     Fix Released   Undecided    Unassigned
ndctl (Ubuntu)     Fix Released   Undecided    Unassigned
  Bionic           Incomplete     Medium       Unassigned
  Disco            Won't Fix      Medium       Unassigned
  Eoan             Fix Released   Undecided    Unassigned

Bug Description

On Ubuntu 18.04.1:

1°) With 4.15.0-43, two default namespaces in raw mode are visible, although there shouldn't be any, and no configuration change can be applied to them.

root@server:~# ndctl list -R
[
  {
    "dev":"region1",
    "size":103079215104,
    "available_size":0,
    "type":"pmem",
    "numa_node":1,
    "persistence_domain":"unknown"
  },
  {
    "dev":"region0",
    "size":103079215104,
    "available_size":0,
    "type":"pmem",
    "numa_node":0,
    "persistence_domain":"unknown"
  }
]

root@server:~# ndctl list
[
  {
    "dev":"namespace1.0",
    "mode":"raw",
    "size":103079215104,
    "sector_size":512,
    "blockdev":"pmem1",
    "numa_node":1
  },
  {
    "dev":"namespace0.0",
    "mode":"raw",
    "size":103079215104,
    "sector_size":512,
    "blockdev":"pmem0",
    "numa_node":0
  }
]

root@server:~# ndctl create-namespace --reconfig=namespace1.0 --type=pmem --mode=sector -f
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem9: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem6: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem11: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem8: label area (1024) too small to host (256 byte) labels
failed to reconfigure namespace: No such device

2°) With 4.18.0-13, no active namespaces are seen, which seems more consistent for uninitialized NVDIMM-N devices with labels, but no namespace can be created.

Both patches (kernel and ndctl) listed in the GitHub issue below need to be applied to get the NVDIMM-N devices working properly. Tested against the current HWE 4.18 kernel.

https://github.com/pmem/ndctl/issues/78
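
The create-namespace output in 1°) repeats one message ten times for each of six DIMMs (nmem6 to nmem11). For anyone comparing against their own machine, a small pipeline along these lines could condense such output to one line per DIMM. This is only a sketch: the function name summarize_label_errors is made up, and the sample input is a shortened copy of the lines above.

```shell
#!/bin/sh
# Sketch: count "label area ... too small" messages per DIMM.
# summarize_label_errors is a hypothetical helper name, not an ndctl feature.
summarize_label_errors() {  # reads captured ndctl output on stdin
    grep 'too small to host' \
        | grep -o 'nmem[0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2 ": " $1 }'
}

# Shortened sample taken from the report above.
summarize_label_errors <<'EOF'
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels
libndctl: sizeof_namespace_index: nmem7: label area (1024) too small to host (256 byte) labels
failed to reconfigure namespace: No such device
EOF
```

On a live system the same function could be fed with something like `ndctl create-namespace ... 2>&1 | summarize_label_errors`.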
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 15 10:55 seq
 crw-rw---- 1 root audio 116, 33 Jan 15 10:55 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Dell Inc. PowerEdge R840
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: ndctl 61.2-0ubuntu1~18.04.1
PackageArchitecture: amd64
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/ROOT/ubuntu@/boot/vmlinuz-4.15.0-43-generic root=ZFS=rpool/ROOT/ubuntu ro
ProcVersionSignature: Ubuntu 4.15.0-43.46-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-43-generic N/A
 linux-backports-modules-4.15.0-43-generic N/A
 linux-firmware 1.173.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-43-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 11/21/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.3.9
dmi.board.name: 08XR9M
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.3.9:bd11/21/2018:svnDellInc.:pnPowerEdgeR840:pvr:rvnDellInc.:rn08XR9M:rvrA01:cvnDellInc.:ct23:cvr:
dmi.product.family: PowerEdge
dmi.product.name: PowerEdge R840
dmi.sys.vendor: Dell Inc.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1811785

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
JulietDeltaGolf (julietdeltagolf) wrote : CRDA.txt

apport information

information type: Public → Public Security
information type: Public Security → Private Security
information type: Private Security → Public Security
information type: Public Security → Public
tags: added: apport-collected bionic
description: updated
JulietDeltaGolf (julietdeltagolf) wrote : CurrentDmesg.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : Dependencies.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : Lspci.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : Lsusb.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : ProcCpuinfoMinimal.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : ProcInterrupts.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : ProcModules.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : UdevDb.txt

apport information

JulietDeltaGolf (julietdeltagolf) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in ndctl:
status: Unknown → New
JulietDeltaGolf (julietdeltagolf) wrote :

In case anybody else runs into this issue, here are some things you can do:

On 4.15, you can reconfigure the namespace with ndctl by adding '--no-autolabel' to the command line. You'll still have to find a way to make the namespace configuration persistent across boots, since you're not making use of the labels.

On 4.18, if you use a fixed kernel and a fixed ndctl to set up the namespace(s), the configuration will be read properly at boot by the current linux HWE 4.18 kernel.

Changed in ndctl:
status: New → Fix Released
Jerry Clement (jerry-clement) wrote :

Missing Kernel Fix?

JulietDeltaGolf (julietdeltagolf) wrote :

Everything is explained in ndctl issue #78.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ndctl (Ubuntu):
status: New → Confirmed
torel (torehl) wrote :

I needed --no-autolabel to get it working; the new ndctl didn't help. That is nice, as I didn't have to interrupt metadata operations on the servers with NVDIMMs.

# uname -ar
Linux localhost 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# ndctl create-namespace --no-autolabel --mode fsdax --map dev -e namespace0.0 -f -vvv
# ndctl create-namespace --no-autolabel --mode fsdax --map dev -e namespace1.0 -f -vvv
# mkfs.xfs -f -m reflink=0 -L PMEM0 /dev/pmem0
# mkfs.ext4 -L PMEM1 -F /dev/pmem1
# mount -o dax /dev/pmem1 /mnt/pmem1-ext4/
# mount -o dax /dev/pmem0 /mnt/pmem0-xfs
# mount | grep dax
/dev/pmem0 on /mnt/pmem0-xfs type xfs (rw,relatime,attr2,dax,inode64,noquota)
/dev/pmem1 on /mnt/pmem1-ext4 type ext4 (rw,relatime,dax,data=ordered)

It would be nice if this could be fixed in kernels after 4.15.0-62. Thanks!
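
The `mount | grep dax` check in the session above could be scripted roughly as follows. This is a sketch under a few assumptions: the helper names has_dax and check_mounts are invented for illustration, the input is expected in the format printed by mount(8), and only a plain "dax" option is matched (other spellings of the option would be reported as "no dax").

```shell
#!/bin/sh
# Sketch: report whether each /dev/pmem* mount carries the plain "dax" option.
# has_dax and check_mounts are hypothetical helper names.
has_dax() {  # $1 = one line of mount output
    opts=${1##*\(}          # drop everything up to the opening parenthesis
    opts=${opts%\)*}        # drop the closing parenthesis
    case ",$opts," in
        *,dax,*) return 0 ;;
        *)       return 1 ;;
    esac
}

check_mounts() {  # reads mount output on stdin
    while IFS= read -r line; do
        case $line in
            /dev/pmem*) ;;
            *) continue ;;
        esac
        if has_dax "$line"; then
            echo "${line%% *}: dax"
        else
            echo "${line%% *}: no dax"
        fi
    done
}

# Sample input copied from the session above.
check_mounts <<'EOF'
/dev/pmem0 on /mnt/pmem0-xfs type xfs (rw,relatime,attr2,dax,inode64,noquota)
/dev/pmem1 on /mnt/pmem1-ext4 type ext4 (rw,relatime,dax,data=ordered)
EOF
```

On a live system it could be run as `mount | check_mounts`.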

Rafael David Tinoco (rafaeldtinoco) wrote :

libndctl: sizeof_namespace_index: nmem10: label area (1024) too small to host (256 byte) labels

This issue is fixed by:

commit 9c6aae5
Author: Dan Williams <email address hidden>
Date: Tue Jan 15 01:51:45 2019

    ndctl/init-labels: Fix label slot accounting per UEFI 2.7

    Quoting from Linux kernel commit 9e694d9c18dd "libnvdimm, label: change
    nvdimm_num_label_slots per UEFI 2.7":

        sizeof_namespace_index() fails when NVDIMM devices have the minimum
        1024 bytes label storage area. nvdimm_num_label_slots() returns 3
        slots while the area is only big enough for 2 slots.

        Change nvdimm_num_label_slots() to calculate a number of label slots
        according to UEFI 2.7 spec.

    Without this fix attempts to initialize labels on a small (1K) label
    area results in the following:

    libndctl: sizeof_namespace_index: nmem2: label area (1024) too small to host (128 byte) labels
    libndctl: sizeof_namespace_index: nmem2: label area (1024) too small to host (256 byte) labels

    Based on an original patch by Toshi Kani
    Fixes: bdaec95463ca ("ndctl: introduce ndctl_dimm_{validate_labels,init_labels}")
    Reported-by: Sujith Pandel <email address hidden>
    Link: https://github.com/pmem/ndctl/issues/78
    Signed-off-by: Dan Williams <email address hidden>
    Signed-off-by: Vishal Verma <email address hidden>

Upstream.
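
For readers following the arithmetic in the quoted commit message, here is a minimal sketch of the two slot calculations for a 1024-byte label area. The 72-byte index header and the 256-byte index alignment are assumptions (neither is stated in this report), and the function names are made up for illustration; this is not the ndctl or kernel code itself.

```shell
#!/bin/sh
# Sketch of the label-slot arithmetic for a small label area.
# ASSUMPTIONS (not from this report): 72-byte index header, 256-byte alignment.
INDEX_HDR=72
INDEX_ALIGN=256

# Size of one index block: header plus one bitmap bit per slot,
# rounded up to INDEX_ALIGN.
index_size() {  # $1 = slot count
    raw=$(( INDEX_HDR + ($1 + 7) / 8 ))
    echo $(( (raw + INDEX_ALIGN - 1) / INDEX_ALIGN * INDEX_ALIGN ))
}

# Old accounting: budget one byte of index overhead per label.
old_slots() {  # $1 = label area size, $2 = label size
    echo $(( $1 / ($2 + 1) ))
}

# Revised accounting: reserve two index blocks first, then divide the rest.
new_slots() {  # $1 = label area size, $2 = label size
    upper=$(( $1 / $2 ))
    idx=$(index_size "$upper")
    echo $(( ($1 - 2 * idx) / $2 ))
}

# Do two index blocks plus the given number of labels fit in the area?
fits() {  # $1 = label area size, $2 = label size, $3 = slot count
    idx=$(index_size "$3")
    [ $(( 2 * idx + $3 * $2 )) -le "$1" ]
}

for label in 128 256; do
    o=$(old_slots 1024 "$label")
    n=$(new_slots 1024 "$label")
    if fits 1024 "$label" "$o"; then r_old="fits"; else r_old="too small"; fi
    if fits 1024 "$label" "$n"; then r_new="fits"; else r_new="too small"; fi
    echo "label=$label old_slots=$o ($r_old) new_slots=$n ($r_new)"
done
```

Under those assumptions the old formula yields 3 slots for 256-byte labels, which together with two index blocks does not fit in 1024 bytes, while the revised formula yields 2, in line with the commit message.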

Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in ndctl (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Rafael David Tinoco (rafaeldtinoco) wrote :

It fixes an issue introduced by patch bdaec95463ca in version v58.

The fix (9c6aae5) was introduced in v64.

$ rmadison ndctl

 ndctl | 61.2-0ubuntu1~18.04.1 | bionic-updates/universe | source, amd64
 ndctl | 63-1.3 | disco/universe | source, amd64, arm64, armhf, i386, ppc64el, s390x
 ndctl | 65-1 | eoan/universe | source, amd64, arm64, armhf, i386, ppc64el, s390x
 ndctl | 67-1 | focal/universe | source, amd64, arm64, armhf, i386, ppc64el, s390x

Bionic and Disco are affected.
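
The comparison behind that conclusion can be sketched as below, assuming that only the leading upstream version number matters (regression in v58, fix in v64). The helper name is_affected is made up, and a distro package that carries the fix as a backported patch on an older upstream version would not be detected by this check.

```shell
#!/bin/sh
# Sketch: classify ndctl package versions against the upstream range
# v58 (regression) to v64 (fix). is_affected is a hypothetical helper name.
# Only the leading upstream number is compared; backported patches are NOT seen.
is_affected() {  # $1 = package version string
    major=${1%%[!0-9]*}      # "61.2-0ubuntu1~18.04.1" -> "61"
    [ "$major" -ge 58 ] && [ "$major" -lt 64 ]
}

for v in "61.2-0ubuntu1~18.04.1" "63-1.3" "65-1" "67-1"; do
    if is_affected "$v"; then
        echo "$v: in affected range"
    else
        echo "$v: outside affected range"
    fi
done
```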

no longer affects: linux (Ubuntu Bionic)
no longer affects: linux (Ubuntu Disco)
no longer affects: linux (Ubuntu Eoan)
Changed in ndctl (Ubuntu):
status: Confirmed → Fix Released
Changed in ndctl (Ubuntu Eoan):
status: New → Fix Released
Changed in ndctl (Ubuntu Disco):
status: New → Confirmed
Changed in ndctl (Ubuntu Bionic):
status: New → Confirmed
Changed in ndctl (Ubuntu):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in ndctl (Ubuntu Bionic):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in ndctl (Ubuntu Disco):
assignee: nobody → Rafael David Tinoco (rafaeldtinoco)
Changed in ndctl (Ubuntu):
importance: Medium → Undecided
Changed in ndctl (Ubuntu Disco):
importance: Undecided → Medium
Changed in ndctl (Ubuntu Bionic):
importance: Undecided → Medium
Rafael David Tinoco (rafaeldtinoco) wrote :

A PPA containing the fixes (Disco and Bionic) will be available here:

https://launchpad.net/~rafaeldtinoco/+archive/ubuntu/lp1811785

Rafael David Tinoco (rafaeldtinoco) wrote :

Hello Torel and Juliet,

If possible, could you please test the PPA above on Bionic and/or Disco and make sure the packages fix the issue? I'm preparing ndctl to be in the "main" pocket for 20.04:

https://bugs.launchpad.net/ubuntu/+source/ndctl/+bug/1853506

and having this case fixed would help to get traction in the MIR (main inclusion request).

Thank you very much for reporting this and contributing to Ubuntu.

Changed in ndctl (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in ndctl (Ubuntu Disco):
status: Confirmed → In Progress
Rafael David Tinoco (rafaeldtinoco) wrote :

The kernel issue pointed out in the GitHub issue was solved by:

commit cbaee3aa0e23
Author: Dan Williams <email address hidden>
Date: Thu Feb 7 20:56:50 2019

    acpi/nfit: Fix bus command validation

    BugLink: https://bugs.launchpad.net/bugs/1821607

in kernel: Ubuntu-5.0.0-9.10 (Disco)

AND

commit 3b8d3410b3d8
Author: Dan Williams <email address hidden>
Date: Thu Feb 7 20:56:50 2019

    acpi/nfit: Fix bus command validation

    BugLink: https://bugs.launchpad.net/bugs/1837952

    commit ebe9f6f19d80d8978d16078dff3d5bd93ad8d102 upstream.

in kernel: Ubuntu-4.15.0-59.66 (Bionic)

So the kernel part is good (or kind of: it is actually missing 2 fixes on top of that):

----

TODO: KERNEL TEAM SHOULD BACKPORT THIS TO BIONIC AND DISCO:

commit ebe9f6f19d80
Author: Dan Williams <email address hidden>
Date: Thu Feb 7 20:56:50 2019

    acpi/nfit: Fix bus command validation

    Fixes: 11189c1089da ("acpi/nfit: Fix command-supported detection")

commit 0171b6b78131
Author: Dan Williams <email address hidden>
Date: Sun Feb 3 17:17:27 2019

    acpi/nfit: Require opt-in for read-only label configurations

    Fixes: 11189c1089da ("acpi/nfit: Fix command-supported detection")

TO FIX THE ISSUE'S FIX

-----

TODO: FEEDBACK ABOUT PROPOSED USERLAND FIX

and the userland fix was, indeed, the ndctl patch I prepared an SRU for.

Unfortunately, I can't reproduce the issue with QEMU. I have tried different label sizes in multiple attempts and wasn't able to trigger the labeling error, so I could not confirm that it is fixed. That makes it hard for me to guarantee the fix is good, and I'll depend on feedback from the reporter.

There is a PPA in comment #19 containing the fix. I'll flag this case as Incomplete until I get feedback, as otherwise we wouldn't be able to verify the fix, which is required for it to land in the -updates archive.

Nevertheless, the documentation here might be important to others trying to understand the errors.

Changed in linux (Ubuntu):
status: Invalid → Fix Released
Changed in ndctl (Ubuntu Bionic):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in ndctl (Ubuntu Disco):
assignee: Rafael David Tinoco (rafaeldtinoco) → nobody
Changed in ndctl (Ubuntu Bionic):
status: In Progress → Incomplete
Changed in ndctl (Ubuntu Disco):
status: In Progress → Incomplete
Rafael David Tinoco (rafaeldtinoco) wrote :

TODOs in this bug:

- End user must confirm the issue is still reproducible and test PPA package
- Mark the case as Confirmed again (instead of Incomplete)
- From that point, the SRU would continue (merge requests already present in this bug)
- Patches would be sent out to kernel team for inclusion in the next SRU

torel (torehl) wrote :

Tested latest kernel 4.15.0-72-generic on bionic.

root@srl-mds1:~# dpkg -l | grep -i ndctl
ii libndctl6 61.2-0ubuntu1~18.04.1 amd64 Utility library for managing the libnvdimm subsystem
ii ndctl 61.2-0ubuntu1~18.04.1 amd64 Utility for managing the nvdimm subsystem

root@srl-mds1:~# ndctl create-namespace -e "namespace0.0" -m fsdax -f -vvv
enable_labels:945: region0: failed to initialize labels
namespace_reconfig:977: region0: no idle namespace seed
failed to reconfigure namespace: No such device
root@srl-mds1:~#

But if I add --no-autolabel, it works:

root@srl-mds1:~# ndctl create-namespace --no-autolabel --mode fsdax --map dev -e namespace0.0 -f -vvv
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"189.00 GiB (202.94 GB)",
  "uuid":"4c9523f5-8442-4356-9133-0e48c88c8d8f",
  "sector_size":512,
  "blockdev":"pmem0",
  "numa_node":0
}
root@srl-mds1:~# ndctl create-namespace --no-autolabel --mode fsdax --map dev -e namespace1.0 -f -vvv
{
  "dev":"namespace1.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"189.00 GiB (202.94 GB)",
  "uuid":"2a16dd16-df10-4737-abb5-060266070d7e",
  "sector_size":512,
  "blockdev":"pmem1",
  "numa_node":1
}

The Bionic LTS kernel still needs the kernel fix.

Steve Langasek (vorlon)
Changed in ndctl (Ubuntu Disco):
status: Incomplete → Won't Fix