/etc/nfs.conf fails for nfsv4 server / blkmapd dumps core

Bug #1979885 reported by Harald Rudell
This bug affects 3 people
Affects             Status        Importance  Assigned to       Milestone
nfs-utils (Debian)  Fix Released  Unknown
nfs-utils (Ubuntu)  Fix Released  Undecided   Andreas Hasenack
  Jammy             Fix Released  Undecided   Andreas Hasenack
  Kinetic           Fix Released  Undecided   Andreas Hasenack

Bug Description

[ Impact ]

Under certain conditions, blkmapd can crash due to calling free() on a pointer that wasn't malloc()ed. The reproducer went as far as isolating it to having LVM Logical Volumes on SCSI disks, but the code flaw is clear.

The struct bl_serial *serial structure is allocated via bl_create_scsi_string(), which does a malloc() for it. Later on, however, the code called free() on the data element of this structure and only then on the structure itself. That first free() is incorrect, as the data element was never malloc()ed separately.

This was first brought up by lixiaokeng via https://www.spinics.net/lists/linux-nfs/msg87598.html, but not acknowledged back then. The patch selected for this SRU is slightly simpler and more suited for an SRU.

[ Test Plan ]

Create a VM for the Ubuntu release under test. What's important is that this VM has a SCSI device, not VIRTIO. You can add one after the VM is created. It must not be the root disk, because we will use it for an LVM volume group, i.e., all data on it will be erased.

You may have to install the kernel extra modules package for the SCSI device to appear:

sudo apt install linux-modules-extra-$(uname -r)

After a reboot, locate the SCSI device. In this example, we will use /dev/sda.

Wipe any existing partition tables on it:
sudo sgdisk -Z /dev/sda

Create an LVM volume group and logical volume:
sudo pvcreate /dev/sda
sudo vgcreate vg0 /dev/sda
sudo lvcreate -ntest -L100M vg0

Install nfs-kernel-server:
sudo apt install nfs-kernel-server

The status of the nfs-blkmap service should already show a failure:
systemctl status nfs-blkmap.service
...
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Oct 20 18:12:12 j-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.

To confirm, run it interactively:
$ sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted

With the fixed packages, it should be running after install. It can also be tried out interactively again just to be sure:

sudo systemctl stop nfs-blkmap
sudo blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

The failure to open the blocklayout file is not a problem in this case, and is unrelated to the bug this SRU is fixing.
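
The pass/fail criterion above can be sketched as a small shell helper. This is only a minimal sketch and not part of the SRU test plan: the function name blkmapd_crashed is hypothetical, and it looks only for the two crash signatures quoted in this report (the glibc "double free or corruption" message and the systemd "status=6/ABRT" line).

```shell
#!/bin/sh
# Hypothetical helper: read blkmapd or journal output on stdin and
# succeed (exit status 0) if it contains one of the crash signatures
# quoted in this bug report.
blkmapd_crashed() {
    grep -q -e 'double free or corruption' -e 'status=6/ABRT'
}
```

For example: journalctl -u nfs-blkmap.service -b | blkmapd_crashed && echo "blkmapd crashed". The "open pipe file ... failed" message on its own is deliberately not treated as a crash.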

[ Where problems could occur ]
Restarting an NFS server can be tricky: connected clients might experience a "blip" in the service, or even hang in the worst case. Also, depending on the NFS version being served (3 or 4), multiple services are involved, and the restart can expose a bug in the order in which these services are stopped and come back online.

In terms of the patch and code, it's C code dealing with pointers and memory allocation. Things can easily go wrong here, and since this is a daemon, memory leaks can have bigger consequences.

[ Other Info ]
I didn't continue the investigation about other scenarios where this could be happening, or why it did not happen with a VIRTIO device, as the SCSI case was enough to reproduce the problem and show where the bug was.

The previous SRU for nfs-utils (https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1977745) was stopped during phasing because the crash from this bug was detected (https://errors.ubuntu.com/?release=Ubuntu%2022.04&package=nfs-utils&period=week&version=1%3A2.6.1-1ubuntu1.1) during the restart of blkmapd.

[Original Description]

When using the 22.04 /etc/nfs.conf, an NFSv4 server fails to operate.

It kind of works, but some clients fail and try NFSv3 ports.

symptoms:

on boot:
× nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Sat 2022-06-25 07:14:34 PDT; 27min ago
journalctl --catalog --pager-end --unit=nfs-blkmap.service
Jun 25 07:14:34 c68z blkmapd[2386154]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

on systemctl restart nfs-server.service:
○ rpc-svcgssd.service - RPC security service for NFS server
     Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
     Active: inactive (dead) since Fri 2022-06-24 19:07:31 PDT; 12h ago

after boot it was:
● rpc-svcgssd.service - RPC security service for NFS server
     Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
     Active: active (running) since Sat 2022-06-25 08:27:27 PDT; 2min 7s ago

Some clients try to access port 111, which NFSv4 does not use on the network.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-40-generic 5.15.0-40.43
ProcVersionSignature: Ubuntu 5.15.0-40.43-generic 5.15.35
Uname: Linux 5.15.0-40-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl icp
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D2', '/dev/snd/pcmC0D10p', '/dev/snd/pcmC0D9p', '/dev/snd/pcmC0D8p', '/dev/snd/pcmC0D7p', '/dev/snd/pcmC0D3p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: unknown
Date: Sat Jun 25 08:37:48 2022
HibernationDevice: RESUME=none
MachineType: Apple Inc. Macmini8,1
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
 TERM=screen
 PATH=(custom, no user)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: root=ZFS=rpool/ROOT/ubuntu_mc4at7 ro initrd=EFI\hostname\initrd.img
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-40-generic N/A
 linux-backports-modules-5.15.0-40-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.2
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/24/2022
dmi.bios.release: 0.1
dmi.bios.vendor: Apple Inc.
dmi.bios.version: 1731.120.10.0.0 (iBridge: 19.16.15071.0.0,0)
dmi.board.name: Mac-7BA5B2DFE22DDD8C
dmi.board.vendor: Apple Inc.
dmi.board.version: Macmini8,1
dmi.chassis.type: 9
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-7BA5B2DFE22DDD8C
dmi.modalias: dmi:bvnAppleInc.:bvr1731.120.10.0.0(iBridge19.16.15071.0.0,0):bd04/24/2022:br0.1:svnAppleInc.:pnMacmini8,1:pvr1.0:rvnAppleInc.:rnMac-7BA5B2DFE22DDD8C:rvrMacmini8,1:cvnAppleInc.:ct9:cvrMac-7BA5B2DFE22DDD8C:sku:
dmi.product.family: Mac mini
dmi.product.name: Macmini8,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.

Harald Rudell (harald-rudell) wrote :

/etc/nfs.conf:
[general]
pipefs-directory = /run/rpc_pipefs
[mountd]
manage-gids = y
[nfsd]
host = 1.2.3.4
threads = 8
udp = 0
vers2 = n
vers3 = n
vers4 = y

Harald Rudell (harald-rudell) wrote :

previous files used:

/etc/default/nfs-common:
NEED_STATD=no

/etc/default/nfs-kernel-server:
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="--manage-gids --no-nfs-version 2 --no-nfs-version 3 --no-udp"
NEED_SVCGSSD="yes"
RPCSVCGSSDOPTS=""
RPCNFSDOPTS="--no-nfs-version 3 --host 1.2.3.4"

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Harald Rudell (harald-rudell) wrote : Re: /etc/nfs.conf fails for nfsv4 server

This is not related to /etc/nfs.conf

problem 1: nfs-blkmap.service
journalctl --catalog --pager-end --unit=nfs-blkmap.service
working system:
Jun 25 07:34:59 c87z blkmapd[2671]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Jun 25 07:34:59 c87z systemd[1]: Starting pNFS block layout mapping daemon...
failing system:
Jun 25 09:09:26 c68z blkmapd[2968]: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Jun 25 09:09:26 c68z systemd[1]: nfs-blkmap.service: New main PID 2968 does not exist or is a zombie.
Jun 25 09:09:26 c68z systemd[1]: nfs-blkmap.service: Failed with result 'protocol'.

working system:
blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
^C
failing system:
blkmapd -f
blkmapd: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted (core dumped)

Harald Rudell (harald-rudell) wrote :

for rpc-svcgssd.service:
systemctl restart nfs-server.service
working system:
rpc-svcgssd.service is stopped then started
failing system:
rpc-svcgssd.service is stopped and not started

Harald Rudell (harald-rudell) wrote :

working system:
file $(which blkmapd)
/usr/sbin/blkmapd: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=3bb4e4a72904f10a881f67880b26998c39869ce3, for GNU/Linux 3.2.0, stripped
failing system:
file $(which blkmapd)
/usr/sbin/blkmapd: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=3bb4e4a72904f10a881f67880b26998c39869ce3, for GNU/Linux 3.2.0, stripped

— the same

Harald Rudell (harald-rudell) wrote :

After reinstalling nfs-common and nfs-kernel-server and changing back to /etc/nfs.conf, use by clients works again.

rpc-svcgssd.service is started on nfs-server restart.

The blkmapd error is still present.

Harald Rudell (harald-rudell) wrote :

To conclude: the remaining issue that is not service affecting:

on boot:
× nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Sat 2022-06-25 07:14:34 PDT; 27min ago

blkmapd -f
blkmapd: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted (core dumped)

works on one system, not on a second

Steve Dodd (anarchetic) wrote :

The blkmapd core dump might be down to https://www.spinics.net/lists/linux-nfs/msg87598.html ? I'm seeing it as well, for some reason I can't get apport to upload the crash file :(

summary: - /etc/nfs.conf fails for nfsv4 server
+ /etc/nfs.conf fails for nfsv4 server / blkmapd dumps core
Andreas Hasenack (ahasenack) wrote :

> blkmapd: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout failed: No such file or
> directory

Do you have rpc_pipefs mounted elsewhere at that moment? Perhaps /run?

Check "mount | grep rpc_pipefs"

I'm tracking https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1971935 which is about a discrepancy in the mountpoint of the rpc_pipefs filesystem, but couldn't reproduce that one.

As for nfs.conf, it's best to check "nfsconf --dump" instead of just one config file, as all of /etc/nfs.conf and /etc/nfs.conf.d/*.conf are merged together. Just to be sure you don't have some /etc/nfs.conf.d/*.conf changing some of the parameters.
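
To compare the configured pipefs directory with the place where rpc_pipefs is actually mounted, the two checks above can be combined. This is a minimal sketch, assuming the output formats shown in this report for "nfsconf --dump" and for the rpc_pipefs line printed by mount; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; both read command output on stdin.

# Print the pipefs-directory value found in "nfsconf --dump" output.
pipefs_dir_from_dump() {
    sed -n 's/^[[:space:]]*pipefs-directory[[:space:]]*=[[:space:]]*//p'
}

# Print the mountpoint(s) from "mount -t rpc_pipefs" output, whose lines
# look like: sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
pipefs_mountpoints() {
    awk '$2 == "on" && $5 == "rpc_pipefs" { print $3 }'
}
```

For example: [ "$(nfsconf --dump | pipefs_dir_from_dump)" = "$(mount -t rpc_pipefs | pipefs_mountpoints)" ] || echo "pipefs mismatch". If rpc_pipefs is mounted in more than one place, the comparison also fails, which is worth investigating in its own right.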

Andreas Hasenack (ahasenack) wrote :

> working system:
> Jun 25 07:34:59 c87z blkmapd[2671]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No
> such file or directory

> failing system:
> Jun 25 09:09:26 c68z blkmapd[2968]: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout
> failed: No such file or directory

Yeah, for some reason the failing system is looking for the rpc_pipefs filesystem in a different mountpoint.

Double check that config in the nfsconf --dump output in both cases.

Changed in nfs-utils (Ubuntu):
status: New → Incomplete
Andreas Hasenack (ahasenack) wrote :

So far it doesn't look like a kernel issue, so I'm removing the "linux" task from this bug.

no longer affects: linux (Ubuntu)
Harald Rudell (harald-rudell) wrote :

After a nfs-server restart, rpc-svcgssd.service - RPC security service for NFS server isn’t started again

so, one get-around is to reinstall 22.04 nfs after each restart

or, the service can be run separately using:
/usr/sbin/rpc.svcgssd -f

When this service is not present for nfs4 krb5p, which it isn’t in 22.04, then
macOS produces error 61, because it goes looking for nfs3 and other old garbage
or it can say:
error 1
ls: fts_read: Permission denied
…macOS has lots of features like that…

A Linux client will say:
mount.nfs: mount(2): Permission denied

There is no logging or anything so just like Bud Fox you must “just know”

Harald Rudell (harald-rudell) wrote :

@Andreas Hasenack (ahasenack):
the rpc_pipefs error is always present, nfs runs anyway

Why the service failed here is:
blkmapd -f
double free or corruption (out)
Aborted (core dumped)

That’s a bug

Changed in nfs-utils (Ubuntu):
status: Incomplete → Confirmed
Harald Rudell (harald-rudell) wrote :

On an apparently working 22.04 nfs4 krb5p server, the blocklayout pipe is not used

ps -fCblkmapd
UID PID PPID C STIME TTY TIME CMD
root 2634 1 0 Jul01 ? 00:00:00 /usr/sbin/blkmapd
lsof -p2634
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
blkmapd 2634 root cwd DIR 0,28 29 34 /
blkmapd 2634 root rtd DIR 0,28 29 34 /
blkmapd 2634 root txt REG 0,28 39224 642976 /usr/sbin/blkmapd
blkmapd 2634 root mem REG 0,28 613064 233776 /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0.10.4
blkmapd 2634 root mem REG 0,28 940560 218864 /usr/lib/x86_64-linux-gnu/libm.so.6
blkmapd 2634 root mem REG 0,28 166240 140762 /usr/lib/x86_64-linux-gnu/libudev.so.1.7.2
blkmapd 2634 root mem REG 0,28 166280 233490 /usr/lib/x86_64-linux-gnu/libselinux.so.1
blkmapd 2634 root mem REG 0,28 2216304 218858 /usr/lib/x86_64-linux-gnu/libc.so.6
blkmapd 2634 root mem REG 0,28 438864 637950 /usr/lib/x86_64-linux-gnu/libdevmapper.so.1.02.1
blkmapd 2634 root mem REG 0,28 240936 218852 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
blkmapd 2634 root 0u CHR 1,3 0t0 5 /dev/null
blkmapd 2634 root 1u CHR 1,3 0t0 5 /dev/null
blkmapd 2634 root 2u CHR 1,3 0t0 5 /dev/null
blkmapd 2634 root 3wW REG 0,25 5 1889 /run/blkmapd.pid
blkmapd 2634 root 4r a_inode 0,14 0 12560 inotify
blkmapd 2634 root 5u unix 0xffff9fe8c7c89100 0t0 26241 type=DGRAM

Harald Rudell (harald-rudell) wrote :

nfsconf --dump # working system, /etc/default files
[general]
 pipefs-directory = /run/rpc_pipefs

[mountd]
 manage-gids = 1

[nfsd]
 threads = 8
 udp = 0
 vers2 = n
 vers3 = n
 vers4 = y

# failing system /etc/nfs.conf
[general]
 pipefs-directory = /run/rpc_pipefs

[mountd]
 manage-gids = y

[nfsd]
 host = 192.168.1.2
 threads = 8
 udp = 0
 vers2 = n
 vers3 = n
 vers4 = y

#2 has manage-gids = y instead of 1 and host = 192.168.1.2 for specific interface

Harald Rudell (harald-rudell) wrote :

Another bug encountered is that for nfs4 krb5p only, there should be no network-facing portmap. 22.04 still has it on port 111 on all interfaces. It should only be on the local interface.

Harald Rudell (harald-rudell) wrote :

An easier workaround for the bug where systemctl restart nfs-server.service sometimes stops rpc-svcgssd.service instead of restarting it is to start the second service manually:

systemctl start rpc-svcgssd.service

without this service running, nfs4-krb5p mount requests from clients will fail with mysterious errors without any fault indication on the server
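
The manual workaround can be sketched as a helper that starts rpc-svcgssd.service only when it is not already active. This is a minimal sketch: the function name and the SYSTEMCTL override variable are hypothetical (the override exists only so the logic can be exercised without touching real services), while "systemctl is-active --quiet" and "systemctl start" are the standard systemd commands.

```shell
#!/bin/sh
# Hypothetical helper: start rpc-svcgssd.service if it is not active.
# SYSTEMCTL may be set to a stand-in command for dry runs.
ensure_svcgssd() {
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    # Nothing to do when the unit is already running.
    if "$systemctl_cmd" is-active --quiet rpc-svcgssd.service; then
        return 0
    fi
    "$systemctl_cmd" start rpc-svcgssd.service
}
```

Run as root after the restart, for example: systemctl restart nfs-server.service && ensure_svcgssd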

Harald Rudell (harald-rudell) wrote :

The check is:
ps -fCrpc.svcgssd
UID PID PPID C STIME TTY TIME CMD
root 207469 1 0 07:20 ? 00:00:00 /usr/sbin/rpc.svcgssd

Harald Rudell (harald-rudell) wrote :

To get portmap 111 off the network one has to edit:

nano /usr/lib/systemd/system/rpcbind.socket

ListenStream=127.0.0.1:111
ListenDatagram=127.0.0.1:111
ListenStream=[::1]:111
ListenDatagram=[::1]:111
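
Files under /usr/lib/systemd/system belong to the package and are typically overwritten on upgrade, so the same change could instead be expressed as a systemd drop-in. This is a minimal sketch, assuming the stock rpcbind.socket also listens on the local socket /run/rpcbind.sock (check with "systemctl cat rpcbind.socket"); the function name and the file name localhost-only.conf are hypothetical. In systemd, assigning an empty value to ListenStream= or ListenDatagram= resets the list inherited from the packaged unit.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that rebinds rpcbind.socket to
# loopback only. $1 is the drop-in directory, normally
# /etc/systemd/system/rpcbind.socket.d
write_rpcbind_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/localhost-only.conf" <<'EOF'
[Socket]
# Empty assignments clear the addresses inherited from the packaged unit.
ListenStream=
ListenDatagram=
# Assumption: the stock unit also provides this local socket.
ListenStream=/run/rpcbind.sock
ListenStream=127.0.0.1:111
ListenDatagram=127.0.0.1:111
ListenStream=[::1]:111
ListenDatagram=[::1]:111
EOF
}
```

Usage, as root: write_rpcbind_dropin /etc/systemd/system/rpcbind.socket.d, followed by systemctl daemon-reload and systemctl restart rpcbind.socket.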

Andreas Hasenack (ahasenack) wrote :

Harald, you reported many different things, and I'm a bit lost here. Let's pick one issue.

> After a nfs-server restart, rpc-svcgssd.service - RPC security service for NFS server isn’t
> started again

Why isn't it running? What does status say?

About service restarts, with so many individual pieces it can get complicated how to properly restart the service. I wrote this table in the server guide [1], in the "Restarting NFS services" section.

> @Andreas Hasenack (ahasenack):
> the rpc_pipefs error is always present, nfs runs anyway

What do you mean? The rpc_pipefs is an absolute requirement, and there are two mount units for it: var-lib-nfs-rpc_pipefs.mount and the generated one which mounts it in /run when the nfs.conf has a path different than the built-in default. One of the two mount units must have run.

> nfsconf --dump # working system,

and

> # failing system /etc/nfs.conf

You used nfsconf --dump in one case and the plain /etc/nfs.conf in the other, so it's not clear whether the failing system has some override in a /etc/nfs.conf.d/*.conf snippet.

For blkmap, a core dump is definitely a bug. It should error clearly when it doesn't find the rpc_pipefs mountpoint instead of crashing, but it looks like fixing that will just switch the error from a core dump to a "sorry, couldn't find rpc_pipefs", and it still won't run.

1. https://ubuntu.com/server/docs/service-nfs

Harald Rudell (harald-rudell) wrote :

1. rpc-svcgssd.service not restarted with nfs-server
the status was provided in initial bug description:
○ rpc-svcgssd.service - RPC security service for NFS server
     Loaded: loaded (/lib/systemd/system/rpc-svcgssd.service; static)
     Active: inactive (dead) since Fri 2022-06-24 19:07:31 PDT; 12h ago
1 get-around:
do systemctl start rpc-svcgssd.service after restarting nfs-server

2. as proven by the lsof entry above, rpc_pipefs is not open by the blkmapd process, your statement is incorrect
2 get-around: nfs can run without blkmapd

3. in the comparative config printout, nfsconf --dump was used for both servers, I think.
3 observation: the logic around the config is buggy as of 22.04. The config lacks a few features as described in this bug
- bug: when to start rpc-svcgssd
- lack: portmap on localhost only
- lack: preventing statd, should be automatic
- lack: it is unclear to me if manage-gids controls the nfsd kernel module, too
For example, there could be a minimum security setting with value krb5p if that seems to be a good thing

4. nfs4-krb5p can run without blkmapd. And it does because blkmapd crashes as described above

The goal of nfs should be to easily disable everything that is not the latest version, ie 4.2. Those people that have legacy hardware and software should then have the settings available to them to support those historic features

nfs4 has a better security model and separates authentication from local operating systems. Therefore an interest in latest-version-only, maximum hardened, highest-security (krb5p) nfs.conf. Besides the root mount, there will then be shares that are either rw or ro

Harald Rudell (harald-rudell) wrote :

I also want to point out that those bind mounts favored by nfs are readable by anyone, eg. user nobody. Even when they are deep-links into some file system supposedly protected by unix permissions

Each bind mount target has to have its permissions set like it’s 1970

chown x:y /mnt/somefs/my-deep-mount-target
chmod … /mnt/somefs/my-deep-mount-target

this can be tested:
/etc/fstab:
/mnt/somefs/my-deep-mount-target /srv/nfs/someshare none bind,defaults,nofail,x-systemd.requires=zfs-mount.service 0 0

mount /srv/nfs/someshare
sudo --user nobody ls /srv/nfs/someshare

tags: added: server-todo
Andreas Hasenack (ahasenack) wrote :

@anarchetic
> The blkmapd core dump might be down to https://www.spinics.net/lists/linux-nfs/msg87598.html ?
> I'm seeing it as well, for some reason I can't get apport to upload the crash file :(

I'm unable to reproduce the problem on jammy. blkmapd does complain about the missing blocklayout file, but stays running:

root@j-nfs-server:~# mount -t rpc_pipefs
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)

root@j-nfs-server:~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

^Cblkmapd: exit on signal(2) <== me pressing ctrl-c

And I tried multiple combinations of that mount point being there, elsewhere, or not mounted at all.

If you still get the error, or even a crash file in /var/crash, please upload it to this ticket.

If there is no core dump or crash file, then perhaps at least try to get an strace output:

strace -f -o blkmapd.strace -s 500 blkmapd -f

And attach (or inspect) blkmapd.strace

In comment #9 we can see it was using the "wrong" rpc_pipefs mount point:
 blkmapd: open pipe file /var/lib/nfs/rpc_pipefs/nfs/blocklayout failed: No such file or directory

So I still think this might be related to bug #1971935 where if you install nfs-common in combination with something else using rpc_pipefs (like autofs, in that particular case), you would get rpc_pipefs mounted in /var/lib/nfs/rpc_pipefs whereas all daemons will be expecting it to be in /run/rpc_pipefs. But again, even in that case, I couldn't reproduce the blkmapd crash.

Jens Maus (jens.maus) wrote :

Since the upgrade from 20.04 to 22.04 I am seeing the same issue (blkmapd crashing) here:

-- cut here --
~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted (core dumped)
-- cut here --

The first issue (/run/rpc_pipefs/nfs/blocklayout failing) I solved via https://help.univention.com/t/nfs-blkmap-service-open-pipe-file-run-rpc-pipefs-nfs-blocklayout-failed-no-such-file-or-directory/19351

The second issue (blkmapd crashing) I could not solve. Please find attached the requested strace and crash dump file.

Jens Maus (jens.maus) wrote :

Here is the blkmapd crash file.

Andreas Hasenack (ahasenack) wrote :

Thanks for these files. I'm still having trouble reproducing the crash, but looking at logs and strace, it looks like it could be related to device mapper devices, like LVM.

I found https://bitcoden.com/answers/how-do-i-resolve-an-error-with-the-pnfs-mapping-daemon where someone said the problem started happening after a logical volume was created, but still couldn't reproduce it.

Do you have some particular LVM setup, or something out of the ordinary in that area? I tried creating a simple VG, then two LVs, even rebooted in between, but blkmapd does not crash.

There is also https://www.spinics.net/lists/linux-nfs/msg87598.html that someone else pointed out, but it's unacknowledged in the mailing list, and I don't see similar commits in the upstream code yet.

Andreas Hasenack (ahasenack) wrote :

Ok, I reproduced the problem, and I know what the fix is. Stay tuned :)

Changed in nfs-utils (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
status: Confirmed → In Progress
description: updated
Changed in nfs-utils (Debian):
status: Unknown → Confirmed
Andreas Hasenack (ahasenack) wrote :

Waiting for the next ubuntu release to open up for development.

Changed in nfs-utils (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Andreas Hasenack (ahasenack)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:2.6.1-2ubuntu5

---------------
nfs-utils (1:2.6.1-2ubuntu5) lunar; urgency=medium

  * d/p/blkmapd-fix-invalid-free.patch: fix blkmapd crash due to invalid
    free() (LP: #1979885)

 -- Andreas Hasenack <email address hidden> Fri, 28 Oct 2022 08:26:52 -0300

Changed in nfs-utils (Ubuntu):
status: In Progress → Fix Released
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Harald, or anyone else affected,

Accepted nfs-utils into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nfs-utils/1:2.6.1-2ubuntu4.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in nfs-utils (Ubuntu Kinetic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-kinetic
Changed in nfs-utils (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed-jammy
Timo Aaltonen (tjaalton) wrote :

Hello Harald, or anyone else affected,

Accepted nfs-utils into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nfs-utils/1:2.6.1-1ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Andreas Hasenack (ahasenack) wrote :

Kinetic verification

Confirming the bug:
root@k-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
  Installed: 1:2.6.1-2ubuntu4
  Candidate: 1:2.6.1-2ubuntu4
  Version table:
 *** 1:2.6.1-2ubuntu4 500
        500 http://br.archive.ubuntu.com/ubuntu kinetic/main amd64 Packages
        100 /var/lib/dpkg/status

blkmapd has crashed:
root@k-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
× nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; preset: enabled)
     Active: failed (Result: core-dump) since Wed 2022-11-16 19:05:41 UTC; 21s ago
   Duration: 233ms
   Main PID: 1590 (code=dumped, signal=ABRT)
        CPU: 6ms

Nov 16 19:05:41 k-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:05:41 k-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Can't open PID file /run/blkmapd.pid (yet?) after start: Operation not permitted
Nov 16 19:05:41 k-nfs-blkmapd-crash blkmapd[1590]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Nov 16 19:05:41 k-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.
Nov 16 19:05:41 k-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Nov 16 19:05:41 k-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.

root@k-nfs-blkmapd-crash:~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
double free or corruption (out)
Aborted (core dumped)

Updating the package:
root@k-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
  Installed: 1:2.6.1-2ubuntu4.1
  Candidate: 1:2.6.1-2ubuntu4.1
  Version table:
 *** 1:2.6.1-2ubuntu4.1 500
        500 http://br.archive.ubuntu.com/ubuntu kinetic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.6.1-2ubuntu4 500
        500 http://br.archive.ubuntu.com/ubuntu kinetic/main amd64 Packages

Right after the install, the blkmapd service is already running:

root@k-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
● nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; preset: enabled)
     Active: active (running) since Wed 2022-11-16 19:08:03 UTC; 1min 45s ago
   Main PID: 2736 (blkmapd)
      Tasks: 1 (limit: 1075)
     Memory: 304.0K
        CPU: 2ms
     CGroup: /system.slice/nfs-blkmap.service
             └─2736 /usr/sbin/blkmapd

Nov 16 19:08:03 k-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:08:03 k-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.

Stopping it so we can try an interactive start:
root@k-nfs-blkmapd-crash:~# systemctl stop nfs-blkmap.service
root@k-nfs-blkmapd-crash:~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

It doesn't crash.

Kinetic verification succeeded.

tags: added: verification-done-kinetic
removed: verification-needed-kinetic
Andreas Hasenack (ahasenack) wrote :

Jammy verification

Reproducing the bug:

root@j-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
  Installed: 1:2.6.1-1ubuntu1.1
  Candidate: 1:2.6.1-1ubuntu1.1
  Version table:
 *** 1:2.6.1-1ubuntu1.1 500
        500 http://br.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages
        100 /var/lib/dpkg/status

Service has crashed already:
root@j-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
× nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
     Active: failed (Result: core-dump) since Wed 2022-11-16 19:20:14 UTC; 23s ago
   Main PID: 1778 (code=dumped, signal=ABRT)
        CPU: 3ms

Nov 16 19:20:13 j-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Main process exited, code=dumped, status=6/ABRT
Nov 16 19:20:14 j-nfs-blkmapd-crash systemd[1]: nfs-blkmap.service: Failed with result 'core-dump'.

Updating to the package in proposed:
root@j-nfs-blkmapd-crash:~# apt-cache policy nfs-kernel-server
nfs-kernel-server:
  Installed: 1:2.6.1-1ubuntu1.2
  Candidate: 1:2.6.1-1ubuntu1.2
  Version table:
 *** 1:2.6.1-1ubuntu1.2 500
        500 http://br.archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status

Service is running already:
root@j-nfs-blkmapd-crash:~# systemctl status nfs-blkmap.service
● nfs-blkmap.service - pNFS block layout mapping daemon
     Loaded: loaded (/lib/systemd/system/nfs-blkmap.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-11-16 19:21:31 UTC; 22s ago
   Main PID: 2960 (blkmapd)
      Tasks: 1 (limit: 1082)
     Memory: 304.0K
        CPU: 2ms
     CGroup: /system.slice/nfs-blkmap.service
             └─2960 /usr/sbin/blkmapd

Nov 16 19:21:31 j-nfs-blkmapd-crash systemd[1]: Starting pNFS block layout mapping daemon...
Nov 16 19:21:31 j-nfs-blkmapd-crash systemd[1]: Started pNFS block layout mapping daemon.

Stopping and starting without forking to verify again:
root@j-nfs-blkmapd-crash:~# systemctl stop nfs-blkmap.service
root@j-nfs-blkmapd-crash:~# blkmapd -f
blkmapd: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory

Service runs without crashing.

Jammy verification succeeded.

tags: added: verification-done-jammy
removed: verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:2.6.1-2ubuntu4.1

---------------
nfs-utils (1:2.6.1-2ubuntu4.1) kinetic; urgency=medium

  * d/p/blkmapd-fix-invalid-free.patch: fix blkmapd crash due to invalid
    free() (LP: #1979885)

 -- Andreas Hasenack <email address hidden> Thu, 20 Oct 2022 11:45:11 -0300

Changed in nfs-utils (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for nfs-utils has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nfs-utils - 1:2.6.1-1ubuntu1.2

---------------
nfs-utils (1:2.6.1-1ubuntu1.2) jammy; urgency=medium

  * d/p/blkmapd-fix-invalid-free.patch: fix blkmapd crash due to invalid
    free() (LP: #1979885)

 -- Andreas Hasenack <email address hidden> Thu, 20 Oct 2022 11:50:13 -0300

Changed in nfs-utils (Ubuntu Jammy):
status: Fix Committed → Fix Released
Changed in nfs-utils (Debian):
status: Confirmed → Fix Released