NFSv4 fails to mount in noble/s390x

Bug #2060217 reported by Andreas Hasenack
This bug affects 1 person
Affects                   Status    Importance  Assigned to  Milestone
Release Notes for Ubuntu  New       Undecided   Unassigned
Ubuntu on IBM z Systems   New       Undecided   bugproxy
linux (Ubuntu)            Triaged   Undecided   Unassigned
nfs-utils (Ubuntu)        Invalid   High        Unassigned

Bug Description

https://autopkgtest.ubuntu.com/packages/n/nfs-utils/noble/s390x

Looks like it has been failing for a long time already.

Log: https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/nfs-utils/20240404_145924_ef255@/log.gz

339s autopkgtest [14:41:04]: test local-server-client: [-----------------------
340s Killed
340s autopkgtest [14:41:05]: test process requested reboot with marker boot1
364s autopkgtest-virt-ssh: WARNING: ssh connection failed. Retrying in 3 seconds...
372s FAIL: nfs_home not mounted
373s autopkgtest [14:41:38]: test local-server-client: -----------------------]
373s local-server-client FAIL non-zero exit status 1

and

934s autopkgtest [14:50:59]: test kerberos-mount: [-----------------------
935s Initializing database '/var/lib/krb5kdc/principal' for realm 'DEP8',
935s master key name 'K/M@DEP8'
935s Authenticating as principal root/admin@DEP8 with password.
935s Principal "nfs/nfs-server.dep8@DEP8" created.
935s Authenticating as principal root/admin@DEP8 with password.
935s Entry for principal nfs/nfs-server.dep8 with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
935s Entry for principal nfs/nfs-server.dep8 with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
935s Authenticating as principal root/admin@DEP8 with password.
935s Principal "host/nfs-server.dep8@DEP8" created.
935s Authenticating as principal root/admin@DEP8 with password.
935s Entry for principal host/nfs-server.dep8 with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
935s Entry for principal host/nfs-server.dep8 with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
936s exporting *:/storage
938s mount.nfs: mount system call failed for /mnt
938s umount: /mnt: not mounted.
938s autopkgtest [14:51:02]: test kerberos-mount: -----------------------]
939s kerberos-mount FAIL non-zero exit status 32

Changed in nfs-utils (Ubuntu):
status: New → Triaged
importance: Undecided → High
Andreas Hasenack (ahasenack) wrote :

The tests are correct, I can reproduce it in s390x. Any v4.* mount fails, but v3 works:

root@nfs:~# mount localhost:/home /mnt/nfs_home -o vers=4.2
mount.nfs: mount system call failed for /mnt/nfs_home

root@nfs:~# mount localhost:/home /mnt/nfs_home -o vers=4.1
mount.nfs: mount system call failed for /mnt/nfs_home

root@nfs:~# mount localhost:/home /mnt/nfs_home -o vers=4.0
mount.nfs: mount system call failed for /mnt/nfs_home

root@nfs:~# mount localhost:/home /mnt/nfs_home -o vers=4
mount.nfs: mount system call failed for /mnt/nfs_home

root@nfs:~# mount localhost:/home /mnt/nfs_home -o vers=3
root@nfs:~#

And no version at all also fails:
root@nfs:~# mount localhost:/home /mnt/nfs_home
mount.nfs: mount system call failed for /mnt/nfs_home
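
The manual checks above can be scripted. The following is a minimal sketch, assuming the same export and mount point as in the transcript; the helper names `mount_command` and `probe` are hypothetical, and `probe` needs root and a reachable NFS export to do anything useful.

```python
import subprocess

# NFS protocol versions tried by hand in the transcript above.
VERSIONS = ["4.2", "4.1", "4.0", "4", "3"]


def mount_command(export, mountpoint, version=None):
    """Build the mount(8) argv; version=None lets mount.nfs negotiate."""
    cmd = ["mount", export, mountpoint]
    if version is not None:
        cmd += ["-o", f"vers={version}"]
    return cmd


def probe(export, mountpoint, versions=VERSIONS):
    """Try each version in turn and return {version: mount exit status}.

    Must run as root. A successful mount is unmounted again so the
    next attempt starts from a clean state.
    """
    results = {}
    for version in versions:
        status = subprocess.run(mount_command(export, mountpoint, version)).returncode
        if status == 0:
            subprocess.run(["umount", mountpoint])
        results[version] = status
    return results


# Example (as root): probe("localhost:/home", "/mnt/nfs_home")
```

On the affected system this would be expected to report a non-zero status for every 4.x entry and 0 for version 3, matching the transcript.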

dmesg is clean.

Strace shows, for that last attempt with no version specified:
1998 mount("localhost:/home", "/mnt/nfs_home", "nfs", 0, "vers=4.2,addr=127.0.0.1,clientad"...) = -1 EIO (Input/output error)

summary: - DEP8 failures in noble/s390x
+ NFSv4 fails to mount in noble/s390x
Andreas Hasenack (ahasenack) wrote :

Verbose mount:

# mount localhost:/home /mnt/nfs_home -v
mount.nfs: timeout set for Sun Apr 7 19:09:07 2024
mount.nfs: trying text-based options 'vers=4.2,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed for /mnt/nfs_home

Andreas Hasenack (ahasenack) wrote :

I enabled rpc debugging with rpcdebug (after fixing an off-by-one strcpy error), but the logs don't mean much to me.

Bryce Harrington (bryce) wrote :

Does running `mount -vvv ...` provide more information?

I notice of the 3 test cases in the dep8 that it's the 2 NFSv4 w/ kerberos ones failing, and the simple nfsv3 one passes. Could it be something to do with kerberos or its configuration? Like, could it be authorizing a network IP other than 127.0.0.1?

Andreas Hasenack (ahasenack) wrote : Re: [Bug 2060217] Re: NFSv4 fails to mount in noble/s390x

I don't think it's Kerberos related. The manual reproducer is a plain mount
localhost, without Kerberos.

The same simple localhost setup was working in mantic. I then dist upgraded
the VM to noble, and it started failing.


description: updated
Andreas Hasenack (ahasenack) wrote :

> Does running `mount -vvv ...` provide more information?

Nope, same as single -v:

root@nfs:~# mount /mnt/nfs_home -vvv
mount.nfs: timeout set for Mon Apr 8 15:48:10 2024
mount.nfs: trying text-based options 'vers=4.2,addr=127.0.0.1,clientaddr=127.0.0.1'
mount.nfs: mount(2): Input/output error
mount.nfs: mount system call failed for /mnt/nfs_home

Andreas Hasenack (ahasenack) wrote :

I have a VM where this can be reproduced, if anyone is interested.

tags: added: update-excuse
Changed in linux (Ubuntu):
assignee: nobody → Anthony Wong (anthonywong)
GuoqingJiang (guoqingjiang) wrote :

Hi Andreas, I'd like to take a look, could you please let me know how to access your vm? Because I don't have s390 hw.

BTW, seems it is fine for both amd64 and arm64.

https://autopkgtest.ubuntu.com/packages/n/nfs-utils/noble/amd64
https://autopkgtest.ubuntu.com/packages/n/nfs-utils/noble/arm64

And from the test history of s390, nfs-utils/1:2.6.4-3ubuntu3 can pass.

https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/nfs-utils/20240302_020943_bf464@/log.gz

But nfs-utils/1:2.6.4-3ubuntu4 failed both the local-server-client and kerberos-mount tests.

https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/nfs-utils/20240331_152208_be4f4@/log.gz

I cloned the nfs-utils repo, but I can't find anything suspicious in the git log.

commit 5caa7491375e1e81012dcf1565d2e73c30f6f085 (tag: import/1%2.6.4-3ubuntu4, origin/ubuntu/noble)
Author: Steve Langasek <email address hidden>
Date: Sun Mar 31 08:10:14 2024 +0000

    1:2.6.4-3ubuntu4 (patches unapplied)

    Imported using git-ubuntu import.

commit 86e924b7a8f68924304db261455cfff593cd5516 (tag: import/1%2.6.4-3ubuntu3)
Author: Steve Langasek <email address hidden>
Date: Thu Feb 29 09:30:58 2024 +0000

    1:2.6.4-3ubuntu3 (patches unapplied)

    Imported using git-ubuntu import.

Frank Heimes (fheimes)
tags: added: s390x
Changed in ubuntu-z-systems:
assignee: nobody → bugproxy (bugproxy)
bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-206018 severity-high targetmilestone-inin---
Andreas Hasenack (ahasenack) wrote :

Perhaps try an older kernel, from when the test last passed?

Andreas Hasenack (ahasenack) wrote :

I gave GuoqingJiang access to the s390x VM where this bug can be observed.

GuoqingJiang (guoqingjiang) wrote :

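So the v4 mount exits with status 32, but v3 returns 0. (32 here is mount(8)'s "mount failure" exit code rather than an errno such as EPIPE; the strace in an earlier comment shows mount(2) itself failing with EIO.)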

ubuntu@nfs:~$ sudo strace -e mount mount localhost:/home /mnt/nfs_home -o vers=4
mount.nfs: mount system call failed for /mnt/nfs_home
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1830, si_uid=0, si_status=32, si_utime=0, si_stime=0} ---
+++ exited with 32 +++

ubuntu@nfs:~$ sudo strace -e mount mount localhost:/home /mnt/nfs_home -o vers=3
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=1850, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
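
The `si_status=32` above is the exit status of the mount command, which mount(8) documents as a bitmask. A minimal sketch of decoding it follows; the function name `decode_mount_status` is hypothetical, and the descriptions are paraphrased from the mount(8) man page.

```python
# Exit status bits as documented in mount(8); several may be ORed together.
MOUNT_EXIT_BITS = {
    1: "incorrect invocation or permissions",
    2: "system error (out of memory, cannot fork, no more loop devices)",
    4: "internal mount bug",
    8: "user interrupt",
    16: "problems writing or locking /etc/mtab",
    32: "mount failure",
    64: "some mount succeeded",
}


def decode_mount_status(status):
    """Return the list of conditions encoded in a mount(8) exit status."""
    if status == 0:
        return ["success"]
    return [text for bit, text in MOUNT_EXIT_BITS.items() if status & bit]


print(decode_mount_status(32))  # ['mount failure']
```

Because this value is a process exit status and not an errno, it says only that the mount failed; the cause has to come from the mount(2) return value seen under strace.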

Could you tell me which kernel version was used for the test below? It seems it was run on 20240302.
https://autopkgtest.ubuntu.com/results/autopkgtest-noble/noble/s390x/n/nfs-utils/20240302_020943_bf464@/log.gz

I also tried the mainline v6.8 kernel, which has the same issue.

Andreas Hasenack (ahasenack) wrote :

> Could you tell me which kernel version was used for the test below?

Search for "testbed running kernel". In the case of that test, it's:
859s autopkgtest [01:53:38]: testbed running kernel: Linux 6.6.0-14-generic #14-Ubuntu SMP Thu Nov 30 09:46:34 UTC 2023

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-04-10 11:16 EDT-------
Bisected it to commit fce7913b13d0 ("NFSD: Use a bitmask loop to encode FATTR4 results").
Looks like an endianness issue that has been unfixed since v6.7.

commit fce7913b13d0270bcf926f986b7ef329e2e56eec
Author: Chuck Lever <email address hidden>
Date: Mon Sep 18 10:02:12 2023 -0400

    NFSD: Use a bitmask loop to encode FATTR4 results

    The fattr4 encoder is now structured like the COMPOUND op encoder:
    one function for each individual attribute, called by bit number.
    Benefits include:
    - The individual attributes are now guaranteed to be encoded in
      bitmask order into the send buffer

    - There can be no unwanted side effects between attribute encoders

    - The code now clearly documents which attributes are /not/
      implemented on this server

    Reviewed-by: Jeff Layton <email address hidden>
    Signed-off-by: Chuck Lever <email address hidden>
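
The comment above only says "endianness issue" and does not show the faulty code. As a general illustration of how bitmask handling can break on a big-endian machine such as s390x, here is a sketch of what happens when an array of 32-bit mask words is reread as a single 64-bit value. This is an assumed example of the bug class, not the kernel code; the function name `bits_seen_as_u64` is hypothetical.

```python
import struct


def bits_seen_as_u64(words, byteorder):
    """Store two 32-bit mask words in memory, then reread them as one
    64-bit value in the same byte order and list the set bit numbers."""
    prefix = "<" if byteorder == "little" else ">"
    raw = struct.pack(prefix + "2I", *words)
    (value,) = struct.unpack(prefix + "Q", raw)
    return [bit for bit in range(64) if (value >> bit) & 1]


# Word 0 has bit 3 set and word 1 has bit 1 set,
# i.e. attribute numbers 3 and 33 when counted across the array.
mask = [1 << 3, 1 << 1]

print(bits_seen_as_u64(mask, "little"))  # [3, 33]
print(bits_seen_as_u64(mask, "big"))     # [1, 35]
```

On little-endian the bit numbers survive the reinterpretation, so code written this way passes on amd64 and arm64; on big-endian the two words swap places and a loop over set bits visits the wrong attribute numbers.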

GuoqingJiang (guoqingjiang) wrote :

Thanks for the info. I tested v6.6, which was fine. I will dig into the change.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2024-04-11 05:50 EDT-------
I've sent a fix proposal to mailing lists
https://lore.kernel.org/all<email address hidden>/

GuoqingJiang (guoqingjiang) wrote :

Thanks Vasily. Once the upstream maintainer merges it, you could also send it to the Ubuntu kernel list, or wait until noble picks it up from a future upstream stable release, since the patch is already CCed to <email address hidden> and carries a Fixes tag.

Andreas Hasenack (ahasenack) wrote :

Marking the userspace component (src:nfs-utils) task as invalid, since it's a bug in the kernel.

Changed in nfs-utils (Ubuntu):
status: Triaged → Invalid
Andreas Hasenack (ahasenack) wrote :

Adding a release notes task to put a note under "known issues".

Anthony Wong (anthonywong) wrote :

This is a valid bug, but since IBM has a fix upstream that will trickle down to our stable kernels, I am un-assigning myself.

Changed in linux (Ubuntu):
status: New → Triaged
assignee: Anthony Wong (anthonywong) → nobody
Frank Heimes (fheimes) wrote :

I've noticed that Vasily's commit has meanwhile landed in linux-next,
so I took it from there and applied it to noble master-next,
but it failed due to changed context.
Hence I did a bit of backporting work and have now got it in.

I triggered appropriate kernel test builds here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp2060217
