Compute stuck in offline state due to NFS mount failure

Bug #1942383 reported by Jiping Ma
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Jiping Ma

Bug Description

Brief Description
-----------------
Compute nodes are stuck in the offline state as reported through Kubernetes:

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get hosts -n=deployment -o=wide
NAME           ADMINISTRATIVE   OPERATIONAL   AVAILABILITY   PROFILE                INSYNC   RECONCILED
compute-0      locked           disabled      offline        compute-0-profile      false    false
compute-1      locked           disabled      offline        compute-0-profile      false    false
compute-2      locked           disabled      offline        compute-0-profile      false    false
controller-0   unlocked         enabled       available      controller-0-profile   false    false
controller-1   unlocked         enabled       degraded       controller-0-profile   false    false

[sysadmin@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | compute-0    | worker      | locked         | disabled    | online       |
| 3  | compute-1    | worker      | locked         | disabled    | online       |
| 4  | compute-2    | worker      | locked         | disabled    | online       |
| 5  | controller-1 | controller  | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+

[sysadmin@controller-0 ~(keystone_admin)]$ system host-unlock compute-0
Can not unlock host compute-0 undergoing reinstall. Please ensure host has completed reinstall prior to unlock.

Severity
--------
High - Cannot successfully bring up a multi-node system

Steps to Reproduce
------------------
Install and configure a multi-node system with a recent stx master load

Expected Behavior
------------------
System comes up successfully

Actual Behavior
----------------
Only the controllers are online. The computes are stuck in an offline state.

Reproducibility
---------------
100%

System Configuration
--------------------
multi-node system

Branch/Pull Time/Commit
-----------------------
stx master Aug 25

Last Pass
---------
Never with the 5.10 kernel

Timestamp/Logs
--------------
compute-0:/usr/bin# sudo mount -v -t nfs -o timeo=30,proto=udp6,vers=3,rsize=1024,wsize=1024 controller:/opt/platform/sysinv/21.12 /mnt/sysinv
mount.nfs: timeout set for Fri Aug 27 17:19:22 2021
mount.nfs: trying text-based options 'timeo=30,proto=udp6,vers=3,rsize=1024,wsize=1024,addr=face::1'
mount.nfs: prog 100003, trying vers=3, prot=17
mount.nfs: trying face::1 prog 100003 vers 3 prot UDP port 2049
mount.nfs: prog 100005, trying vers=3, prot=17
mount.nfs: trying face::1 prog 100005 vers 3 prot UDP port 20048
mount.nfs: mount(2): Invalid argument
mount.nfs: an incorrect mount option was specified
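
In the log above, the RPC probes to ports 2049 and 20048 are attempted first, and the error is then reported by mount(2) itself while the transport is set to proto=udp6. A small helper along the following lines could flag option strings that request a UDP transport before a mount is attempted. This is only a sketch: nfs_proto and uses_udp are hypothetical names (not part of nfs-utils), and the legacy bare "udp" option is not handled.

```shell
#!/bin/sh
# Hypothetical helpers; not part of nfs-utils or StarlingX.

# Print the value of the last proto= entry in a comma-separated
# NFS option string, or nothing if no proto= entry is present.
nfs_proto() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^proto=//p' | tail -n 1
}

# Succeed when the option string requests a UDP transport (udp or udp6).
uses_udp() {
    case "$(nfs_proto "$1")" in
        udp|udp6) return 0 ;;
        *)        return 1 ;;
    esac
}

# Option string copied from the failing mount command in this report.
opts='timeo=30,proto=udp6,vers=3,rsize=1024,wsize=1024'
if uses_udp "$opts"; then
    echo "mount requests a UDP transport: proto=$(nfs_proto "$opts")"
fi
```

A match only shows that the mount depends on NFS over UDP; it does not by itself explain why mount(2) rejects the options.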

Test Activity
-------------
Sanity

Workaround
----------
None

Jiping Ma (jma11)
Changed in starlingx:
assignee: nobody → Jiping Ma (jma11)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/807000
Committed: https://opendev.org/starlingx/kernel/commit/8ed87c337979cf878d83b32b02a6bd290441eacd
Submitter: "Zuul (22348)"
Branch: master

commit 8ed87c337979cf878d83b32b02a6bd290441eacd
Author: Jiping Ma <email address hidden>
Date: Wed Sep 1 20:54:11 2021 -0400

    Enable NFS udp support

    mount nfs will fail if CONFIG_NFS_DISABLE_UDP_SUPPORT=y. We need
    disable it to support nfs mount.

    Closes-Bug: #1942383

    Signed-off-by: Jiping Ma <email address hidden>
    Change-Id: I11a3c12c27fb7ad8ad934964e85f1a20167057b2
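
To check whether a particular kernel build has the option named in this commit set, one approach is to inspect its config file. The following is a minimal sketch, assuming the config is installed at /boot/config-$(uname -r); that path is common on many distributions but is an assumption here and has not been verified for StarlingX. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a kernel config file.

# Succeed when the given config file sets CONFIG_NFS_DISABLE_UDP_SUPPORT=y.
nfs_udp_disabled() {
    grep -q '^CONFIG_NFS_DISABLE_UDP_SUPPORT=y$' "$1" 2>/dev/null
}

# Print a one-line summary for the given config file.
describe_nfs_udp() {
    if [ ! -r "$1" ]; then
        echo "unknown: cannot read $1"
    elif nfs_udp_disabled "$1"; then
        echo "disabled: CONFIG_NFS_DISABLE_UDP_SUPPORT=y in $1"
    else
        echo "enabled: CONFIG_NFS_DISABLE_UDP_SUPPORT is not set to y in $1"
    fi
}

# Inspect the running kernel; the path layout is an assumption.
describe_nfs_udp "/boot/config-$(uname -r)"
```

A "disabled" result on a compute node would be consistent with the mount failure reported in this bug; an "unknown" result only means the config file was not found at the assumed path.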

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote : Re: NFS mount failed

screening: stx.6.0 / high - this issue prevents the installation of multi-node systems. The issue is related to the recent migration to the 5.10 kernel

tags: added: stx.distro.other
tags: added: stx.6.0
Changed in starlingx:
importance: Undecided → High
summary: - NFS mount failed
+ Compute stuck in offline state due to NFS mount failure
Ghada Khalil (gkhalil)
description: updated