pod was stuck in Init:0/1 state due to failure to mount volume with error 'unable to get monitor info from DNS SRV with service name: ceph-mon'.

Bug #2047571 reported by Erickson Silva de Oliveira
Affects     Status         Importance   Assigned to                  Milestone
StarlingX   Fix Released   Medium       Erickson Silva de Oliveira

Bug Description

Brief Description
-----------------
A pod was stuck in the Init:0/1 state because its volume failed to mount with the error 'unable to get monitor info from DNS SRV with service name: ceph-mon'.

Severity
--------
Severe

Reproducibility
---------------
Seen once.

System Configuration
--------------------
AIO Duplex

Timestamp/Logs
--------------
Warning FailedMount 123m kubelet MountVolume.MountDevice failed for volume "pvc-76d8f846-8790-4d48-8973-6c7ffc21993e" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.204.1:6789:/volumes/csi/pvc-volumes-1a772248-9016-11ee-a7ad-6e46ebe3f44d/9be71491-8231-4081-8b98-f29541173beb /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/467e93df99c2e1d552554cfb8dc25a4d1c9a0a0edcd9440d153681548111966d/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-1914857982,mds_namespace=kube-cephfs,debug,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-12-18T02:35:25.660+0000 7fa5620c90c0 -1 failed for service _ceph-mon._tcp
mount error 22 = Invalid argument
Warning FailedMount 123m kubelet Unable to attach or mount volumes: unmounted volumes=[apdata-data], unattached volumes=[kube-api-access-snmsm li-data shared-data pstack-data hugepage dpme iprs-dir mirr-storage apdata-data shared iprs]: timed out waiting for the condition
Warning FailedMount 4m51s (x114 over 122m) kubelet (combined from similar events): MountVolume.MountDevice failed for volume "pvc-76d8f846-8790-4d48-8973-6c7ffc21993e" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 192.168.204.1:6789:/volumes/csi/pvc-volumes-1a772248-9016-11ee-a7ad-6e46ebe3f44d/9be71491-8231-4081-8b98-f29541173beb /var/lib/kubelet/plugins/kubernetes.io/csi/cephfs.csi.ceph.com/467e93df99c2e1d552554cfb8dc25a4d1c9a0a0edcd9440d153681548111966d/globalmount -o name=admin,secretfile=/tmp/csi/keys/keyfile-1627258384,mds_namespace=kube-cephfs,debug,_netdev] stderr: unable to get monitor info from DNS SRV with service name: ceph-mon
2023-12-18T04:34:31.685+0000 7ffff7fe90c0 -1 failed for service _ceph-mon._tcp
mount error 22 = Invalid argument

Workaround
----------
$ kubectl patch pv $(kubectl get pv | awk '{if ($7 == "cephfs") print $1;}') --type=json -p="[{'op': 'remove', 'path': '/spec/mountOptions'}]"
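
The workaround above removes the whole mountOptions list from every PV whose storage class is "cephfs". Where other mount options on a PV should be kept, a narrower variant could remove only the "debug" entry. The following is a minimal sketch under that assumption and is not the fix that was merged; the function names debug_index and strip_debug_from_cephfs_pvs are hypothetical, and the storage class name "cephfs" is taken from the workaround.

```shell
#!/bin/sh

# Read mount options from stdin, one per line, and print the zero-based
# index of the first line that is exactly "debug". Prints nothing if absent.
debug_index() {
    awk '$0 == "debug" { print NR - 1; exit }'
}

# Hypothetical helper: for each PV whose STORAGECLASS column is "cephfs",
# remove only the "debug" entry from spec.mountOptions with a JSON Patch.
strip_debug_from_cephfs_pvs() {
    for pv in $(kubectl get pv --no-headers | awk '$7 == "cephfs" { print $1 }'); do
        idx=$(kubectl get pv "$pv" \
            -o jsonpath='{range .spec.mountOptions[*]}{@}{"\n"}{end}' | debug_index)
        if [ -n "$idx" ]; then
            kubectl patch pv "$pv" --type=json \
                -p "[{\"op\": \"remove\", \"path\": \"/spec/mountOptions/$idx\"}]"
        fi
    done
}
```

Calling strip_debug_from_cephfs_pvs on a controller with kubectl access would patch only the PVs that still list "debug"; PVs without it are left untouched.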

OpenStack Infra (hudson-openstack) wrote : Fix proposed to platform-armada-app (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to platform-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/platform-armada-app/+/904383
Committed: https://opendev.org/starlingx/platform-armada-app/commit/9195bfb88719723f91488ebc3248723ca38d4788
Submitter: "Zuul (22348)"
Branch: master

commit 9195bfb88719723f91488ebc3248723ca38d4788
Author: Erickson Silva de Oliveira <email address hidden>
Date: Wed Dec 27 11:14:28 2023 -0300

    Remove debug option from cephfs PVs

    Since version v3.9.0 of ceph-csi, the "debug" option has been
    removed from mountOptions of the cephfs storage class, however,
    this option still exists on cephfs PVs created with the previous
    version of ceph-csi, causing the pod to fail.

    To resolve this, a check for existing cephfs PVs has been added
    to the cephfs storage-init script to remove this parameter if
    it exists.

    Test Plan:
      PASS: Create a PVC and pod on AIO-SX with ceph-csi v3.6.2
      PASS: Build platform-integ-apps with changes (ceph-csi v3.9.0)
      PASS: Check that "mountOption: -debug" is not present in the
            cephfs storage class and pv.

    Closes-Bug: 2047571

    Change-Id: Id7c8f77d2bc0b4e4afc67966810d5d3c40fc1e06
    Signed-off-by: Erickson Silva de Oliveira <email address hidden>
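
The last item of the test plan in the commit message above (checking that "debug" is absent from the cephfs storage class and PVs) could be checked by hand along the following lines. This is a rough sketch, not part of the merged change; has_debug_option and report_cephfs_debug are hypothetical names, and the storage class is assumed to be named "cephfs".

```shell
#!/bin/sh

# Succeed (exit 0) when some line on stdin is exactly "debug".
has_debug_option() {
    grep -qx 'debug'
}

# Hypothetical helper: print every cephfs object that still lists "debug"
# in its mount options. No output means the option is gone everywhere.
report_cephfs_debug() {
    if kubectl get storageclass cephfs \
        -o jsonpath='{range .mountOptions[*]}{@}{"\n"}{end}' | has_debug_option; then
        echo "storageclass/cephfs: debug present"
    fi
    for pv in $(kubectl get pv --no-headers | awk '$7 == "cephfs" { print $1 }'); do
        if kubectl get pv "$pv" \
            -o jsonpath='{range .spec.mountOptions[*]}{@}{"\n"}{end}' | has_debug_option; then
            echo "pv/$pv: debug present"
        fi
    done
}
```

Running report_cephfs_debug after the application update should print nothing if the option has been removed from both the storage class and the existing PVs.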

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.apps stx.storage
Changed in starlingx:
assignee: nobody → Erickson Silva de Oliveira (esilvade)