ceph-csi charm does not handle ceph-fs correctly: InvalidArgument desc = volume not found

Bug #2054486 reported by Bartosz Woronicz
Affects: Ceph CSI Charm
Status: Fix Released
Importance: High
Assigned to: Kevin W Monroe
Milestone: 1.29+ck1

Bug Description

When cephfs is enabled
https://charmhub.io/ceph-csi/configure#cephfs-enable
the charm will create a StorageClass named cephfs.
However, when trying to provision a PVC from it, we encounter the following error:
  14s (x6 over 29s) Warning ProvisioningFailed PersistentVolumeClaim/cephfs2-pvc failed to provision volume with StorageClass "cephfs": rpc error: code = InvalidArgument desc = volume not found
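
For reference, a minimal PVC along these lines reproduces it (a sketch; the exact manifest was not captured in the report, names and sizes are illustrative):

$ kubectl apply -n whatever-test -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs2-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: cephfs
EOF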

$ kubectl get sc cephfs -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2024-02-15T17:54:33Z"
  labels:
    juju.io/application: ceph-csi
    juju.io/manifest: cephfs
    juju.io/manifest-version: cephfs-v3.9.0
  name: cephfs
  resourceVersion: "2587452"
  uid: 90368aef-8aa5-4c34-9086-b2033d2e3d21
parameters:
  clusterID: 9857d9aa-c5d5-11ee-8c70-bd024fd7400e
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  fsName: default
  pool: ceph-fs_data
provisioner: cephfs.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

The class contains two key parameters, the filesystem name and the pool name,
and will always use these values:
fsName: default
pool: ceph-fs_data
Surprisingly, the second one is correct, but only if you use ceph-fs as the application name,
as described here: https://charmhub.io/ceph-fs/configure#rbd-pool-name
Otherwise it will be wrong as well.

$ sudo ceph fs ls
name: ceph-fs, metadata pool: ceph-fs_metadata, data pools: [ceph-fs_data ]

After doing an in-place replacement of the cephfs StorageClass with the correct fsName, I was able to create a PVC:
$ kubectl get pvc -n whatever-test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
cephfs-pvc Bound pvc-d2e0bdd4-57b8-45f1-b01f-430c2e32a2a7 1Gi RWO cephfs 35m
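
For the record, the "in-place replacement" cannot be a plain edit: StorageClass parameters are immutable, so the object has to be recreated. Roughly like this (a sketch, not necessarily the exact commands used; note the charm manages this object and may recreate it):

$ kubectl get sc cephfs -o yaml > cephfs-sc.yaml
$ # edit cephfs-sc.yaml: set parameters.fsName to "ceph-fs" (the name shown by `ceph fs ls`)
$ kubectl delete sc cephfs
$ kubectl apply -f cephfs-sc.yaml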

Both values appear to be hard-coded:
fsName:
https://github.com/charmed-kubernetes/ceph-csi-operator/blob/52dc3d10048c46a1f903238dfd253f1eab3e10b2/src/manifests_cephfs.py#L75
pool:
https://github.com/charmed-kubernetes/ceph-csi-operator/blob/52dc3d10048c46a1f903238dfd253f1eab3e10b2/src/manifests_cephfs.py#L60

Since we don't have a direct relation to ceph-fs, only to ceph-mon or ceph-proxy,
we need some other method of passing that information.
Maybe it would be better to define these properties directly in the ceph-csi charm as config options?
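
Something along these lines, for example (option names here are purely hypothetical; they do not exist in the charm today):

$ juju config ceph-csi cephfs-fsname=ceph-fs cephfs-pool=ceph-fs_data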

summary: - ceph-csi charm does not handle ceph-fs correctly
+ ceph-csi charm does not handle ceph-fs correctly: InvalidArgument desc =
+ volume not found
Changed in charm-ceph-csi:
status: New → In Progress
status: In Progress → Triaged
importance: Undecided → Medium
assignee: nobody → Kevin W Monroe (kwmonroe)
milestone: none → 1.29+ck1
Changed in charm-ceph-csi:
status: Triaged → In Progress
Revision history for this message
Kevin W Monroe (kwmonroe) wrote (last edit):

Thanks for the report! We purposely don't expose the fsname as config, but rather discover it in the charm:

https://github.com/charmed-kubernetes/ceph-csi-operator/blob/release_1.29/src/charm.py#L235-L244

The problem is that we also cache that value:

https://github.com/charmed-kubernetes/ceph-csi-operator/blob/release_1.29/src/charm.py#L283

So if ceph-csi comes in before ceph-fs is deployed/related, that value will be None (because ceph-fs hasn't created the fs yet). And then we're stuck with it as the fsname in the ceph-fs storage class.

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

We'll get this fixed in the 1.29 release of the charm since it's a fairly easy failure to hit. We'll tackle the hard-coded problem in a maintenance release under bug lp:2054583.

Changed in charm-ceph-csi:
importance: Medium → High
milestone: 1.29+ck1 → 1.29
Revision history for this message
Bartosz Woronicz (mastier1) wrote :

Ah, ok. So potentially, Kevin, if I had the whole ceph-mon and ceph-fs stack deployed before setting up ceph-csi,
it should correctly autodiscover fsName and pool.
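
In juju terms, roughly this ordering (a sketch with application names only; channels, relation endpoints and the usual relations to the Kubernetes side are omitted):

$ juju deploy ceph-mon
$ juju deploy ceph-fs
$ juju relate ceph-fs ceph-mon
$ # wait until `ceph fs ls` shows the filesystem
$ juju deploy ceph-csi
$ juju relate ceph-csi ceph-mon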

It's great that you found that silently setting that value is not the right way to go.

Thank you!

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

This missed the 1.29 GA; we'll target 1.29+ck1 and backport.

Changed in charm-ceph-csi:
milestone: 1.29 → 1.29+ck1
tags: added: backport-needed
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Fixed by blocking the charm if `cephfs-enable=True` and the `fsname` cannot be discovered. This means the `ceph-fs` charm must be ready and have created a valid fs before ceph-csi will create the storage class.

https://github.com/charmed-kubernetes/ceph-csi-operator/pull/15

Changed in charm-ceph-csi:
status: In Progress → Fix Committed
Changed in charm-ceph-csi:
status: Fix Committed → Fix Released