Pod can't start up with error 'no device is mounted...' due to kube-sriov-device-plugin referring to rbd device

Bug #2007596 reported by Fabiano Correa Mercer
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Fabiano Correa Mercer

Bug Description

Brief Description
-----------------
A stateful pod using a PersistentVolumeClaim (PVC) of type RBD gets stuck in the 'Init:0/1' state after an upgrade (which deletes and recreates the pod).
The issue is reproducible with Kubernetes 1.21.8 but not with 1.24.4.

Severity
--------
Major

Steps to Reproduce
------------------
Enable SR-IOV and create SR-IOV interfaces so that the kube-sriov-device-plugin pod is running.
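
Before proceeding, it helps to confirm that the device plugin is actually running and that the node advertises SR-IOV resources. A minimal sketch, assuming an AIO-SX node named controller-0:

# Confirm the SR-IOV device plugin DaemonSet pod is running
kubectl -n kube-system get pods -o wide | grep sriov-device-plugin
# Check that the node advertises SR-IOV resources (resource names depend on configuration)
kubectl describe node controller-0 | grep -i -A 10 Allocatable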

Create the PersistentVolumeClaim specs:
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rwo-test-claim1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: general
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rwo-test-claim2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: general
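
The claims can be applied and checked with standard kubectl commands (the file name pvc.yaml is only illustrative):

kubectl apply -f pvc.yaml
# Both claims should reach the Bound state against the 'general' storage class
kubectl get pvc rwo-test-claim1 rwo-test-claim2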

Create the StatefulSet pod (busybox):
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: busybox-set
spec:
  selector:
    matchLabels:
      app: busybox
  serviceName: "busybox"
  replicas: 1
  template:
    metadata:
      labels:
        app: busybox
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - args:
        - sh
        image: busybox
        imagePullPolicy: Always
        name: busybox
        stdin: true
        tty: true
        volumeMounts:
        - name: pvc1
          mountPath: "/mnt1"
        - name: pvc2
          mountPath: "/mnt2"
      restartPolicy: Always
      volumes:
      - name: pvc1
        persistentVolumeClaim:
          claimName: rwo-test-claim1
      - name: pvc2
        persistentVolumeClaim:
          claimName: rwo-test-claim2
  volumeClaimTemplates:
  - metadata:
      name: mysql-store
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "general"
      resources:
        requests:
          storage: 1Gi
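
After applying the StatefulSet (the file name statefulset.yaml is illustrative), the pod and its rbd-backed mounts can be verified:

kubectl apply -f statefulset.yaml
# The pod should reach Running with both PVCs mounted
kubectl get pod busybox-set-0
kubectl exec busybox-set-0 -- mount | grep -e /mnt1 -e /mnt2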

Delete the SR-IOV device plugin pod (kube-sriov-device-plugin).
Delete the busybox pod.
Observe that the recreated busybox pod is stuck in ContainerCreating, failing to mount its volumes; the equivalent kubectl commands are sketched below.
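
A minimal command sequence for these steps (pod names carry generated suffixes, shown here as placeholders):

# Delete the device plugin pod; its DaemonSet recreates it, and the new pod
# mounts hostPath /var/lib/kubelet/ while the rbd volume is still mounted there
kubectl -n kube-system get pods | grep kube-sriov-device-plugin
kubectl -n kube-system delete pod <kube-sriov-device-plugin-pod>

# Delete the stateful pod; the StatefulSet controller recreates it
kubectl delete pod busybox-set-0

# The recreated pod stays in ContainerCreating with FailedMount events
kubectl get pod busybox-set-0
kubectl describe pod busybox-set-0 | grep -A 20 Events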

Expected Behavior
------------------
The stateful pod should be recreated and reach the Running state.

Actual Behavior
----------------
The stateful pod gets stuck in the "ContainerCreating" state.

Reproducibility
---------------
100% using k8s version: 1.21.8

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
N/A

Timestamp/Logs
--------------

  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal Scheduled 66m default-scheduler Successfully assigned default/df66666693071-d0-6578f454b9-jskdb to controller-0
  Warning FailedMount 49m (x3 over 55m) kubelet Unable to attach or mount volumes: unmounted volumes=[df66666693071-d-db], unattached volumes=[df66666693071-d-log kube-api-access-hvm46 env df66666693071-d-shared df66666693071-d-db]: timed out waiting for the condition
  Warning FailedMount 46m kubelet Unable to attach or mount volumes: unmounted volumes=[df66666693071-d-db], unattached volumes=[df66666693071-d-shared df66666693071-d-db df66666693071-d-log kube-api-access-hvm46 env]: timed out waiting for the condition
  Warning FailedMount 44m (x4 over 62m) kubelet Unable to attach or mount volumes: unmounted volumes=[df66666693071-d-db], unattached volumes=[env df66666693071-d-shared df66666693071-d-db df66666693071-d-log kube-api-access-hvm46]: timed out waiting for the condition
  Warning FailedMount 26m (x5 over 64m) kubelet Unable to attach or mount volumes: unmounted volumes=[df66666693071-d-db], unattached volumes=[df66666693071-d-db df66666693071-d-log kube-api-access-hvm46 env df66666693071-d-shared]: timed out waiting for the condition
  Warning FailedMount 6m5s (x4 over 39m) kubelet Unable to attach or mount volumes: unmounted volumes=[df66666693071-d-db], unattached volumes=[kube-api-access-hvm46 env df66666693071-d-shared df66666693071-d-db df66666693071-d-log]: timed out waiting for the condition
  Warning FailedMount 92s (x40 over 66m) kubelet MountVolume.SetUp failed for volume "pvc-af954946-0f69-496f-95be-e37cda69b981" : no device is mounted at /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-rbd-image-kubernetes-dynamic-pvc-6c501229-9fa9-11ed-82a1-46177d2292f9
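
For reference, the same failure can be inspected from the affected host (paths taken from the event above):

# kubelet repeatedly logs the mount failure
sudo journalctl -u kubelet | grep 'no device is mounted'
# the rbd mount point referenced by the event is no longer mounted in the host namespace
mount | grep /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/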

Test Activity
-------------
Developer Testing

Workaround
----------
1. Restart the kubelet service.
2. Change the device plugin's mount path from /var/lib/kubelet/ to /var/lib/kubelet/device-plugins/, like other device plugins:

kubectl patch DaemonSet kube-sriov-device-plugin-amd64 -n kube-system --type json -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/volumeMounts/0/mountPath", "value":"/var/lib/kubelet/device-plugins/"}, {"op": "replace", "path": "/spec/template/spec/volumes/0/hostPath/path", "value":"/var/lib/kubelet/device-plugins/"}]'
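
After the patch, the new paths can be confirmed on the DaemonSet, and kubelet can be restarted if a pod is still stuck (the jsonpath below assumes the patched volume is the first entry, as in the command above):

kubectl -n kube-system get daemonset kube-sriov-device-plugin-amd64 \
  -o jsonpath='{.spec.template.spec.volumes[0].hostPath.path}{"\n"}'
sudo systemctl restart kubelet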

Changed in starlingx:
assignee: nobody → Fabiano Correa Mercer (fcorream)
Fabiano Correa Mercer (fcorream) wrote:

The root cause is that the kube-sriov-device-plugin pod mounts the hostPath '/var/lib/kubelet/'. If a pod with an rbd PVC is already running when the kube-sriov-device-plugin pod starts up, the device plugin pod picks up a reference to the rbd mount point, which lives under the hostPath '/var/lib/kubelet/'. Even after the rbd is unmounted from that mount point on the host, the device plugin pod keeps referring to it in its own mount namespace, so kubelet cannot unmap the rbd device, which ultimately causes this issue.
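
The stale reference can be observed from the host: even after the rbd device is unmounted in the host mount namespace, the device plugin pod's mount namespace still holds the old entry. A rough sketch (the process name sriovdp matches the upstream device plugin binary and is an assumption here):

# Host namespace: the rbd mount is gone
mount | grep rbd
# Device plugin's mount namespace: the mount point under
# /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/ is still referenced
PLUGIN_PID=$(pgrep -f sriovdp | head -n 1)
sudo grep rbd /proc/${PLUGIN_PID}/mountinfo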

description: updated
OpenStack Infra (hudson-openstack) wrote: Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.networking
OpenStack Infra (hudson-openstack) wrote: Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/874156
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/dd51ac2844d63b11d73fd552aa6b73dc1ac858f6
Submitter: "Zuul (22348)"
Branch: master

commit dd51ac2844d63b11d73fd552aa6b73dc1ac858f6
Author: Fabiano Correa Mercer <email address hidden>
Date: Thu Feb 16 16:32:58 2023 -0300

    Restrict the SRIOV device plugin mount path

    The kube-sriov-device-plugin pod will mount HostPath:'/var/lib/kubelet'
    If a pod mounting with rbd PVC is already running when
    kube-sriov-device-plugin pod starts up, the kube-sriov-device-plugin
    pod will refer to the rbd mountpoint which is under HostPath:
    '/var/lib/kubelet'.
    Even if the rbd is unmounted from the mountpoint on the host, the pod
    will keep referring to it in its namespace.
    So kubelet can't unmap the rbd and will fail to mount the volume when
    pod with rbd PVC is recreated.
    The kube-sriov-device-plugin doesn't need to use '/var/lib/kubelet' as
    mountpath because its internal device socket is actually at
    /var/lib/kubelet/device-plugins/.
    Changing the kube-sriov-device-plugin mountpath to a less broad path
    will preserve the rbd PVC mount point under /var/lib/kubelet/.

    Test plan
    PASS Installed AIO-SX
         create SRIOV interfaces
         create stateful pod with rbd PVC
         delete kube-sriov-device-plugin
         delete stateful pod
         A new stateful pod will automatically be created
         check if stateful pod was not stuck
         confirm if stateful pod could mount the volume
    PASS Create a SRIOV NetworkAttachmentDefinition
         Launch a POD using the SRIOV interface
         check if POD is running and if POD has connectivity.
    PASS Upgrades testing (partial) - verified controller-1 is upgraded and device plugin/pod working with new location.

    Closes-Bug: #2007596

    Signed-off-by: Fabiano Mercer <email address hidden>
    Change-Id: I7ef43a1c0ac4f7f0af1a366c298b4c1029d3e915

Changed in starlingx:
status: In Progress → Fix Released