Pod can't start up with error 'no device is mounted...' due to kube-sriov-device-plugin referring to rbd device
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Fabiano Correa Mercer | | |
Bug Description
Brief Description
-----------------
Stateful Pod using PersistentVolum
This issue is visible using k8s version: 1.21.8, but not in k8s version: 1.24.4
Severity
--------
Major
Steps to Reproduce
------------------
Enable SR-IOV and create SR-IOV interfaces in order to have kube-sriov-
Create the PersistentVolum
---
kind: PersistentVolum
apiVersion: v1
metadata:
  name: rwo-test-claim1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: general
---
kind: PersistentVolum
apiVersion: v1
metadata:
  name: rwo-test-claim2
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: general
Create the StatefulSet pod (busybox)
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: busybox-set
spec:
selector:
matchLabels:
app: busybox
serviceName: "busybox"
replicas: 1
template:
metadata:
labels:
app: busybox
spec:
terminati
containers:
- args:
- sh
image: busybox
name: busybox
stdin: true
tty: true
- name: pvc1
- name: pvc2
restartPo
volumes:
- name: pvc1
- name: pvc2
volumeClaimTe
- metadata:
name: mysql-store
spec:
accessModes: ["ReadWriteOnce"]
storageCl
resources:
requests:
storage: 1Gi
Delete the sriov device plugin pod: kube-sriov-
Delete the busybox pod
Observe the busybox pod is stuck in ContainerCreating with failure to mount the volumes.
Expected Behavior
------------------
The stateful pod should be recreated and reach the Running state.
Actual Behavior
----------------
The stateful pod gets stuck in the "ContainerCreating" state.
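
The stuck state described above can be checked programmatically. The following is a minimal sketch, not part of the original report, that inspects the JSON form of a pod (as printed by `kubectl get pod <name> -o json`) and reports whether any container is still waiting with reason ContainerCreating. The function names are hypothetical; the fields read (`status.phase`, `status.containerStatuses[].state.waiting.reason`) are the standard Pod status fields.

```python
import json
from typing import Any, Dict


def is_stuck_creating(pod: Dict[str, Any]) -> bool:
    """Return True if the pod is Pending and a container waits in ContainerCreating."""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    for container_status in status.get("containerStatuses", []):
        # "waiting" is only present while the container has not started.
        waiting = container_status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "ContainerCreating":
            return True
    return False


def is_stuck_creating_json(text: str) -> bool:
    """Same check, starting from the raw JSON text printed by kubectl."""
    return is_stuck_creating(json.loads(text))
```

This only tells whether the pod is currently in that state. Deciding that it is really stuck still requires looking at the pod age and the FailedMount events.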
Reproducibility
---------------
100% using k8s version: 1.21.8
System Configuration
-------
AIO-SX
Branch/Pull Time/Commit
-------
N/A
Last Pass
---------
N/A
Timestamp/Logs
--------------
Type     Reason       Age                 From               Message
----     ------       ----                ----               -------
Normal   Scheduled    66m                 default-scheduler  Successfully assigned default/
Warning  FailedMount  49m (x3 over 55m)   kubelet            Unable to attach or mount volumes: unmounted volumes=
Warning  FailedMount  46m                 kubelet            Unable to attach or mount volumes: unmounted volumes=
Warning  FailedMount  44m (x4 over 62m)   kubelet            Unable to attach or mount volumes: unmounted volumes=
Warning  FailedMount  26m (x5 over 64m)   kubelet            Unable to attach or mount volumes: unmounted volumes=
Warning  FailedMount  6m5s (x4 over 39m)  kubelet            Unable to attach or mount volumes: unmounted volumes=
Warning  FailedMount  92s (x40 over 66m)  kubelet            MountVolume.SetUp failed for volume "pvc-af954946-
Test Activity
-------------
Developer Testing
Workaround
----------
1. Restart kubelet service
2. Change the Mounting Point from /var/lib/kubelet/ to /var/lib/
kubectl patch DaemonSet kube-sriov-
Changed in starlingx: | |
assignee: | nobody → Fabiano Correa Mercer (fcorream) |
description: | updated |
Changed in starlingx: | |
importance: | Undecided → Medium |
tags: | added: stx.9.0 stx.networking |
The root cause is that the kube-sriov-device-plugin pod mounts the HostPath '/var/lib/kubelet/'. If a pod that mounts an rbd PVC is already running when the kube-sriov-device-plugin pod starts up, the kube-sriov-device-plugin pod holds a reference to the rbd mountpoint, which is under the HostPath '/var/lib/kubelet/'. Even after the rbd is unmounted from the mountpoint on the host, the kube-sriov-device-plugin pod keeps referring to it in its own mount namespace. As a result, kubelet can't unmap the rbd device, which ultimately causes the issue.
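
One way to confirm this kind of stale reference on a host is to look for rbd mounts under '/var/lib/kubelet/' in the mount namespace of every process. The following is a minimal sketch, not taken from the fix itself: it parses `/proc/<pid>/mountinfo` (format as documented in proc(5)) and lists the processes that still see a `/dev/rbd*` device mounted under that prefix. The function and class names are hypothetical, and the sketch assumes the rbd volumes appear with a `/dev/rbdN` mount source.

```python
import os
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MountEntry:
    mount_point: str
    fs_type: str
    source: str


def parse_mountinfo(text: str) -> List[MountEntry]:
    """Parse the contents of a /proc/<pid>/mountinfo file."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        # A lone "-" separates the optional fields from fs type and source.
        sep = fields.index("-")
        if sep < 6 or len(fields) < sep + 3:
            continue
        entries.append(MountEntry(fields[4], fields[sep + 1], fields[sep + 2]))
    return entries


def rbd_mounts_under(text: str, prefix: str = "/var/lib/kubelet/") -> List[MountEntry]:
    """Return the rbd-backed mounts whose mount point lies under prefix."""
    return [
        entry
        for entry in parse_mountinfo(text)
        if entry.source.startswith("/dev/rbd") and entry.mount_point.startswith(prefix)
    ]


def pids_holding_rbd_mounts(
    prefix: str = "/var/lib/kubelet/", proc_root: str = "/proc"
) -> Dict[int, List[MountEntry]]:
    """Scan every process and report those that still see an rbd mount under prefix."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "mountinfo")) as handle:
                hits = rbd_mounts_under(handle.read(), prefix)
        except OSError:
            # The process exited or its mountinfo is not readable.
            continue
        if hits:
            result[int(name)] = hits
    return result
```

Calling `pids_holding_rbd_mounts()` as root on the affected host, after the volume has been unmounted in the host namespace, would list any process (for example, one belonging to the device plugin container) that still holds the mount.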