Pods in a statefulset are blocked waiting for their PV with Cinder CSI
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
CDK Addons | Incomplete | Undecided | Unassigned | | |
Bug Description
When deploying several statefulsets using volumeClaimTemp
The statefulsets used are:
kind: StatefulSet
metadata:
name: web2
spec:
serviceName: "nginx"
replicas: 3
selector:
matchLabels:
app: nginx2
template:
metadata:
labels:
app: nginx2
spec:
containers:
- name: nginx
image: k8s.gcr.
ports:
- containerPort: 80
name: web
- name: www2
volumeClaimTe
- metadata:
name: www2
spec:
accessModes: [ "ReadWriteOnce" ]
storageCl
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web3
spec:
serviceName: "nginx"
replicas: 4
selector:
matchLabels:
app: nginx3
template:
metadata:
labels:
app: nginx3
spec:
containers:
- name: nginx
image: k8s.gcr.
ports:
- containerPort: 80
name: web
- name: www3
volumeClaimTe
- metadata:
name: www3
spec:
accessModes: [ "ReadWriteOnce" ]
storageCl
resources:
requests:
storage: 1Gi
We can see that some of the pods are in ContainerCreating state:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/web-0 1/1 Running 0 12m
pod/web-1 0/1 ContainerCreating 0 11m
pod/web2-0 1/1 Running 0 12m
pod/web2-1 1/1 Running 0 11m
pod/web2-2 0/1 ContainerCreating 0 10m
pod/web3-0 1/1 Running 0 12m
pod/web3-1 1/1 Running 0 11m
pod/web3-2 1/1 Running 0 11m
pod/web3-3 1/1 Running 0 10m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 13h
NAME READY AGE
statefulset.
statefulset.
statefulset.
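To pick out the blocked pods without reading the listing by eye, the output above can be filtered on its STATUS column. The following is only a minimal sketch: the function name `pods_in_status` is hypothetical, and it assumes the whitespace-separated `kubectl get all` layout shown above, where pod rows start with `pod/`.

```python
def pods_in_status(get_output, status="ContainerCreating"):
    """Return the names of pods whose STATUS column equals `status`.

    `get_output` is the text printed by `kubectl get all`; only rows whose
    first column starts with "pod/" are considered.
    """
    names = []
    for line in get_output.splitlines():
        fields = line.split()
        # Pod rows have the columns: NAME READY STATUS RESTARTS AGE
        if len(fields) >= 5 and fields[0].startswith("pod/") and fields[2] == status:
            names.append(fields[0][len("pod/"):])
    return names
```

Fed the listing above, this would report `web-1` and `web2-2`, the two pods still waiting on their volumes.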
If we inspect one of them:
kubectl describe pod/web-1
Name: web-1
Namespace: default
Priority: 0
Node: juju-e43386-
Start Time: Fri, 22 Nov 2019 07:36:41 +0000
Labels: app=nginx
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/web
Containers:
nginx:
Container ID:
Image: k8s.gcr.
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/
/
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
www:
Type: PersistentVolum
ClaimName: www-web-1
ReadOnly: false
default-
Type: Secret (a volume populated by a Secret)
SecretName: default-token-92wtr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12m (x2 over 12m) default-scheduler pod has unbound immediate PersistentVolum
Normal Scheduled 12m default-scheduler Successfully assigned default/web-1 to juju-e43386-
Normal SuccessfulAttac
Warning FailedMount 58s (x5 over 9m59s) kubelet, juju-e43386-
Warning FailedMount 25s (x11 over 11m) kubelet, juju-e43386-
The volume exists in OpenStack and is attached to the correct host:
openstack volume list
+------
| ID | Name | Status | Size | Attached to |
+------
| 0f8652c9-
| e8a9ebdf-
| 034f27a0-
| add6f146-
| 45a64a10-
| 2588ff79-
| e6cafb06-
| 7706a18f-
| 50f806c7-
+------
SSH'ing into juju-e43386-
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 89.1M 1 loop /snap/core/8039
loop1 7:1 0 10.7M 1 loop /snap/kubectl/1357
loop2 7:2 0 23.4M 1 loop /snap/kubelet/1340
loop3 7:3 0 9.3M 1 loop /snap/kube-
loop4 7:4 0 8.5M 1 loop /snap/canonical
vda 252:0 0 16G 0 disk
├─vda1 252:1 0 15.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 1G 0 disk /var/lib/
vdc 252:32 0 1G 0 disk
vdd 252:48 0 1G 0 disk
We can see that the disks are indeed attached, but vdc and vdd are not mounted.
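The same check (attached but not mounted) can be scripted against `lsblk --json`. This is a rough sketch rather than part of the original report: the function name `unmounted_disks` is hypothetical, and it assumes the JSON layout emitted by util-linux, where each device has either a `mountpoint` field (older releases) or a `mountpoints` list (newer releases).

```python
import json


def unmounted_disks(lsblk_json):
    """Return names of whole disks with no mountpoint on the disk or its partitions.

    `lsblk_json` is the text printed by `lsblk --json`.
    """
    def mounted(dev):
        # Newer util-linux emits a "mountpoints" list, older ones a single "mountpoint".
        points = dev.get("mountpoints") or [dev.get("mountpoint")]
        if any(points):
            return True
        # A disk such as vda counts as mounted if any of its partitions is mounted.
        return any(mounted(child) for child in dev.get("children", []))

    devices = json.loads(lsblk_json)["blockdevices"]
    return [d["name"] for d in devices if d.get("type") == "disk" and not mounted(d)]
```

On the node above, passing the output of `lsblk --json` to this function should list `vdc` and `vdd`, while `vda` (mounted through its partitions) and `vdb` are skipped.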
We can also see the following logs in syslog:
tail /var/log/syslog
Nov 22 07:58:57 juju-e43386-
Nov 22 07:58:58 juju-e43386-
Nov 22 07:58:58 juju-e43386-
Nov 22 07:58:58 juju-e43386-
Nov 22 07:59:17 juju-e43386-
Nov 22 07:59:17 juju-e43386-
Nov 22 07:59:17 juju-e43386-
Nov 22 07:59:17 juju-e43386-
Nov 22 07:59:36 juju-e43386-
Nov 22 07:59:36 juju-e43386-
K8s bundle: https://drive.google.com/open?id=1ubBBfjV6aZ5oV1huBVfhvdSJnkhg6RUU
sosreport from machine 12: https://drive.google.com/open?id=1wWesz3-svAM4YwyxOQ8SyrlM4LZDg8wj