Pods in a statefulset are blocked waiting for their PV with Cinder CSI

Bug #1853566 reported by Nicolas Pochet
This bug affects 1 person
Affects: CDK Addons
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

When deploying several statefulsets using volumeClaimTemplates with Cinder CSI, some of the pods get blocked in ContainerCreating state.
The statefulsets used are:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web2
spec:
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx2
  template:
    metadata:
      labels:
        app: nginx2
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www2
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www2
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "cdk-cinder"
      resources:
        requests:
          storage: 1Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web3
spec:
  serviceName: "nginx"
  replicas: 4
  selector:
    matchLabels:
      app: nginx3
  template:
    metadata:
      labels:
        app: nginx3
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www3
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www3
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "cdk-cinder"
      resources:
        requests:
          storage: 1Gi
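
The manifests reference a "cdk-cinder" StorageClass. For reference, a minimal sketch of such a class backed by the Cinder CSI driver (the provisioner name matches the csi.sock path seen later in the kubelet logs; the other fields are assumptions, not necessarily the exact class shipped by cdk-addons):

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cdk-cinder
provisioner: cinder.csi.openstack.org
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF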

We can see that some of the pods are in ContainerCreating state:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/web-0 1/1 Running 0 12m
pod/web-1 0/1 ContainerCreating 0 11m
pod/web2-0 1/1 Running 0 12m
pod/web2-1 1/1 Running 0 11m
pod/web2-2 0/1 ContainerCreating 0 10m
pod/web3-0 1/1 Running 0 12m
pod/web3-1 1/1 Running 0 11m
pod/web3-2 1/1 Running 0 11m
pod/web3-3 1/1 Running 0 10m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.152.183.1 <none> 443/TCP 13h

NAME READY AGE
statefulset.apps/web 1/2 12m
statefulset.apps/web2 2/3 12m
statefulset.apps/web3 4/4 12m
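
The claims themselves are bound (the PVs get provisioned and attached, as shown below); the problem only appears at mount time. This can be confirmed with, for example:

kubectl get pvc
kubectl get events --field-selector involvedObject.name=web-1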

If we inspect one of them:
kubectl describe pod/web-1
Name: web-1
Namespace: default
Priority: 0
Node: juju-e43386-kubernetes-12/192.168.0.50
Start Time: Fri, 22 Nov 2019 07:36:41 +0000
Labels: app=nginx
                controller-revision-hash=web-b46f789c4
                statefulset.kubernetes.io/pod-name=web-1
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/web
Containers:
  nginx:
    Container ID:
    Image: k8s.gcr.io/nginx-slim:0.8
    Image ID:
    Port: 80/TCP
    Host Port: 0/TCP
    State: Waiting
      Reason: ContainerCreating
    Ready: False
    Restart Count: 0
    Environment: <none>
    Mounts:
      /usr/share/nginx/html from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-92wtr (ro)
Conditions:
  Type Status
  Initialized True
  Ready False
  ContainersReady False
  PodScheduled True
Volumes:
  www:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: www-web-1
    ReadOnly: false
  default-token-92wtr:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-92wtr
    Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 12m (x2 over 12m) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal Scheduled 12m default-scheduler Successfully assigned default/web-1 to juju-e43386-kubernetes-12
  Normal SuccessfulAttachVolume 11m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-854c96f2-81fc-4b5f-96c5-1677adf8b811"
  Warning FailedMount 58s (x5 over 9m59s) kubelet, juju-e43386-kubernetes-12 Unable to attach or mount volumes: unmounted volumes=[www], unattached volumes=[www default-token-92wtr]: timed out waiting for the condition
  Warning FailedMount 25s (x11 over 11m) kubelet, juju-e43386-kubernetes-12 MountVolume.MountDevice failed for volume "pvc-854c96f2-81fc-4b5f-96c5-1677adf8b811" : rpc error: code = Internal desc = Unable to find Device path for volume
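
The Cinder volume backing the stuck PV can be identified from the PersistentVolume's CSI volume handle, e.g. for the PV named in the events above:

kubectl get pv pvc-854c96f2-81fc-4b5f-96c5-1677adf8b811 -o jsonpath='{.spec.csi.volumeHandle}{"\n"}'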

The volume exists in OpenStack and is attached to the correct host:
openstack volume list
+--------------------------------------+------------------------------------------+--------+------+----------------------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+------------------------------------------+--------+------+----------------------------------------------------+
| 0f8652c9-e94b-4a87-9a77-38218260d5ef | pvc-fec9eaa2-b6f2-45ab-99e8-6ac3c15deacb | in-use | 1 | Attached to juju-e43386-kubernetes-11 on /dev/vde |
| e8a9ebdf-085a-47e5-82ba-2f079631a23d | pvc-c8675811-3ff0-447d-9971-6165d02a1c52 | in-use | 1 | Attached to juju-e43386-kubernetes-12 on /dev/vdd |
| 034f27a0-a24e-4e9f-968e-50d2e3b484ad | pvc-935b6ab2-ee3c-466a-98c6-e6276e2e5bb9 | in-use | 1 | Attached to juju-e43386-kubernetes-13 on /dev/vdc |
| add6f146-9366-4e13-b433-37cf30e14c57 | pvc-61bbbe15-8954-4d9c-b007-7e86bc3a5de3 | in-use | 1 | Attached to juju-e43386-kubernetes-13 on /dev/vdb |
| 45a64a10-0162-4129-ad09-6e6ee23d78cb | pvc-854c96f2-81fc-4b5f-96c5-1677adf8b811 | in-use | 1 | Attached to juju-e43386-kubernetes-12 on /dev/vdc |
| 2588ff79-e78b-4c1b-b541-14f352f10ace | pvc-0f1174b8-a39b-49b6-8418-6b25c4c3a358 | in-use | 1 | Attached to juju-e43386-kubernetes-11 on /dev/vdd |
| e6cafb06-e3dc-4d70-8c56-e3ad62c43c2b | pvc-45c8db77-7192-4d68-850b-13713caa6a50 | in-use | 1 | Attached to juju-e43386-kubernetes-12 on /dev/vdb |
| 7706a18f-2c7d-4039-9988-89e013af2a6d | pvc-f3606270-a8d6-4cfd-9fbf-2fb5be566c24 | in-use | 1 | Attached to juju-e43386-kubernetes-11 on /dev/vdc |
| 50f806c7-cf12-4a1f-a118-be9165bf68ac | pvc-4566d695-b618-46fc-9b70-24846b00019d | in-use | 1 | Attached to juju-e43386-kubernetes-11 on /dev/vdb |
+--------------------------------------+------------------------------------------+--------+------+----------------------------------------------------+
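
A single volume can also be cross-checked directly, e.g. for the PV of web-1 above (volume ID taken from the listing):

openstack volume show 45a64a10-0162-4129-ad09-6e6ee23d78cb -c status -c attachments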

SSH'ing into juju-e43386-kubernetes-12, we can see:
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 89.1M 1 loop /snap/core/8039
loop1 7:1 0 10.7M 1 loop /snap/kubectl/1357
loop2 7:2 0 23.4M 1 loop /snap/kubelet/1340
loop3 7:3 0 9.3M 1 loop /snap/kube-proxy/1373
loop4 7:4 0 8.5M 1 loop /snap/canonical-livepatch/88
vda 252:0 0 16G 0 disk
├─vda1 252:1 0 15.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 1G 0 disk /var/lib/kubelet/pods/111fa2f0-102b-4dc0-9d81-0bd1b7aa8250/volumes/kubernetes.io~csi/pvc-45c8db77-7192-4d68-850b-13713caa6a50/mount
vdc 252:32 0 1G 0 disk
vdd 252:48 0 1G 0 disk

We can see that the disks are indeed attached to the node, but not all of them are mounted (here /dev/vdc and /dev/vdd have no mountpoint).
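
On the node, the missing mount for an attached device can be double-checked with standard tooling, e.g.:

sudo blkid /dev/vdc   # shows whether a filesystem has been created on the device
findmnt /dev/vdc      # no output means the device is not mounted anywhere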

We can also see the following logs in syslog:
tail /var/log/syslog
Nov 22 07:58:57 juju-e43386-kubernetes-12 systemd[1]: Reloading.
Nov 22 07:58:58 juju-e43386-kubernetes-12 systemd[1]: Reloading.
Nov 22 07:58:58 juju-e43386-kubernetes-12 kubelet.daemon[24002]: E1122 07:58:58.957611 24002 kubelet.go:1682] Unable to attach or mount volumes for pod "web-1_default(7d2ba70e-92cf-4c7b-b617-5eaabdd158bc)": unmounted volumes=[www], unattached volumes=[www default-token-92wtr]: timed out waiting for the condition; skipping pod
Nov 22 07:58:58 juju-e43386-kubernetes-12 kubelet.daemon[24002]: E1122 07:58:58.957652 24002 pod_workers.go:191] Error syncing pod 7d2ba70e-92cf-4c7b-b617-5eaabdd158bc ("web-1_default(7d2ba70e-92cf-4c7b-b617-5eaabdd158bc)"), skipping: unmounted volumes=[www], unattached volumes=[www default-token-92wtr]: timed out waiting for the condition
Nov 22 07:59:17 juju-e43386-kubernetes-12 kubelet.daemon[24002]: I1122 07:59:17.596238 24002 clientconn.go:104] parsed scheme: ""
Nov 22 07:59:17 juju-e43386-kubernetes-12 kubelet.daemon[24002]: I1122 07:59:17.596277 24002 clientconn.go:104] scheme "" not registered, fallback to default scheme
Nov 22 07:59:17 juju-e43386-kubernetes-12 kubelet.daemon[24002]: I1122 07:59:17.596365 24002 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/cinder.csi.openstack.org/csi.sock 0 <nil>}] <nil>}
Nov 22 07:59:17 juju-e43386-kubernetes-12 kubelet.daemon[24002]: I1122 07:59:17.596380 24002 clientconn.go:577] ClientConn switching balancer to "pick_first"
Nov 22 07:59:36 juju-e43386-kubernetes-12 kubelet.daemon[24002]: E1122 07:59:36.007548 24002 kubelet.go:1682] Unable to attach or mount volumes for pod "web2-2_default(7d3f089a-2a85-41bf-aa2a-22bbf7afd36b)": unmounted volumes=[www2], unattached volumes=[www2 default-token-92wtr]: timed out waiting for the condition; skipping pod
Nov 22 07:59:36 juju-e43386-kubernetes-12 kubelet.daemon[24002]: E1122 07:59:36.007609 24002 pod_workers.go:191] Error syncing pod 7d3f089a-2a85-41bf-aa2a-22bbf7afd36b ("web2-2_default(7d3f089a-2a85-41bf-aa2a-22bbf7afd36b)"), skipping: unmounted volumes=[www2], unattached volumes=[www2 default-token-92wtr]: timed out waiting for the condition
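
The node-side Cinder CSI plugin logs normally carry the corresponding NodeStageVolume errors. Assuming the plugin runs as a DaemonSet in kube-system with the usual upstream labels (an assumption about this deployment), they can be pulled with:

kubectl -n kube-system logs -l app=csi-cinder-nodeplugin -c cinder-csi-plugin --tail=100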

Revision history for this message
Nicolas Pochet (npochet) wrote :

Subscribing field-critical as it is blocking a customer deployment

summary: - Pods in a statefulset are blocked waiting for their PV
+ Pods in a statefulset are blocked waiting for their PV with Cinder CSI
Revision history for this message
Nicolas Pochet (npochet) wrote :

After more investigation, this might be related to the fact that the CSI plugin looks for disks in /dev/disk/by-id and that the disk is not present there:

ubuntu@juju-e43386-kubernetes-12:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 89.1M 1 loop /snap/core/8039
loop1 7:1 0 10.7M 1 loop /snap/kubectl/1357
loop2 7:2 0 23.4M 1 loop /snap/kubelet/1340
loop3 7:3 0 9.3M 1 loop /snap/kube-proxy/1373
loop4 7:4 0 8.5M 1 loop /snap/canonical-livepatch/88
vda 252:0 0 16G 0 disk
├─vda1 252:1 0 15.9G 0 part /
├─vda14 252:14 0 4M 0 part
└─vda15 252:15 0 106M 0 part /boot/efi
vdb 252:16 0 1G 0 disk /var/lib/kubelet/pods/ad71c7f0-4dc2-4b17-95a5-5ffe15ea22a4/volumes/kubernetes.io~csi/pvc-c015cc20-bdaf-4e13-9d92-9a3d853e0b76/mount
vdc 252:32 0 1G 0 disk
vdd 252:48 0 1G 0 disk /var/lib/kubelet/pods/7afb38c7-5bc6-463d-8cda-87fb83767695/volumes/kubernetes.io~csi/pvc-485ada5b-7c57-435e-a16e-fd8f17159111/mount
vde 252:64 0 1G 0 disk /var/lib/kubelet/pods/e3ab909f-bb65-4ff7-bdfe-97c82cb8bdb2/volumes/kubernetes.io~csi/pvc-9422b81b-adc1-427f-898c-49b85dc9491c/mount
ubuntu@juju-e43386-kubernetes-12:~$ ll /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 100 Nov 22 09:46 ./
drwxr-xr-x 7 root root 140 Nov 22 09:44 ../
lrwxrwxrwx 1 root root 9 Nov 22 09:44 virtio-472e0f6d-364d-44f1-9 -> ../../vdd
lrwxrwxrwx 1 root root 9 Nov 22 09:46 virtio-487a0339-fe7d-42ae-a -> ../../vde
lrwxrwxrwx 1 root root 9 Nov 22 09:44 virtio-dc8306b6-7b05-4812-a -> ../../vdb

We can see that /dev/vdc is present in lsblk but has no symlink in /dev/disk/by-id.
The related code is: https://github.com/kubernetes/cloud-provider-openstack/blob/38530f20eb83ac07f4b82924f5515e525ae3890a/pkg/csi/cinder/mount/mount.go#L136
This new behaviour was introduced by: https://github.com/kubernetes/cloud-provider-openstack/pull/595
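
Per the linked code, the plugin resolves the device by scanning /dev/disk/by-id for an entry whose serial matches the Cinder volume ID (the virtio serials above are truncated to the first 20 characters of the ID). A manual equivalent of that lookup on the node, using the stuck volume's ID from the earlier OpenStack listing as an illustration, would be:

VOLUME_ID=45a64a10-0162-4129-ad09-6e6ee23d78cb    # Cinder ID of the volume that fails to mount
ls -l /dev/disk/by-id/ | grep "${VOLUME_ID:0:20}" # no output reproduces "Unable to find Device path"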

Revision history for this message
Nicolas Pochet (npochet) wrote :

Running 'sudo udevadm trigger --subsystem-match=block; udevadm settle' on the worker nodes can be used to manually re-create the missing symlinks.
It can be automated across all workers with:
juju run --application kubernetes-worker 'sudo udevadm trigger --subsystem-match=block; udevadm settle; lsblk'
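
After triggering udev, the previously missing symlink should appear again; a quick check on an affected worker (matching the listing above) would be:

ls -l /dev/disk/by-id/ | grep virtio   # an entry pointing at ../../vdc should now be present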

Revision history for this message
Aurelien Lourot (aurelien-lourot) wrote :

Hi, is this still a field-critical issue?

I don't have access to the mentioned juju bundle yet (I've requested access), so for the time being I have deployed CDK on OpenStack and then used 'kubectl apply' to deploy the mentioned statefulsets, but I didn't hit the issue.

Is this happening all the time? Any specific versions/revisions?

Changed in cdk-addons:
status: New → Incomplete
Revision history for this message
Nicolas Pochet (npochet) wrote :

It was a field-critical issue, 4 months ago.
Funny that engineering is responding only now...

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

Yes, we dropped the ball on this one - sorry about that.

Here's the current status:

1. We can unsubscribe ~field-critical b/c a workaround exists and is documented in this bug.
2. We are unable to repro with the current version of cdk-addons, which includes cinder-csi-plugin:v1.17.0 instead of cinder-csi-plugin:v1.15.0, the version that shipped with CK 1.16.

I suspect this is a bug that was fixed in cinder-csi. Please subscribe ~field-high if this issue is still a high priority.
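
To confirm which cinder-csi-plugin image a given cluster is actually running (labels and namespace below are assumptions about the cdk-addons deployment):

kubectl -n kube-system get pods -l app=csi-cinder-nodeplugin -o jsonpath='{range .items[*]}{.spec.containers[*].image}{"\n"}{end}'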
