Charmed operator on K8s stuck on allocating due to PVC problems

Bug #1946574 reported by Michele Mancioppi
This bug affects 1 person
Affects          Status        Importance  Assigned to      Milestone
Canonical Juju   Fix Released  High        Yang Kelvin Liu  2.9.31

Bug Description

This is a transient error I see from time to time. On a simple setup with a Juju controller on MicroK8s (both current stable), some Juju units are stuck forever in `allocating` due to underlying issues, either images that cannot be fetched or, as in this case, PVC problems:

```
michele@boombox:~/git/grafana-operator$ microk8s.kubectl describe pod grafana-0 -n spring
Name: grafana-0
Namespace: spring
Priority: 0
Node: <none>
Labels: app.kubernetes.io/name=grafana
                controller-revision-hash=grafana-67478877c8
                statefulset.kubernetes.io/pod-name=grafana-0
Annotations: controller.juju.is/id: 143e2dc6-0631-4b68-8c72-a09fe513e46f
                juju.is/version: 2.9.15
                model.juju.is/id: dd663c74-0cbb-4efb-85db-368a45856f5c
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/grafana
Init Containers:
  charm-init:
    Image: jujusolutions/jujud-operator:2.9.15
    Port: <none>
    Host Port: <none>
    Command:
      /opt/containeragent
    Args:
      init
      --data-dir
      /var/lib/juju
      --bin-dir
      /charm/bin
    Environment Variables from:
      grafana-application-config Secret Optional: false
    Environment:
      JUJU_CONTAINER_NAMES: grafana
      JUJU_K8S_POD_NAME: grafana-0 (v1:metadata.name)
      JUJU_K8S_POD_UUID: (v1:metadata.uid)
    Mounts:
      /charm/bin from charm-data (rw,path="charm/bin")
      /charm/containers from charm-data (rw,path="charm/containers")
      /var/lib/juju from charm-data (rw,path="var/lib/juju")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpqkc (ro)
Containers:
  charm:
    Image: jujusolutions/charm-base:ubuntu-20.04
    Port: <none>
    Host Port: <none>
    Command:
      /charm/bin/containeragent
    Args:
      unit
      --data-dir
      /var/lib/juju
      --charm-modified-version
      0
      --append-env
      PATH=$PATH:/charm/bin
    Liveness: http-get http://:3856/liveness delay=30s timeout=1s period=10s #success=1 #failure=2
    Readiness: http-get http://:3856/readiness delay=30s timeout=1s period=10s #success=1 #failure=2
    Startup: http-get http://:3856/startup delay=30s timeout=1s period=10s #success=1 #failure=2
    Environment:
      JUJU_CONTAINER_NAMES: grafana
      HTTP_PROBE_PORT: 3856
    Mounts:
      /charm/bin from charm-data (ro,path="charm/bin")
      /charm/containers from charm-data (rw,path="charm/containers")
      /var/lib/juju from charm-data (rw,path="var/lib/juju")
      /var/lib/juju/storage/database/0 from grafana-database-b9273127 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpqkc (ro)
  grafana:
    Image: ubuntu/grafana:latest
    Port: <none>
    Host Port: <none>
    Command:
      /charm/bin/pebble
    Args:
      run
      --create-dirs
      --hold
      --verbose
    Environment:
      JUJU_CONTAINER_NAME: grafana
      PEBBLE_SOCKET: /charm/container/pebble.socket
    Mounts:
      /charm/bin/pebble from charm-data (ro,path="charm/bin/pebble")
      /charm/container from charm-data (rw,path="charm/containers/grafana")
      /var/lib/grafana from grafana-database-b9273127 (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qpqkc (ro)
Conditions:
  Type Status
  PodScheduled False
Volumes:
  grafana-database-b9273127:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: grafana-database-b9273127-grafana-0
    ReadOnly: false
  charm-data:
    Type: EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit: <unset>
  kube-api-access-qpqkc:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional: <nil>
    DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 108s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
  Warning FailedScheduling 38s default-scheduler 0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
```
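As a quick way to confirm what is blocking the pod, the namespace events show both of the failure modes mentioned above (image pulls and PVC binding). A minimal check, assuming the model namespace is `spring` as in the output above:

```
# List recent events in the model's namespace, newest last; look for
# FailedScheduling (PVC problems) or image pull failures / back-offs.
microk8s.kubectl get events -n spring --sort-by=.lastTimestamp
```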

There is nothing conspicuous in the Juju logs. It appears to be a purely K8s issue that Juju fails to surface as an error status.
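For what it is worth, on MicroK8s an "unbound immediate PersistentVolumeClaims" scheduling failure is often caused by the default hostpath storage class not being available. A rough diagnostic sketch (the PVC name is taken from the describe output above; the addon name varies between MicroK8s releases):

```
# Check whether the claim is stuck Pending and which storage class it asks for.
microk8s.kubectl get pvc -n spring
microk8s.kubectl describe pvc grafana-database-b9273127-grafana-0 -n spring

# Verify that a (default) storage class exists at all.
microk8s.kubectl get storageclass

# If none does, enabling the hostpath provisioner usually lets the claim bind
# ("hostpath-storage" on current MicroK8s releases, "storage" on older ones).
microk8s enable hostpath-storage
```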

Tags: k8s
Revision history for this message
Ian Booth (wallyworld) wrote :

This might be an issue with newer sidecar charms, as I am fairly sure v1 charms did surface any such underlying k8s errors in juju status.
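One way to see the gap being described, assuming the application is deployed as `grafana` in a model named `spring` (matching the namespace and pod above), is to compare the pod's scheduling events with what the unit reports on the Juju side:

```
# K8s side: the FailedScheduling events for the stuck pod.
microk8s.kubectl get events -n spring \
  --field-selector involvedObject.name=grafana-0

# Juju side: the status the unit reports for the same failure.
juju status -m spring grafana --format yaml
juju show-status-log -m spring grafana/0
```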

tags: added: k8s
Changed in juju:
milestone: none → 2.9.17
importance: Undecided → High
status: New → Triaged
Changed in juju:
milestone: 2.9.17 → 2.9.18
Changed in juju:
milestone: 2.9.18 → 2.9.19
Changed in juju:
milestone: 2.9.19 → 2.9.20
Changed in juju:
milestone: 2.9.20 → 2.9.21
Changed in juju:
milestone: 2.9.21 → 2.9.22
Changed in juju:
milestone: 2.9.22 → 2.9.23
Changed in juju:
milestone: 2.9.23 → 2.9.24
Changed in juju:
milestone: 2.9.24 → 2.9.25
Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
status: Triaged → In Progress
Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.9.30 → 2.9.31
Changed in juju:
status: Fix Committed → Fix Released