K8s controller failed to bootstrap with 2.8/edge

Bug #1883944 reported by David
This bug affects 1 person

Affects: Canonical Juju
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Error pulling the jujud-operator image when bootstrapping a controller from the 2.8/edge snap.
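
For context, the bootstrap was roughly as follows (a reconstructed sketch; the exact flags from the original session were not captured in this report):

$ sudo snap install juju --classic --channel=2.8/edge
$ juju bootstrap microk8s

The controller pod then sits in Pending with an image pull failure: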

$ microk8s.kubectl describe pods controller-0 -n controller-microk8s-localhost
Name: controller-0
Namespace: controller-microk8s-localhost
Priority: 0
Node: canonical/192.168.0.11
Start Time: Wed, 17 Jun 2020 18:34:45 +0200
Labels: controller-revision-hash=controller-687ff6c6f8
              juju-app=controller
              statefulset.kubernetes.io/pod-name=controller-0
Annotations: juju.io/controller: b68567ec-5a30-4213-87e4-e4bdfc414ba3
Status: Pending
IP: 10.1.73.30
IPs:
  IP: 10.1.73.30
Controlled By: StatefulSet/controller
Containers:
  mongodb:
    Container ID: containerd://c679f9989616788a478f326e2751df14f36c629763e577d03015c20b1a673e81
    Image: jujusolutions/juju-db:4.0
    Image ID: docker.io/jujusolutions/juju-db@sha256:5ff4514ff351575aa37842896ea330dc14ea026f0baabddc7d7988eeda72bd3d
    Port: 37017/TCP
    Host Port: 0/TCP
    Command:
      mongod
    Args:
      --dbpath=/var/lib/juju/db
      --sslPEMKeyFile=/var/lib/juju/server.pem
      --sslPEMKeyPassword=ignored
      --sslMode=requireSSL
      --port=37017
      --journal
      --replSet=juju
      --quiet
      --oplogSize=1024
      --ipv6
      --auth
      --keyFile=/var/lib/juju/shared-secret
      --storageEngine=wiredTiger
      --bind_ip_all
    State: Waiting
      Reason: CrashLoopBackOff
    Last State: Terminated
      Reason: Error
      Exit Code: 1
      Started: Wed, 17 Jun 2020 18:35:13 +0200
      Finished: Wed, 17 Jun 2020 18:35:13 +0200
    Ready: False
    Restart Count: 2
    Limits:
      memory: 1536Mi
    Requests:
      memory: 1536Mi
    Liveness: exec [mongo --port=37017 --ssl --sslAllowInvalidHostnames --sslAllowInvalidCertificates --sslPEMKeyFile=/var/lib/juju/server.pem --eval db.adminCommand('ping')] delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness: exec [mongo --port=37017 --ssl --sslAllowInvalidHostnames --sslAllowInvalidCertificates --sslPEMKeyFile=/var/lib/juju/server.pem --eval db.adminCommand('ping')] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment: <none>
    Mounts:
      /var/lib/juju from storage (rw)
      /var/lib/juju/db from storage (rw,path="db")
      /var/lib/juju/shared-secret from controller-shared-secret (ro,path="shared-secret")
      /var/lib/juju/template-server.pem from controller-server-pem (ro,path="template-server.pem")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f5snz (ro)
  api-server:
    Container ID:
    Image: jujusolutions/jujud-operator:2.8.1.3802
    Image ID:
    Port: <none>
    Host Port: <none>
    Command:
      /bin/sh
    Args:
      -c
      export JUJU_DATA_DIR=/var/lib/juju
      export JUJU_TOOLS_DIR=$JUJU_DATA_DIR/tools

      mkdir -p $JUJU_TOOLS_DIR
      cp /opt/jujud $JUJU_TOOLS_DIR/jujud

      echo Installing Dashboard...
      export gui='/var/lib/juju/gui'
      mkdir -p $gui
      curl -sSf -o $gui/gui.tar.bz2 --retry 10 --noproxy 127.0.0.1,localhost,::1 'https://streams.canonical.com/juju/gui/gui/0.1.7/juju-dashboard-0.1.7.tar.bz2' || echo Unable to retrieve Juju Dashboard
      [ -f $gui/gui.tar.bz2 ] && sha256sum $gui/gui.tar.bz2 > $gui/jujugui.sha256
      [ -f $gui/jujugui.sha256 ] && (grep 'e3215baf556a8bdc8b35eed6cdd064e01d6584f6da949eb34ec970cd1e30b030' $gui/jujugui.sha256 && printf %s '{"version":"0.1.7","url":"https://streams.canonical.com/juju/gui/gui/0.1.7/juju-dashboard-0.1.7.tar.bz2","sha256":"e3215baf556a8bdc8b35eed6cdd064e01d6584f6da949eb34ec970cd1e30b030","size":1241775}' > $gui/downloaded-gui.txt || echo Juju GUI checksum mismatch)
      test -e $JUJU_DATA_DIR/agents/controller-0/agent.conf || $JUJU_TOOLS_DIR/jujud bootstrap-state $JUJU_DATA_DIR/bootstrap-params --data-dir $JUJU_DATA_DIR --show-log --timeout 20m0s
      $JUJU_TOOLS_DIR/jujud machine --data-dir $JUJU_DATA_DIR --controller-id 0 --log-to-stderr --show-log

    State: Waiting
      Reason: ImagePullBackOff
    Ready: False
    Restart Count: 0
    Limits:
      memory: 1536Mi
    Requests:
      memory: 1536Mi
    Environment: <none>
    Mounts:
      /var/lib/juju from storage (rw)
      /var/lib/juju/agents/controller-0/template-agent.conf from controller-agent-conf (rw,path="template-agent.conf")
      /var/lib/juju/bootstrap-params from controller-bootstrap-params (ro,path="bootstrap-params")
      /var/lib/juju/shared-secret from controller-shared-secret (ro,path="shared-secret")
      /var/lib/juju/template-server.pem from controller-server-pem (ro,path="template-server.pem")
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-f5snz (ro)
Conditions:
  Type Status
  Initialized True
  Ready False
  ContainersReady False
  PodScheduled True
Volumes:
  storage:
    Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName: storage-controller-0
    ReadOnly: false
  controller-server-pem:
    Type: Secret (a volume populated by a Secret)
    SecretName: controller-secret
    Optional: false
  controller-shared-secret:
    Type: Secret (a volume populated by a Secret)
    SecretName: controller-secret
    Optional: false
  controller-agent-conf:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: controller-configmap
    Optional: false
  controller-bootstrap-params:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: controller-configmap
    Optional: false
  default-token-f5snz:
    Type: Secret (a volume populated by a Secret)
    SecretName: default-token-f5snz
    Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "controller-0": pod has unbound immediate PersistentVolumeClaims
  Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "controller-0": pod has unbound immediate PersistentVolumeClaims
  Normal Scheduled <unknown> default-scheduler Successfully assigned controller-microk8s-localhost/controller-0 to canonical
  Warning BackOff 30s (x2 over 39s) kubelet, canonical Back-off restarting failed container
  Normal Pulling 30s (x2 over 42s) kubelet, canonical Pulling image "jujusolutions/jujud-operator:2.8.1.3802"
  Warning Failed 29s (x2 over 40s) kubelet, canonical Failed to pull image "jujusolutions/jujud-operator:2.8.1.3802": rpc error: code = Unknown desc = failed to resolve image "docker.io/jujusolutions/jujud-operator:2.8.1.3802": no available registry endpoint: docker.io/jujusolutions/jujud-operator:2.8.1.3802 not found
  Warning Failed 29s (x2 over 40s) kubelet, canonical Error: ErrImagePull
  Normal Pulled 17s (x3 over 43s) kubelet, canonical Container image "jujusolutions/juju-db:4.0" already present on machine
  Normal Created 16s (x3 over 43s) kubelet, canonical Created container mongodb
  Normal BackOff 16s (x4 over 39s) kubelet, canonical Back-off pulling image "jujusolutions/jujud-operator:2.8.1.3802"
  Warning Failed 16s (x4 over 39s) kubelet, canonical Error: ImagePullBackOff
  Normal Started 16s (x3 over 42s) kubelet, canonical Started container mongodb
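
A quick way to confirm the tag is genuinely missing from the registry, rather than a local networking problem (assuming docker is available on the host; microk8s itself uses containerd):

$ docker pull jujusolutions/jujud-operator:2.8.1.3802

If the tag was never pushed, the pull fails with a "not found" error for the manifest, matching the kubelet events above.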

Revision history for this message
Pen Gale (pengale) wrote :

The problem here is probably that the 2.8.1 Docker image (docker.io/jujusolutions/jujud-operator:2.8.1.3802) hasn't been uploaded to docker.io.

I'm not sure whether this is an issue w/ the CI not having a job to upload an image for edge, or an issue w/ the image getting tagged the wrong way for the edge snap. Either way, we should probably fix it so that folks can do testing on k8s w/ the edge snap!
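
One way to see which jujud-operator tags actually reached Docker Hub is to query its public tags API (a sketch; assumes curl and jq are installed locally):

$ curl -s 'https://hub.docker.com/v2/repositories/jujusolutions/jujud-operator/tags/?page_size=100' \
    | jq -r '.results[].name' | grep '^2\.8'

If 2.8.1.3802 is absent from the list, the CI job never published it.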

Revision history for this message
Ian Booth (wallyworld) wrote :

Sometimes, if there's an infrastructure issue on our Jenkins, the OCI image fails to build. The current 2.8.1 snap (187f1c3) had a failed Jenkins job.

The job has just been re-run and the correct image is now on Docker Hub.
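
For anyone who hits this before the re-built image lands, one possible workaround is to bootstrap against a repository that does carry the image, via the caas-image-repo controller config (a sketch; the repository name here is hypothetical):

$ juju bootstrap microk8s --config caas-image-repo=myregistry.example.com/jujusolutions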

Changed in juju:
status: New → Fix Committed
Harry Pidcock (hpidcock)
Changed in juju:
status: Fix Committed → Fix Released