Charm deployment hangs with 2.8

Bug #1867783 reported by Kenneth Koski
This bug affects 3 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Harry Pidcock

Bug Description

I've got a bundle that I'm trying to deploy to MicroK8s. Some of the charms deploy normally, but eventually one of the operator pods will start getting messages like this, and no further progress is made in deploying anything:

Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning FailedScheduling 20m (x2 over 20m) default-scheduler error while running "VolumeBinding" filter plugin for pod "istio-ingressgateway-operator-0": pod has unbound immediate PersistentVolumeClaims
  Normal Scheduled 20m default-scheduler Successfully assigned istio/istio-ingressgateway-operator-0 to cat
  Warning FailedMount 12m (x12 over 20m) kubelet, cat MountVolume.SetUp failed for volume "istio-ingressgateway-operator-config" : configmap "istio-ingressgateway-operator-config" not found
  Warning FailedMount 10m (x13 over 20m) kubelet, cat MountVolume.SetUp failed for volume "istio-ingressgateway-operator-token-ckkbc" : secret "istio-ingressgateway-operator-token-ckkbc" not found
  Warning FailedMount 4m54s kubelet, cat Unable to attach or mount volumes: unmounted volumes=[istio-ingressgateway-operator-token-ckkbc istio-ingressgateway-operator-config], unattached volumes=[istio-ingressgateway-operator-token-ckkbc istio-ingressgateway-operator-config charm]: timed out waiting for the condition
  Warning FailedMount 22s (x8 over 18m) kubelet, cat Unable to attach or mount volumes: unmounted volumes=[istio-ingressgateway-operator-config istio-ingressgateway-operator-token-ckkbc], unattached volumes=[istio-ingressgateway-operator-config charm istio-ingressgateway-operator-token-ckkbc]: timed out waiting for the condition

Many of the operator pods don't even get created. This is with Juju 2.8-beta1+develop-028c45c.

Tags: k8s
Ian Booth (wallyworld) wrote :

Can you provide the bundle?

tags: added: k8s
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.8-beta1
status: New → Triaged
importance: Undecided → High
assignee: nobody → Harry Pidcock (hpidcock)
Harry Pidcock (hpidcock) wrote :

Can we get the output of `juju debug-log` when this fails, please?

Kenneth Koski (knkski) wrote :

Attached are the debug log, `juju status` output, and pod YAML. Note that deployment is stuck at the point `juju status` shows: 3 charms active and the rest marked as "waiting".

Kenneth Koski (knkski) wrote :

I tried deploying the bundle one application at a time, sleeping for 60 seconds between each. That worked better, but 2 applications still aren't coming up. In case it matters, the operator pod for one application that isn't spinning up its workload pod logs this message repeatedly (though I also see it on operator pods that do successfully spin up their workload pods):

2020-03-20 16:22:07 DEBUG update-status 2020-03-20 16:22:07 WARNING juju.juju.series supportedseries.go:692 failed to update distro info: open /usr/share/distro-info/ubuntu.csv: no such file or directory

Kenneth Koski (knkski) wrote :

Actually, the few pods that weren't deploying properly with the sleeps were due to my own errors, so the workaround of sleeping between deployments works. After fixing those errors, I'm still not able to deploy the pods normally (i.e. without the sleeps), so the general issue isn't caused by them.
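The sleep-between-deploys workaround can be sketched as a small Bash function. This is a minimal sketch, assuming the charms are passed as arguments and deployed in order; `deploy_staggered` is a hypothetical name, not part of Juju, and it only uses plain `juju deploy` with a fixed pause rather than checking that each application has actually settled.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Juju): deploy applications one at a
# time, pausing between each to give the operator pod time to come up
# before the next deploy starts.
# Usage: deploy_staggered <delay-seconds> <charm> [<charm> ...]
deploy_staggered() {
    local delay="$1"
    shift
    local charm
    for charm in "$@"; do
        # Stop at the first deploy that juju rejects.
        juju deploy "$charm" || return 1
        # Fixed pause; 60 seconds is what worked in this thread.
        sleep "$delay"
    done
}
```

For example, `deploy_staggered 60 <charm-1> <charm-2>` deploys two charms with a 60-second gap, matching the workaround described above.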

George Kraft (cynerva) wrote :

I just encountered this as well. The most interesting thing I'm noticing is an error that repeats every 2 minutes in `juju debug-log` on the controller model:

ERROR juju.worker.dependency "caas-operator-provisioner" manifold worker returned unexpected error: failed to generate operator config for "multus": password not found in configuration

Will attach logs for my failure as well, in case it's helpful.

George Kraft (cynerva) wrote :

No bundle in my case, I just did `juju deploy -m addons --channel edge cs:~containers/multus`

Harry Pidcock (hpidcock) wrote :

Thanks @cynerva for the extra info. It looks like the `ERROR juju.worker.dependency "caas-operator-provisioner" manifold worker returned unexpected error: failed to generate operator config for "multus": password not found in configuration` might be the best lead to follow at this point.

Changed in juju:
status: Triaged → In Progress
Harry Pidcock (hpidcock) wrote :

@cynerva and @knkski
Just to confirm your workflow: does it look something like this?
- Deploy charm
- Remove application/destroy model
- Deploy charm (in the same named model/namespace)
- Repeat...

Harry Pidcock (hpidcock) wrote :

I've been able to reproduce this with:
`juju deploy cs:~juju/redis-k8s-1 a`

Wait for a successful deployment, then run:

`juju remove-application a && juju deploy cs:~juju/redis-k8s-1 a || juju deploy cs:~juju/redis-k8s-1 a || juju deploy cs:~juju/redis-k8s-1 a || juju deploy cs:~juju/redis-k8s-1 a`
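The chained `||` in that one-liner is effectively a retry: remove the application, then keep trying to deploy it under the same name. A minimal sketch of the same idea as a loop, assuming the repeated deploys fail only while the old application is still being removed; `redeploy_with_retry` is a hypothetical name, not part of Juju. Unlike the one-liner, it gives up if `juju remove-application` itself fails, and it takes a delay between attempts (0 keeps the retries back-to-back).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Juju): remove an application and keep
# retrying the deploy under the same name, like the chained `||` one-liner.
# Usage: redeploy_with_retry <attempts> <delay-seconds> <app-name> <charm>
redeploy_with_retry() {
    local attempts="$1" delay="$2" app="$3" charm="$4"
    local i
    juju remove-application "$app" || return 1
    for ((i = 1; i <= attempts; i++)); do
        # Assumption: juju rejects the deploy while the old application
        # still exists, so retry until one attempt goes through.
        if juju deploy "$charm" "$app"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

`redeploy_with_retry 4 0 a cs:~juju/redis-k8s-1` makes the same four deploy attempts as the one-liner.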

Kenneth Koski (knkski) wrote :

@hpidcock: I'm able to reproduce it without destroying/recreating the model, and on the occasions when it deploys successfully the first time, I can always reproduce it by destroying/recreating the model and deploying again. And yes, I destroy and recreate the model quite often, since I'm testing deploying a bundle over and over. My basic workflow looks like this:

1. Create model
2. Deploy bundle (the kubeflow bundle is ~30 charms, the istio bundle is ~8 charms)
3. Figure out what's broken
4. Tweak the bundle/charms
5. Destroy model
6. Go to 1
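One pass of this loop can be sketched as a Bash function, with the manual "figure out what's broken / tweak" steps happening while it waits for Enter. This is a minimal sketch, assuming a bundle path or name is passed in; `cycle_model` is a hypothetical name, not part of Juju, and `-y` is assumed to be the Juju 2.x `destroy-model` flag that skips the confirmation prompt.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Juju): one pass of the
# create / deploy / inspect / destroy loop.
# Usage: cycle_model <model-name> <bundle>
cycle_model() {
    local model="$1" bundle="$2"
    juju add-model "$model" || return 1
    juju deploy -m "$model" "$bundle" || return 1
    juju status -m "$model"
    # Manual step: inspect what's broken, then press Enter to tear down.
    read -r _
    # Assumption: -y skips the confirmation prompt (Juju 2.x).
    juju destroy-model -y "$model"
}
```

Calling it repeatedly with the same model name recreates the same-named model/namespace each time, which is the pattern that reproduced the hang here.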

Harry Pidcock (hpidcock) wrote :
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Harry Pidcock (hpidcock)
Changed in juju:
status: Fix Committed → Fix Released