Juju 2.8 error about unit not being the leader

Bug #1875481 reported by Kenneth Koski
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Harry Pidcock

Bug Description

I am trying to deploy a charm on Juju 2.8, and I'm getting this error:

2020-04-27 19:24:57 ERROR juju.worker.uniter.context context.go:753 "dex-auth/1" is not the leader but is setting application k8s spec
2020-04-27 19:24:57 ERROR juju-log oidc-client:8: pod-spec-set encountered an error: `ERROR this unit is not the leader`
2020-04-27 19:24:57 ERROR juju-log oidc-client:8: Hook error:
Traceback (most recent call last):
  File "lib/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "lib/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "lib/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "lib/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-dex-auth-1/charm/reactive/dex_auth.py", line 154, in start_charm
    for crd in yaml.safe_load_all(Path("resources/crds.yaml").read_text())
  File "lib/charms/layer/caas_base.py", line 34, in pod_spec_set
    run_hook_command("pod-spec-set", spec)
  File "lib/charms/layer/caas_base.py", line 13, in run_hook_command
    run([cmd], stdout=PIPE, stderr=PIPE, check=True, input=stdin.encode('utf-8'))
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['pod-spec-set']' returned non-zero exit status 1.

In this case, `dex-auth` was not deployed any differently from the other charms in the bundle, yet it ended up as unit dex-auth/1, whereas the other charms are all charm-name/0. There's only one instance of dex-auth, so I'm not sure how this happened.

Tags: k8s
Revision history for this message
Harry Pidcock (hpidcock) wrote :

Can we please get the full log of when this happened?

Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Hi Ken,
`pod-spec-set` can only be run on the leader unit.
The error means Juju complained that `dex-auth/1` is not the leader but was trying to set the pod spec.
So the charm has to check is_leader() before running `pod-spec-set`.
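
As a minimal sketch (an editor's illustration, not the charm's actual code), assuming the charmhelpers hookenv API and the caas_base layer helper visible in the traceback above, the guard would look roughly like this:

from charmhelpers.core import hookenv
from charms import layer

def set_pod_spec(spec):
    # pod-spec-set is a leader-only hook tool; non-leader units must skip it.
    if not hookenv.is_leader():
        hookenv.log("Not the leader; skipping pod-spec-set")
        return
    layer.caas_base.pod_spec_set(spec)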

Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Sorry, replied too quickly.
Could you provide a charm that shows how to reproduce it?

Revision history for this message
Kenneth Koski (knkski) wrote :

I had this happen with two charms for my most recent attempt. The full logs from one of them:

$ kubectl logs --tail 100 -fl juju-operator=katib-controller
2020-04-28 13:28:46 DEBUG juju.worker.introspection socket.go:127 stats worker now serving
2020-04-28 13:28:46 DEBUG juju.worker.dependency engine.go:564 "api-config-watcher" manifold worker started at 2020-04-28 13:28:46.875767974 +0000 UTC
2020-04-28 13:28:46 DEBUG juju.worker.dependency engine.go:564 "migration-fortress" manifold worker started at 2020-04-28 13:28:46.876854252 +0000 UTC
2020-04-28 13:28:46 DEBUG juju.api apiclient.go:1105 successfully dialed "wss://172.31.33.208:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:46 INFO juju.api apiclient.go:637 connection established to "wss://172.31.33.208:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:46 INFO juju.worker.apicaller connect.go:158 [81f158] "application-katib-controller" successfully connected to "172.31.33.208:17070"
2020-04-28 13:28:46 DEBUG juju.api monitor.go:35 RPC connection died
2020-04-28 13:28:46 DEBUG juju.worker.dependency engine.go:584 "api-caller" manifold worker completed successfully
2020-04-28 13:28:46 DEBUG juju.worker.apicaller connect.go:128 connecting with old password
2020-04-28 13:28:47 DEBUG juju.api apiclient.go:1105 successfully dialed "wss://3.87.99.136:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:47 INFO juju.api apiclient.go:637 connection established to "wss://3.87.99.136:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:47 INFO juju.worker.apicaller connect.go:158 [81f158] "application-katib-controller" successfully connected to "3.87.99.136:17070"
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "api-caller" manifold worker started at 2020-04-28 13:28:47.105344894 +0000 UTC
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "upgrader" manifold worker started at 2020-04-28 13:28:47.115656 +0000 UTC
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "migration-minion" manifold worker started at 2020-04-28 13:28:47.115705665 +0000 UTC
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "log-sender" manifold worker started at 2020-04-28 13:28:47.115728209 +0000 UTC
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-runner" manifold worker started at 2020-04-28 13:28:47.116695916 +0000 UTC
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:584 "upgrade-steps-runner" manifold worker completed successfully
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "migration-inactive-flag" manifold worker started at 2020-04-28 13:28:47.117828492 +0000 UTC
2020-04-28 13:28:47 INFO juju.worker.caasupgrader upgrader.go:112 abort check blocked until version event received
2020-04-28 13:28:47 DEBUG juju.worker.caasupgrader upgrader.go:127 current agent binary version: 2.8-rc1
2020-04-28 13:28:47 INFO juju.worker.caasupgrader upgrader.go:118 unblocking abort check
2020-04-28 13:28:47 INFO juju.worker.migrationminion worker.go:140 migration phase is now: NONE
2020-04-28 13:28:47 DEBUG juju.worker.dependency engine.go:564 "charm...

Revision history for this message
Kenneth Koski (knkski) wrote :

If I'm actively watching the deploy, I can see the multiple units:

$ juju status
...
dex-auth/0* active idle 10.1.19.73 5556/TCP
dex-auth/1 maintenance executing 10.1.17.61 5556/TCP configuring container

Eventually the earlier units will go away, leaving only the later, broken units.

Revision history for this message
Kenneth Koski (knkski) wrote :

And here's logs from the first charm that I noticed this for:

$ kubectl logs --tail 1000 -l juju-operator=dex-auth
2020-04-28 13:28:28 INFO juju.cmd supercommand.go:91 running jujud [2.8-rc1 3584 da98e184dd907fe3263b7f098147cd99aba4c73c gc go1.14.2]
2020-04-28 13:28:28 DEBUG juju.cmd supercommand.go:92 args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=dex-auth", "--debug"}
2020-04-28 13:28:28 DEBUG juju.agent agent.go:571 read agent config, format "2.0"
2020-04-28 13:28:28 INFO juju.worker.upgradesteps worker.go:70 upgrade steps for 2.8-rc1 have already been run.
2020-04-28 13:28:28 INFO juju.cmd.jujud caasoperator.go:200 caas operator application-dex-auth start (2.8-rc1 [gc])
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "agent" manifold worker started at 2020-04-28 13:28:28.429503399 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-gate" manifold worker started at 2020-04-28 13:28:28.429686675 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "clock" manifold worker started at 2020-04-28 13:28:28.429925853 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "api-config-watcher" manifold worker started at 2020-04-28 13:28:28.430132147 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.introspection socket.go:97 introspection worker listening on "@jujud-application-dex-auth"
2020-04-28 13:28:28 DEBUG juju.worker.introspection socket.go:127 stats worker now serving
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-flag" manifold worker started at 2020-04-28 13:28:28.439923237 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.apicaller connect.go:128 connecting with old password
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "migration-fortress" manifold worker started at 2020-04-28 13:28:28.451176989 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.api apiclient.go:1105 successfully dialed "wss://172.31.33.208:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:28 INFO juju.api apiclient.go:637 connection established to "wss://172.31.33.208:17070/model/81f1584d-a77f-4f30-8a12-417f4c2fcedd/api"
2020-04-28 13:28:28 INFO juju.worker.apicaller connect.go:158 [81f158] "application-dex-auth" successfully connected to "172.31.33.208:17070"
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "api-caller" manifold worker started at 2020-04-28 13:28:28.655837161 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "migration-minion" manifold worker started at 2020-04-28 13:28:28.666100125 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "log-sender" manifold worker started at 2020-04-28 13:28:28.666290424 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "upgrader" manifold worker started at 2020-04-28 13:28:28.666330008 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-runner" manifold worker started at 2020-04-28 13:28:28.667080449 +0000 UTC
2020-04-28 13:28:28 DEBUG juju.worker.dependency engine.go:584 "upgrade-steps-runner" manifold worker completed succes...

Revision history for this message
Ian Booth (wallyworld) wrote :

As an aside, one issue Juju currently has is that if the leader unit goes away, there is no immediate election of a new leader - the leadership lease needs to time out, which can take up to a minute. So that could explain the symptom. See bug 1469731.

It comes down to: was more than one unit asked for, and why was unit 0 removed? If possible, until bug 1469731 is fixed, it's best to avoid removing the leader unit.

Revision history for this message
Kenneth Koski (knkski) wrote :

I tried deploying the latest stable bundle from the charm store instead of building the charms locally and was able to reproduce the issue.

Revision history for this message
Kenneth Koski (knkski) wrote :

I haven't intentionally used the >1 unit functionality for the Kubeflow charms, and I don't think I've accidentally enabled it either, given that I'm able to reproduce with the stable bundle from the charm store.

Revision history for this message
Kenneth Koski (knkski) wrote :

Looks like it's spinning up new units quite frequently:

dex-auth/60* error idle 10.1.17.121 5556/TCP hook failed: "oidc-client-relation-joined"

Revision history for this message
Ian Booth (wallyworld) wrote :

In a deployment, if a pod is terminated and recreated, this will show up as a new unit in Juju, since it is a different pod with a new UUID. If that is happening in error, then the root cause of that error will need to be addressed. A hook error should not result in the pod being restarted, though.
All that notwithstanding, if a deployment pod for which the Juju unit is the leader does legitimately get restarted, causing a new unit to show up in Juju, then due to bug 1469731, leadership is not immediately transferred.

Ian Booth (wallyworld)
tags: added: k8s
Revision history for this message
Ian Booth (wallyworld) wrote :

TL;DR: a quick win is to fix the charms to do an is_leader() check before making leader-only calls.

So there are a few things here.

In trying to reproduce this on microk8s, I've had it work many times and fail a few times. The charms that have failed are dex-auth and katib-controller. One thing to note about the charms is that start_charm() in dex-auth does not appear to have an is_leader() check. This check is needed in *all* charms that use leader-only API calls, so it needs to be added to any charm in the bundle that doesn't already do it.
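
For illustration, here is a hedged sketch of how such a guard might sit inside a reactive handler like start_charm(); the trigger flag, status messages, and pod spec below are placeholders rather than the actual dex-auth code. Leaving the completion flag unset when the unit is not the leader means the handler runs again on a later hook, by which time the unit may have acquired leadership (for example once the old lease expires):

from charmhelpers.core import hookenv
from charms import layer
from charms.reactive import set_flag, when, when_not

@when('oci-image.available')   # placeholder trigger flag
@when_not('charm.started')
def start_charm():
    if not hookenv.is_leader():
        # Only the leader may call pod-spec-set; don't set 'charm.started',
        # so this handler re-runs on a later hook and re-checks leadership.
        layer.status.waiting('waiting for leadership')
        return

    layer.status.maintenance('configuring container')
    spec = {'containers': []}  # build the real pod spec here
    layer.caas_base.pod_spec_set(spec)
    layer.status.active('ready')
    set_flag('charm.started')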

One way in which Juju could trigger a pod bounce is in how it writes out the deployment yaml - it does not read the existing yaml and update it, so the result is a new replicaset, which means a bounce of the pod(s). That needs fixing in Juju, but it doesn't appear to be the issue here.

When the issue has been observed, extra logging added to Juju appears to show that Juju is creating the deployment with scale 1 and correctly leaving it alone after that. Something else is causing the pod to bounce, and this triggers the cycle of new pod -> new unit -> start_charm() -> error not leader. Fixing bug 1469731 will mask the issue somewhat.

Adding to 2.8 milestone to track the work to improve how juju creates deployments.

Changed in juju:
milestone: none → 2.8-rc1
importance: Undecided → High
status: New → Triaged
Revision history for this message
Kenneth Koski (knkski) wrote :

I updated the dex-auth charm to check whether or not it's the leader before calling pod-spec-set, and that stopped the repeated bouncing of pods, but I end up with the dex-auth/1 unit, which is unable to do any work. Here are the logs:

2020-05-08 23:18:02 INFO juju.cmd supercommand.go:91 running jujud [2.9-beta1 3641 493556091fc053367946e6aada9c863982377a94 gc go1.14.2]
2020-05-08 23:18:02 DEBUG juju.cmd supercommand.go:92 args: []string{"/var/lib/juju/tools/jujud", "caasoperator", "--application-name=dex-auth", "--debug"}
2020-05-08 23:18:02 DEBUG juju.agent agent.go:571 read agent config, format "2.0"
2020-05-08 23:18:02 INFO juju.worker.upgradesteps worker.go:70 upgrade steps for 2.9-beta1 have already been run.
2020-05-08 23:18:02 INFO juju.cmd.jujud caasoperator.go:200 caas operator application-dex-auth start (2.9-beta1 [gc])
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-gate" manifold worker started at 2020-05-08 23:18:02.906930387 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "agent" manifold worker started at 2020-05-08 23:18:02.907023013 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.worker.apicaller connect.go:128 connecting with old password
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "clock" manifold worker started at 2020-05-08 23:18:02.90718082 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "upgrade-steps-flag" manifold worker started at 2020-05-08 23:18:02.908256142 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.worker.introspection socket.go:97 introspection worker listening on "@jujud-application-dex-auth"
2020-05-08 23:18:02 DEBUG juju.worker.introspection socket.go:127 stats worker now serving
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "api-config-watcher" manifold worker started at 2020-05-08 23:18:02.917148121 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:564 "migration-fortress" manifold worker started at 2020-05-08 23:18:02.918394872 +0000 UTC
2020-05-08 23:18:02 DEBUG juju.api apiclient.go:1105 successfully dialed "wss://34.229.248.19:17070/model/34904a0c-8909-4a13-85eb-6d35dd1535f8/api"
2020-05-08 23:18:02 INFO juju.api apiclient.go:637 connection established to "wss://34.229.248.19:17070/model/34904a0c-8909-4a13-85eb-6d35dd1535f8/api"
2020-05-08 23:18:02 INFO juju.worker.apicaller connect.go:158 [34904a] "application-dex-auth" successfully connected to "34.229.248.19:17070"
2020-05-08 23:18:02 DEBUG juju.api monitor.go:35 RPC connection died
2020-05-08 23:18:02 DEBUG juju.worker.dependency engine.go:584 "api-caller" manifold worker completed successfully
2020-05-08 23:18:02 DEBUG juju.worker.apicaller connect.go:128 connecting with old password
2020-05-08 23:18:03 DEBUG juju.api apiclient.go:1105 successfully dialed "wss://172.31.36.124:17070/model/34904a0c-8909-4a13-85eb-6d35dd1535f8/api"
2020-05-08 23:18:03 INFO juju.api apiclient.go:637 connection established to "wss://172.31.36.124:17070/model/34904a0c-8909-4a13-85eb-6d35dd1535f8/api"
2020-05-08 23:18:03 INFO juju.worker.apicaller connect.go:158 [34904a] "application-dex-auth" successfully connected to "172.31.36.124:17070"
2020-05-08 2...

Revision history for this message
Ian Booth (wallyworld) wrote :

Yeah, right now, sadly, unit 1 will not be able to act as the leader until the lease held by unit 0 times out. This is bug 1469731. If unit 1 does start, I'm thinking unit 0 would already have run start_charm() to get things set up?

Changed in juju:
milestone: 2.8-rc1 → none
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.8-rc2
Revision history for this message
Kenneth Koski (knkski) wrote :

Unit 0 will have run start_charm; unfortunately, that means it will have been started with incomplete configuration and will not be running with the data received over the relation. Unit 1 could update the deployment if it could read the relation data.

Ian Booth (wallyworld)
Changed in juju:
milestone: 2.8-rc2 → 2.8.1
Revision history for this message
Tim Penhey (thumper) wrote :

We think this has now been fixed, due to leadership being revoked on unit removal. I'll mark the bug Incomplete; if it is seen on 2.8 again, please reopen with the Juju version and logs.

Changed in juju:
status: Triaged → Incomplete
milestone: 2.8.1 → none
Revision history for this message
Kenneth Koski (knkski) wrote :

I'm running into this with 2.8.0. I added a check to all of the Kubeflow charms that looks like this:

if not hookenv.is_leader():
    layer.status.blocked("this unit is not a leader")
    return False

That left me with many copies of some charms:

argo-controller/0* active idle 10.1.48.187
argo-controller/1 blocked idle 10.1.90.124 this unit is not a leader
argo-controller/2 blocked idle 10.1.90.127 this unit is not a leader
argo-controller/3 blocked idle 10.1.90.128 this unit is not a leader
argo-controller/4 error idle 10.1.90.126 Started container tensorflow-serve
argo-controller/5 blocked idle 10.1.90.125 this unit is not a leader
argo-controller/6 blocked idle 10.1.90.132 this unit is not a leader
argo-controller/7 blocked idle 10.1.90.130 this unit is not a leader
argo-controller/8 blocked idle 10.1.90.133 this unit is not a leader

Other charms worked fine, though. I'm not really sure what is triggering this behavior.

Revision history for this message
Ian Booth (wallyworld) wrote :

argo-controller/0 is the leader (as indicated by the *)

The other argo-controller units are not the leader.

So from that perspective, Juju is correctly only allowing one unit to be the leader.

Is the question why there are 8 argo-controller units? Are there 8 corresponding pods?

Revision history for this message
Kenneth Koski (knkski) wrote :

It looks like there's only one pod for argo-controller, but those extra units aren't going away, which seems wrong.

Also, the dex-auth charm is getting constantly recycled:

dex-auth/1880* terminated executing 10.1.90.162 5556/TCP (stop) unit stopped by the cloud
dex-auth/1881 blocked idle 10.1.19.92 5556/TCP this unit is not a leader

Revision history for this message
Kenneth Koski (knkski) wrote :

I think this points at the underlying issue:

    application-dex-auth: 12:24:02 ERROR juju.worker.caasoperator exited "dex-auth/1": executing operation "remote init": caas-unit-init for unit "dex-auth/6" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-6 --charm-dir /tmp/unit-dex-auth-6215732929/charm --upgrade" failed: sh: /var/lib/juju/tools/jujud: not found

I exec'ed into the dex-auth pod, and that file exists. However, trying to run it similarly fails with the "not found" error from sh.

After poking around a bit, I believe this is due to Alpine Linux using musl while Ubuntu uses glibc. I ran "ldd /var/lib/juju/tools/jujud" and got this line:

    /lib64/ld-linux-x86-64.so.2 (0x7f3ba0a74000)

That file doesn't exist on Alpine Linux; instead, there's a /lib/ld-musl-x86_64.so.1 file.

I was able to install a compatibility layer with "apk add libc6-compat" and it got further, but still errored out:

    # ./jujud
    Error relocating ./jujud: __vfprintf_chk: symbol not found
    Error relocating ./jujud: __fprintf_chk: symbol not found

Those are the same error messages that ldd prints out. This is probably due to different libc implementations being in play.

Harry Pidcock (hpidcock)
Changed in juju:
status: Incomplete → In Progress
assignee: nobody → Harry Pidcock (hpidcock)
milestone: none → 2.8.1
Revision history for this message
Harry Pidcock (hpidcock) wrote :
Harry Pidcock (hpidcock)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Kenneth Koski (knkski) wrote :

It looks like the above error about jujud not being found is not the underlying cause of this issue. I tried deploying Kubeflow with Juju 2.8.1 and saw the same behavior of errors about the unit not being the leader, but without the error about jujud not being found.

It seems to be a flaky issue: I was able to deploy Kubeflow once successfully, but then wasn't able to a second time. The script broke because juju-wait saw a unit go into an error state due to the not-the-leader error.

I'm able to reproduce this in CI, see here:

https://github.com/juju-solutions/bundle-kubeflow/pull/225/checks?check_run_id=870323388

Juju 2.8.1 errors out in those jobs due to this issue.

Revision history for this message
Kenneth Koski (knkski) wrote :

I'm getting these errors from the operator pod of dex-auth (which is having these issues):

2020-07-14 22:38:17 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1863151550/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:17 ERROR juju.worker.caasoperator runner.go:430 exited "dex-auth/1": executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1863151550/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:21 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1414093216/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:21 ERROR juju.worker.caasoperator runner.go:430 exited "dex-auth/1": executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1414093216/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:25 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1422214098/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:25 ERROR juju.worker.caasoperator runner.go:430 exited "dex-auth/1": executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1422214098/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:28 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1886864084/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:28 ERROR juju.worker.caasoperator runner.go:430 exited "dex-auth/1": executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1886864084/charm --upgrade" failed: : command terminated with exit code 1
2020-07-14 22:38:32 ERROR juju.worker.uniter agent.go:31 resolver loop error: executing operation "remote init": caas-unit-init for unit "dex-auth/1" with command: "/var/lib/juju/tools/jujud caas-unit-init --unit unit-dex-auth-1 --charm-dir /tmp/unit-dex-auth-1909562150/charm --upgrade" failed: ERROR failed to remove unit tools dir /var/lib/juju/tools/unit-dex-auth-1: unlinkat /var/lib/juju/tools/unit...

Revision history for this message
Harry Pidcock (hpidcock) wrote :

What user does the container run as? I'm guessing it's not root, which is probably the issue here.

Revision history for this message
Kenneth Koski (knkski) wrote :

Yeah, the container is running as id 1001:

https://github.com/dexidp/dex/blob/master/Dockerfile#L16

This is a popular method of running containers, so it definitely seems like something we should support.

Revision history for this message
Kenneth Koski (knkski) wrote :

I've narrowed down the issue to a small, reproducible test case:

juju add-model kubeflow --config update-status-hook-interval=30s
juju deploy cs:~kubeflow-charmers/dex-auth-53
juju deploy cs:~kubeflow-charmers/oidc-gatekeeper-53
juju relate dex-auth oidc-gatekeeper
juju config oidc-gatekeeper client-secret=password
juju config dex-auth static-username=admin static-password=password
juju wait -wv
juju config dex-auth public-url=localhost
juju config oidc-gatekeeper public-url=localhost
