Juju won't create service account if role binding already exists

Bug #1845696 reported by Kenneth Koski
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Yang Kelvin Liu

Bug Description

I'm attempting to create several service accounts, and they're failing with this error message:

    creating or updating service account: role binding "argo-ui-argo-ui" already exists

Running `microk8s.kubectl get rolebindings -A` shows me that it exists:

    kubeflow argo-ui-argo-ui 61s

I believe Juju itself created this role binding as part of creating that service account, rather than erroring on a rolebinding that was created manually before deployment. So the bug seems to be that Juju creates the role binding and is then surprised to find that it exists.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.7-beta1
importance: Undecided → High
status: New → Triaged
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
Kenneth Koski (knkski) wrote :

It looks like this error message is somewhat misleading, and actually covering up another issue. I wasn't using pod spec v2 correctly, and the CRDs for each service were not getting created, which was causing the workload pods to get constantly rebooted.

However, now that I've created the CRDs manually, I'm still seeing an issue. `juju status` will show a service as active and ready, then eventually switch to the `creating or updating service account: role binding "FOO" already exists` error, and then cycle back to active and ready. The rolebindings appear to be getting constantly deleted and recreated, though I'm not sure how the timing of that relates to the error message that I'm getting. An example rolebinding:

    Name:         pipelines-api-pipeline-runner
    Labels:       juju-app=pipelines-api
                  juju-model=kubeflow
    Annotations:  <none>
    Role:
      Kind:  ClusterRole
      Name:  pipeline-runner
    Subjects:
      Kind            Name           Namespace
      ----            ----           ---------
      ServiceAccount  pipelines-api  kubeflow

Yang Kelvin Liu (kelvin.liu) wrote :

Hi Kenneth,

Could you give steps to reproduce this bug?

Kenneth Koski (knkski) wrote :

After digging into this some more with Kelvin, it looks like it was related to the use of `clusterRoleNames`, instead of defining the `rules` section in pod spec v2.

However, I just ran into this issue while trying out `juju upgrade-charm`. The command I ran:

    juju upgrade-charm --resource oci-image=gcr.io/kubeflow-images-public/pytorch-operator:v0.6.0-18-g5e36a57 --path ~/charms/builds/pytorch-operator/ pytorch-operator

That worked in that it deployed a new version, but I immediately started getting the `creating or updating service account: role binding "pytorch-operator-pytorch-operator" already exists` error that I was getting for this bug report. Not sure if it's the same error or not, but that should be able to reproduce it, at least.

Yang Kelvin Liu (kelvin.liu) wrote :

The issue is that the rolebinding resource is immutable, so Juju has to delete the existing rolebinding and then create a new one to "update" it.
https://github.com/juju/juju/pull/10706 ensures the rolebinding has already been deleted from the cluster before Juju starts creating the new one.
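
To illustrate the race described in this comment, here is a minimal Python sketch of the delete, wait, then create pattern. It is not Juju's code (Juju is written in Go) and does not reflect what the linked pull request actually contains. `FakeCluster`, `replace_role_binding`, and the `polls_until_gone` parameter are hypothetical names invented for the illustration. The fake cluster assumes that a deleted object lingers for a few reads before it disappears, which is enough to reproduce an "already exists" error when the create follows the delete immediately.

```python
import time


class AlreadyExists(Exception):
    pass


class NotFound(Exception):
    pass


class FakeCluster:
    """Hypothetical in-memory stand-in for an API server with asynchronous deletes."""

    def __init__(self, polls_until_gone=3):
        self._bindings = {}      # name -> spec
        self._terminating = {}   # name -> reads left before the object is removed
        self._polls_until_gone = polls_until_gone

    def create(self, name, spec):
        if name in self._bindings:
            raise AlreadyExists(f'role binding "{name}" already exists')
        self._bindings[name] = spec

    def get(self, name):
        # Each read advances a pending delete by one step.
        if name in self._terminating:
            self._terminating[name] -= 1
            if self._terminating[name] <= 0:
                del self._terminating[name]
                del self._bindings[name]
        if name not in self._bindings:
            raise NotFound(name)
        return self._bindings[name]

    def delete(self, name):
        if name not in self._bindings:
            raise NotFound(name)
        # The object stays visible until the simulated deletion completes.
        self._terminating[name] = self._polls_until_gone


def replace_role_binding(cluster, name, spec, timeout=5.0, interval=0.01):
    """Delete `name`, wait until it is really gone, then create it with `spec`."""
    try:
        cluster.delete(name)
    except NotFound:
        pass  # nothing to replace
    deadline = time.monotonic() + timeout
    while True:
        try:
            cluster.get(name)
        except NotFound:
            break  # deletion has completed
        if time.monotonic() >= deadline:
            raise TimeoutError(f'role binding "{name}" was not deleted in time')
        time.sleep(interval)
    cluster.create(name, spec)
```

In this sketch, calling `cluster.delete(name)` followed directly by `cluster.create(name, spec)` raises `AlreadyExists`, while `replace_role_binding(cluster, name, spec)` polls until the old object is gone and only then creates the new one. A real client would more likely watch for the deletion event than poll, and would also need to handle errors other than "not found".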

Changed in juju:
status: Triaged → In Progress
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released