Upgrading podspec to sidecar charms fails on AKS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Medium | Harry Pidcock |
## Bug Description
Trying out the upgrade path suggested by `juju refresh` for podspec charms to sidecar, the charms get stuck in the following state:
```
envoy res:oci-
katib-controller res:oci-
kubeflow-volumes res:oci-
```
unable to spin up new units. For context, `latest/edge` is the new channel. Looking at the pods, it looks like the old podspec operator pods are still up.
* Logs from the controller `api-server` container from around that time: https:/
* All logs from the same container are attached as well
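For reference, a rough sketch of how the leftover operator pods and the controller `api-server` logs can be inspected; the model namespace (`kubeflow`) and the controller namespace/pod names are assumptions and need adjusting to the actual cluster:
```
# List pods in the model namespace (assumed to be "kubeflow") and look for
# the old podspec operator pods that should have gone away
kubectl -n kubeflow get pods | grep operator

# Pull logs from the controller's api-server container; the controller
# namespace (controller-<name>) and pod name (controller-0) are assumptions
kubectl -n controller-aks get pods
kubectl -n controller-aks logs controller-0 -c api-server
```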
## Debugging tried
1. `juju scale-application` (to 0 or to 1) does nothing.
2. Cannot `juju refresh` back to the previous charm: `ERROR cannot downgrade from v2 charm format to v1`.
3. Tried completely removing the charms after they got stuck and re-deploying them (to follow a possible workaround), twice on different clusters; they all ended up in the same stuck `unknown 0/1` state, with the `operator` pods mentioned above still present.
4. Restarted the controller (by killing its pod), but this didn't unblock the charms.
That means that if this happens, there is currently no way to unblock those charms. (I still need to try deleting their deployments manually; see the sketch below.)
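A minimal sketch of that manual cleanup, assuming the leftover podspec operator workloads are StatefulSets named `<app>-operator` in a model namespace called `kubeflow` (both assumptions should be verified against the cluster first):
```
# See what is still backing the stuck applications
kubectl -n kubeflow get statefulsets,deployments,pods | grep operator

# Destructive: delete the leftover operator workloads by hand, only after the
# applications are already stuck and scaled to 0
kubectl -n kubeflow delete statefulset katib-controller-operator
kubectl -n kubeflow delete statefulset kubeflow-volumes-operator
kubectl -n kubeflow delete statefulset envoy-operator
```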
## Reproduce
1. Create AKS cluster 1.29 https:/
2. Deploy kubeflow 1.8/stable https:/
3. Try refreshing those specific charms:
```
juju scale-application katib-controller 0
juju scale-application kubeflow-volumes 0
juju scale-application envoy 0
# wait for units to disappear
juju remove-relation mlmd envoy
juju refresh katib-controller --channel latest/edge --trust
juju refresh kubeflow-volumes --channel latest/edge --trust
juju refresh envoy --channel latest/edge --trust
# wait for refresh to complete
juju scale-application katib-controller 1
juju scale-application kubeflow-volumes 1
juju scale-application envoy 1
```
## Environment
Juju 3.4.4
AKS 1.29
On MicroK8s and EKS 1.29, the upgrade path works.
Changed in juju:
* status: New → Triaged
* importance: Undecided → Medium
* assignee: nobody → Harry Pidcock (hpidcock)
So it's working on MicroK8s and EKS but not AKS? Weird.
Ultimately, whoever picks this up will need information from the cluster, not just from Juju. They'd need the `get`/`describe` YAML of the affected pods, plus `juju status --format yaml` of the model before and after the upgrade operation.
How feasible is it to redeploy rather than upgrade?
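For gathering the cluster and model info requested above, something along these lines should do; the model name/namespace (`kubeflow`) is an assumption:
```
# Affected pods: full YAML plus describe output (includes events)
kubectl -n kubeflow get pods -o yaml > pods.yaml
kubectl -n kubeflow describe pods > pods-describe.txt

# Juju model status in YAML, captured both before and after the upgrade steps
juju status -m kubeflow --format yaml > juju-status-before.yaml
# ...run the refresh/scale steps...
juju status -m kubeflow --format yaml > juju-status-after.yaml
```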