Update rollback causes all units to restart (Kubernetes)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Medium | Unassigned | |
Bug Description
When a Kubernetes charm is refreshed to a new version and, in the middle of the upgrade, is refreshed back to the old version, the units that have not yet upgraded are restarted.
This is a regression from vanilla Kubernetes StatefulSets.
Why this matters for database charms: During a rollback, it is expected that the user wants to restore service as quickly as possible. Adding a restart to the rollback process will cause a primary switchover and may cause downtime.
Steps to reproduce (juju 3.1.5, MicroK8s v1.27.5 revision 5892):
juju add-model foo1
juju deploy mysql-router-k8s --channel 8.0/beta -n 3
Set Kubernetes StatefulSet partition so that only the last unit upgrades
kubectl -n foo1 patch statefulset mysql-router-k8s -p '{"spec"
juju refresh mysql-router-k8s --channel 8.0/edge
juju refresh mysql-router-k8s --channel 8.0/beta
Set partition to 0
kubectl -n foo1 patch statefulset mysql-router-k8s -p '{"spec"
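The two `kubectl patch` commands above are truncated. The standard way to control which pods a StatefulSet updates is the `spec.updateStrategy.rollingUpdate.partition` field; a likely form of the commands (the payloads here are an assumed reconstruction, not copied from the report) is:

```shell
# Assumed reconstruction of the truncated patch commands above.
# partition=2 tells the StatefulSet controller to update only pods with
# ordinal >= 2, i.e. only the last unit of the three-unit application.
kubectl -n foo1 patch statefulset mysql-router-k8s -p \
  '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":2}}}}'

# Setting partition back to 0 allows all pods to be updated again.
kubectl -n foo1 patch statefulset mysql-router-k8s -p \
  '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}'
```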
Expected behavior:
After last refresh, only mysql-router-k8s/2 restarts
Actual behavior:
After last refresh, all units restart
Difference in logs from initial deployment to final state: https:/
Compare with difference in logs for vanilla Kubernetes StatefulSet: https:/
(Steps to reproduce vanilla k8s available in commit messages)
For reference,
full logs for juju: https:/
full logs for vanilla k8s: https:/
difference in logs for juju in airgapped environment: https:/
At the time of this bug report, mysql-router-k8s is rev66 on edge and rev64 on beta.
tags: added: canonical-data-platform-eng
Thank you for the detailed bug report with the comparison; it's super helpful, and the cause is exactly what I suspected.
Juju has an internal version number to represent charm version bumps (see charm-modified-version), which increases monotonically with each refresh of the charm for that application. To ensure the charm and the pod (and its container images) match, we require this charm-modified version to be in the pod definition. The downside, as you experienced, is that even when the charms are exactly the same and the OCI images are the same resource, we still run it as an upgrade.
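This matches how the StatefulSet controller works in general: any change to `spec.template` (even a single annotation) produces a new ControllerRevision, so every pod is rolled. A minimal sketch of that behaviour, using an illustrative annotation key (the actual field Juju writes for its charm-modified version may differ):

```shell
# Patching any field of the pod template -- here an illustrative annotation,
# standing in for a version number stored in the pod definition -- creates a
# new ControllerRevision, and the controller restarts every pod to match it.
kubectl -n foo1 patch statefulset mysql-router-k8s --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"example/modified-version":"2"}}}}}'

# Watch the resulting rolling restart of all pods.
kubectl -n foo1 rollout status statefulset mysql-router-k8s
```

Rolling back Juju's refresh bumps the charm-modified version again rather than restoring the old one, so the pod template differs from both previous revisions and all units restart.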
Right now there is no plan to change this behaviour, but my guess is we won't be able to fix this until after Juju 4.0.