Update rollback causes all units to restart (Kubernetes)

Bug #2036246 reported by Carl Csaposs
This bug affects 3 people
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

When a Kubernetes charm is refreshed to a new version and then, in the middle of the upgrade, refreshed back to the old version, the units that have not yet upgraded are restarted.

This is a regression from vanilla Kubernetes StatefulSets.

Why this matters for database charms: During a rollback, it is expected that the user wants to restore service as quickly as possible. Adding a restart to the rollback process will cause a primary switchover and may cause downtime.

Steps to reproduce (juju 3.1.5, MicroK8s v1.27.5 revision 5892):
1. juju add-model foo1
2. juju deploy mysql-router-k8s --channel 8.0/beta -n 3
3. Set the Kubernetes StatefulSet partition so that only the last unit upgrades:
   kubectl -n foo1 patch statefulset mysql-router-k8s -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
4. juju refresh mysql-router-k8s --channel 8.0/edge
5. juju refresh mysql-router-k8s --channel 8.0/beta
6. Set the partition back to 0:
   kubectl -n foo1 patch statefulset mysql-router-k8s -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
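
While retracing these steps, the partition and the StatefulSet's controller revisions can be read back at any point with plain kubectl; these are standard StatefulSet spec/status fields, nothing Juju-specific:

# Should print 2 after the first patch and 0 after the second
kubectl -n foo1 get statefulset mysql-router-k8s -o jsonpath='{.spec.updateStrategy.rollingUpdate.partition}{"\n"}'
# Compare the revision the existing pods were created from (currentRevision)
# with the revision the controller is rolling out (updateRevision)
kubectl -n foo1 get statefulset mysql-router-k8s -o jsonpath='{.status.currentRevision}{" "}{.status.updateRevision}{"\n"}'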

Expected behavior:
After last refresh, only mysql-router-k8s/2 restarts

Actual behavior:
After last refresh, all units restart
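
To see which units actually restarted (rather than inferring it from charm logs), the pod start times and the StatefulSet revision each pod is running can be listed; this is a generic check, not taken from the linked logs:

kubectl -n foo1 get pods -o custom-columns=NAME:.metadata.name,STARTED:.status.startTime,REVISION:.metadata.labels.controller-revision-hash
# With the expected behaviour, only mysql-router-k8s-2 would show a new start time;
# here all three pods do.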

Difference in logs from initial deployment to final state: https://github.com/carlcsaposs-canonical/bug-report-juju-statefulset-rollback/compare/6bb11217df893622572c096666b68d30861b3628...juju-charmhub

Compare with difference in logs for vanilla Kubernetes StatefulSet: https://github.com/carlcsaposs-canonical/bug-report-juju-statefulset-rollback/compare/a378215ad2f042a369f136afacd75483a683f6a9...kubernetes

(Steps to reproduce vanilla k8s available in commit messages)

For reference:
- Full logs for juju: https://github.com/carlcsaposs-canonical/bug-report-juju-statefulset-rollback/tree/juju-charmhub
- Full logs for vanilla k8s: https://github.com/carlcsaposs-canonical/bug-report-juju-statefulset-rollback/tree/kubernetes
- Difference in logs for juju in an airgapped environment: https://github.com/carlcsaposs-canonical/bug-report-juju-statefulset-rollback/compare/65f642d244e509f1f315e3434d1a6ec2e72bb961...juju-airgapped
- At the time of this bug report, mysql-router-k8s is rev66 on edge and rev64 on beta.

Harry Pidcock (hpidcock) wrote :

Thank you for the detailed bug report with the comparison; it's super helpful and confirms exactly what I thought was the cause.

Juju has an internal version number that represents charm version bumps (see charm-modified-version), which increases monotonically with each refresh of that application's charm. To ensure the charm and the pod (and its container images) match, we require this charm-modified-version to be in the pod definition. The downside, as you experienced, is that even though the charms are exactly the same and the OCI images are the same resource, we still run it as an upgrade.

Right now there is no plan to change this behaviour, but my guess is we won't be able to fix this until after juju 4.0.
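
One way to observe this from the outside is to snapshot the StatefulSet before and after the two refreshes and diff the result; per the comment above, the bumped charm-modified-version lands somewhere in the pod definition (.spec.template), and any change there makes the controller roll every pod even when the charm and images end up identical. A rough sketch (plain kubectl, nothing Juju-specific; the exact field holding the version is not pinned down here):

kubectl -n foo1 get statefulset mysql-router-k8s -o yaml > sts-before.yaml
# ... juju refresh to 8.0/edge, then juju refresh back to 8.0/beta ...
kubectl -n foo1 get statefulset mysql-router-k8s -o yaml > sts-after.yaml
# The interesting part of this diff is anything under spec.template:
# a changed pod template is what triggers the rolling restart
diff sts-before.yaml sts-after.yaml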

Changed in juju:
importance: Undecided → Medium
status: New → Triaged
tags: added: canonical-data-platform-eng