Blocked Status disappears after unrelated charm action

Bug #2077189 reported by Adam Dyess
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Kubernetes Control Plane Charm | Fix Released | High | Adam Dyess | |
| Kubernetes Worker Charm | Fix Released | High | Adam Dyess | |

Bug Description

The 1.29, 1.30, and 1.31 charms can block for multiple reasons during the reconciler's loop, but only one status message can bubble up to the unit.

In the following pastebin https://pastebin.canonical.com/p/McTrYdhgs5/ one can observe:

```
unit-kubernetes-control-plane-0: 15:25:49 INFO unit.kubernetes-control-plane/0.juju-log peer:27: Status context closed with: [BlockedStatus('Needs manual upgrade, run the upgrade action.'), BlockedStatus('ceph-client relation is no longer managed -- see debug log')]
unit-kubernetes-control-plane-0: 15:25:50 INFO juju.worker.uniter.operation ran "peer-relation-changed" hook (via hook dispatching script: dispatch)
...
unit-kubernetes-control-plane-0: 16:21:01 INFO unit.kubernetes-control-plane/0.juju-log Starting the upgrade of Kubernetes snaps to '1.29/stable' channel.
unit-kubernetes-control-plane-0: 16:22:39 INFO unit.kubernetes-control-plane/0.juju-log Successfully upgraded Kubernetes snaps to the '1.29/stable' channel.
unit-kubernetes-control-plane-0: 16:22:39 INFO unit.kubernetes-control-plane/0.juju-log Status context closed with: []
```

Running the upgrade action caused the ceph-client blocked message to be dropped.

The reconciler shouldn't drop blocked status messages like these.
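To make the behaviour concrete, here is a minimal toy model of a status context. It is not the charm's actual implementation: `Status`, `ToyUnit`, `status_context`, and the priority rule (blocked beats waiting beats active, first entry wins ties) are all assumptions made for illustration. It only shows how several blocked reasons can be collected while a single one reaches the unit, and how an empty list on close falls back to active.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for a unit status (not the ops classes)."""
    name: str           # "blocked", "waiting" or "active"
    message: str = ""


@dataclass
class ToyUnit:
    """Hypothetical unit that can hold exactly one status at a time."""
    status: Status = field(default_factory=lambda: Status("active"))


# Assumed priority: lower number wins when choosing the status to show.
PRIORITY = {"blocked": 0, "waiting": 1, "active": 2}


@contextmanager
def status_context(unit):
    """Collect statuses during a run and publish a single one on close."""
    collected = []
    try:
        yield collected
    finally:
        if collected:
            # min() keeps the first entry among equal priorities.
            unit.status = min(collected, key=lambda s: PRIORITY[s.name])
        else:
            # Nothing was reported during this run, so the unit goes active.
            unit.status = Status("active")


unit = ToyUnit()
with status_context(unit) as statuses:
    statuses.append(Status("blocked", "Needs manual upgrade, run the upgrade action."))
    statuses.append(Status("blocked", "ceph-client relation is no longer managed -- see debug log"))

print(len(statuses), "statuses collected; unit shows:", unit.status.message)
```

In this model the second blocked reason is already invisible on the unit after the first run, and any later run that collects nothing resets the unit to active.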

Adam Dyess (addyess)
Changed in charm-kubernetes-worker:
milestone: none → 1.31+ck1
Changed in charm-kubernetes-master:
milestone: none → 1.31+ck1
status: New → Confirmed
Changed in charm-kubernetes-worker:
status: New → Confirmed
Changed in charm-kubernetes-master:
importance: Undecided → Medium
Changed in charm-kubernetes-worker:
importance: Undecided → High
Changed in charm-kubernetes-master:
importance: Medium → High
importance: High → Medium
Changed in charm-kubernetes-worker:
importance: High → Medium
tags: added: backport-needed
Adam Dyess (addyess) wrote :

This likely only occurs when the charm unit is RECONCILED yet still has a blocked or waiting status.

If the charm weren't reconciled, an update-status event would force the charm back through the reconciler.
However, because it is reconciled, only actionable Juju events (relation-*, config-changed, etc.) force it back through the reconciler.

Action events aren't reconciled, and therefore they shouldn't really run inside a `status.context`.
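The rules described above can be sketched as a small decision function. This is only an illustration of the reasoning in this comment, not code from the reconciler library; the function name and its parameters are hypothetical.

```python
def goes_through_reconciler(event: str, is_action: bool, reconciled: bool) -> bool:
    """Return True if this event would send the charm through the reconciler.

    Hypothetical helper summarising the rules described in this comment.
    """
    if is_action:
        # Action events are never reconciled.
        return False
    if event == "update-status":
        # update-status only forces a reconcile while the unit is not yet reconciled.
        return not reconciled
    # relation-*, config-changed, etc. always reconcile.
    return True


# An already reconciled unit that is still showing a blocked status:
for event, is_action in [("update-status", False), ("config-changed", False), ("restart", True)]:
    print(event, goes_through_reconciler(event, is_action, reconciled=True))
```

For a reconciled unit, neither update-status nor an action re-evaluates the blocking conditions, so a status wiped by an action stays wiped until the next relation or config event.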

Adam Dyess (addyess) wrote :

I believe the following actions will likely wipe the blocked messages off the list:
* upgrade (kubernetes-worker and kubernetes-control-plane)
* restart (kubernetes-control-plane only)

I can see no reason these two actions should use a context. The context does NOTHING but wipe the charm's unit status and force it to ActiveStatus, no matter the result of the action. This is not the intent of the action at all.
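A self-contained sketch of the difference, under stated assumptions: `ToyUnit`, `status_context`, and both action handlers below are hypothetical stand-ins, and the context is assumed to publish whatever was collected on close, or active when nothing was collected. The sketch compares an action wrapped in such a context with the same action run without one.

```python
from contextlib import contextmanager

ACTIVE = ("active", "")
BLOCKED = ("blocked", "nvidia-plugin is no longer managed -- see debug log")


class ToyUnit:
    """Hypothetical unit holding a single (name, message) status."""

    def __init__(self, status=ACTIVE):
        self.status = status


@contextmanager
def status_context(unit):
    """Assumed behaviour: publish the first collected status, else active."""
    collected = []
    try:
        yield collected
    finally:
        unit.status = collected[0] if collected else ACTIVE


def restart_services():
    """Stand-in for the action body; it checks nothing and reports no status."""
    return "restarted"


def restart_action_in_context(unit):
    # The action collects no statuses, so closing the context forces active.
    with status_context(unit):
        return restart_services()


def restart_action_plain(unit):
    # Without the context the action leaves the unit status alone.
    return restart_services()


unit_a = ToyUnit(BLOCKED)
restart_action_in_context(unit_a)
print(unit_a.status)  # ('active', '')

unit_b = ToyUnit(BLOCKED)
restart_action_plain(unit_b)
print(unit_b.status)  # ('blocked', 'nvidia-plugin is no longer managed -- see debug log')
```

In this model, dropping the context from the action handler is enough to keep the blocked message in place until the reconciler next evaluates it.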

Adam Dyess (addyess) wrote :

This becomes really noticeable in situations where deprecation warnings block the charm and then go missing after an action is run.

The easiest reproducer is the following:

* Start with the unit active/idle.
* Configure the charm so it blocks:
  $ juju config kubernetes-control-plane enable-nvidia-plugin=true
* The charm should become blocked with the message:
  > nvidia-plugin is no longer managed -- see debug log
* Run the charm action to restart the services:
  $ juju run kubernetes-control-plane/0 restart
* Afterwards the charm shows active/idle, and the blocked message is gone.

Changed in charm-kubernetes-master:
importance: Medium → High
Changed in charm-kubernetes-worker:
importance: Medium → High
Changed in charm-kubernetes-master:
status: Confirmed → In Progress
Changed in charm-kubernetes-worker:
status: Confirmed → In Progress
Changed in charm-kubernetes-master:
milestone: 1.31+ck1 → 1.31
Changed in charm-kubernetes-worker:
milestone: 1.31+ck1 → 1.31
Adam Dyess (addyess) wrote :

A related fix in the library, not necessary for resolving this issue but found as part of this diagnosis:
https://github.com/charmed-kubernetes/charm-lib-kubernetes-snaps/pull/27

Adam Dyess (addyess)
Changed in charm-kubernetes-worker:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-kubernetes-master:
assignee: nobody → Adam Dyess (addyess)
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released