Charm code upgraded without `upgrade-charm` event if refresh while unit in error status

Bug #2068500 reported by Carl Csaposs
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Triaged
High
Alastair Flynn

Bug Description

If a unit is in error status (charm raised uncaught exception) before the user runs `juju refresh`, that unit will not get an upgrade-charm event, but its code will be upgraded.

Impact: if the user `juju refresh`es to a charm revision that contains a bug or if they accidentally selected the wrong revision, they will not be able to rollback safely

Steps to reproduce:
1. git clone https://github.com/carlcsaposs-canonical/bug-report-juju-upgrade-error-state-missing-event-vm
2. cd bug-report-juju-upgrade-error-state-missing-event-vm
3. charmcraft pack
4. On LXD controller, juju add-model foo1
5. juju deploy ./foo_ubuntu-22.04-amd64.charm -n 3
6. juju ssh foo/1
7. Edit /var/lib/juju/agents/unit-foo-1/charm/src/charm.py, uncomment `raise Exception` in `_on_update_status` function
8. (exit from unit ssh)
9. jhack ffwd
  - or wait for update-status
10. Unit 1 should be in error status
11. touch src/bar
12. charmcraft pack
13. juju refresh foo --path ./foo_ubuntu-22.04-amd64.charm
14. Wait a few minutes for foo/1 update-status to retry & exit error status
15. ls /var/lib/juju/agents/unit-foo-1/charm/src/ shows `bar` file—charm code was upgraded
16. Look at juju debug-log

versions:
juju agent: 3.4.2
```
$ snap list
Name Version Rev Tracking Publisher Notes
charmcraft 2.6.0 3451 latest/stable canonical✓ classic
core18 20240416 2823 latest/stable canonical✓ base
core20 20240416 2318 latest/stable canonical✓ base
core22 20240408 1380 latest/stable canonical✓ base
google-cloud-cli 478.0.0 243 latest/stable/… google-cloud-sdk✓ classic
juju 3.4.2 26968 3.4/stable canonical✓ -
lxd 5.0.3-d921d2e 28373 5.0/stable/… canonical✓ -
microk8s v1.29.5 6884 1.29-strict/stable canonical✓ -
snapd 2.63 21759 latest/stable canonical✓ snapd
```

Expected behavior:
After juju stops retrying the failed event on foo/1, it upgrades the charm code and emits an upgrade-charm event. Then it emits a config-changed event and then a leader-settings-changed event, as documented in https://juju.is/docs/sdk/upgrade-charm-event#heading--emission-sequence

Actual behavior:
The charm is upgraded but no upgrade-charm event is emitted. Then a leader-settings-changed event is emitted and then a config-changed event is emitted.

Juju debug-log after `juju refresh` shows:
```
machine-2: 14:26:02 INFO juju.downloader downloading from local:jammy/foo-1
unit-foo-2: 14:26:02 INFO juju.worker.uniter.charm downloading local:jammy/foo-1 from API server
machine-0: 14:26:02 INFO juju.downloader downloading from local:jammy/foo-1
unit-foo-0: 14:26:02 INFO juju.worker.uniter.charm downloading local:jammy/foo-1 from API server
machine-0: 14:26:02 INFO juju.downloader download complete ("local:jammy/foo-1")
machine-2: 14:26:02 INFO juju.downloader download complete ("local:jammy/foo-1")
machine-0: 14:26:02 INFO juju.downloader download verified ("local:jammy/foo-1")
machine-2: 14:26:02 INFO juju.downloader download verified ("local:jammy/foo-1")
unit-foo-0: 14:26:05 INFO juju.worker.uniter found queued "upgrade-charm" hook
unit-foo-2: 14:26:05 INFO juju.worker.uniter found queued "upgrade-charm" hook
unit-foo-0: 14:26:05 DEBUG unit.foo/0.juju-log ops 2.14.0 up and running.
unit-foo-0: 14:26:05 INFO unit.foo/0.juju-log Running legacy hooks/upgrade-charm.
unit-foo-2: 14:26:05 DEBUG unit.foo/2.juju-log ops 2.14.0 up and running.
unit-foo-2: 14:26:05 INFO unit.foo/2.juju-log Running legacy hooks/upgrade-charm.
unit-foo-0: 14:26:05 DEBUG unit.foo/0.juju-log ops 2.14.0 up and running.
unit-foo-0: 14:26:05 DEBUG unit.foo/0.juju-log Charm called itself via hooks/upgrade-charm.
unit-foo-2: 14:26:05 DEBUG unit.foo/2.juju-log ops 2.14.0 up and running.
unit-foo-0: 14:26:05 DEBUG unit.foo/0.juju-log Legacy hooks/upgrade-charm exited with status 0.
unit-foo-2: 14:26:05 DEBUG unit.foo/2.juju-log Charm called itself via hooks/upgrade-charm.
unit-foo-0: 14:26:05 DEBUG unit.foo/0.juju-log Emitting Juju event upgrade_charm.
unit-foo-0: 14:26:05 INFO unit.foo/0.juju-log Event: <class 'ops.charm.UpgradeCharmEvent'>
unit-foo-2: 14:26:05 DEBUG unit.foo/2.juju-log Legacy hooks/upgrade-charm exited with status 0.
unit-foo-2: 14:26:05 DEBUG unit.foo/2.juju-log Emitting Juju event upgrade_charm.
unit-foo-2: 14:26:05 INFO unit.foo/2.juju-log Event: <class 'ops.charm.UpgradeCharmEvent'>
unit-foo-0: 14:26:05 INFO juju.worker.uniter.operation ran "upgrade-charm" hook (via hook dispatching script: dispatch)
unit-foo-0: 14:26:05 INFO juju.worker.uniter found queued "config-changed" hook
unit-foo-2: 14:26:05 INFO juju.worker.uniter.operation ran "upgrade-charm" hook (via hook dispatching script: dispatch)
unit-foo-2: 14:26:05 INFO juju.worker.uniter found queued "config-changed" hook
unit-foo-0: 14:26:06 DEBUG unit.foo/0.juju-log ops 2.14.0 up and running.
unit-foo-0: 14:26:06 DEBUG unit.foo/0.juju-log Emitting Juju event config_changed.
unit-foo-0: 14:26:06 INFO unit.foo/0.juju-log Event: <class 'ops.charm.ConfigChangedEvent'>
unit-foo-2: 14:26:06 DEBUG unit.foo/2.juju-log ops 2.14.0 up and running.
unit-foo-2: 14:26:06 DEBUG unit.foo/2.juju-log Emitting Juju event config_changed.
unit-foo-2: 14:26:06 INFO unit.foo/2.juju-log Event: <class 'ops.charm.ConfigChangedEvent'>
unit-foo-0: 14:26:06 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-foo-2: 14:26:06 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-foo-0: 14:26:06 DEBUG unit.foo/0.juju-log ops 2.14.0 up and running.
unit-foo-0: 14:26:06 DEBUG unit.foo/0.juju-log Emitting Juju event leader_settings_changed.
unit-foo-0: 14:26:06 INFO unit.foo/0.juju-log Event: <class 'ops.charm.LeaderSettingsChangedEvent'>
unit-foo-0: 14:26:06 INFO juju.worker.uniter.operation ran "leader-settings-changed" hook (via hook dispatching script: dispatch)
unit-foo-1: 14:26:24 INFO juju.worker.uniter awaiting error resolution for "update-status" hook
unit-foo-1: 14:26:24 DEBUG unit.foo/1.juju-log ops 2.14.0 up and running.
unit-foo-1: 14:26:24 DEBUG unit.foo/1.juju-log Emitting Juju event update_status.
unit-foo-1: 14:26:24 INFO unit.foo/1.juju-log Event: <class 'ops.charm.UpdateStatusEvent'>
unit-foo-1: 14:26:24 ERROR unit.foo/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-foo-1/charm/./src/charm.py", line 46, in <module>
    ops.main(FooCharm) # type: ignore
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/main.py", line 560, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/main.py", line 516, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-foo-1/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-foo-1/charm/./src/charm.py", line 43, in _on_update_status
    raise Exception
Exception
unit-foo-1: 14:26:24 ERROR juju.worker.uniter.operation hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1
unit-foo-1: 14:26:24 INFO juju.worker.uniter.charm downloading local:jammy/foo-1 from API server
machine-1: 14:26:24 INFO juju.downloader downloading from local:jammy/foo-1
machine-1: 14:26:24 INFO juju.downloader download complete ("local:jammy/foo-1")
machine-1: 14:26:24 INFO juju.downloader download verified ("local:jammy/foo-1")
unit-foo-1: 14:26:27 INFO juju.worker.uniter awaiting error resolution for "update-status" hook
unit-foo-1: 14:27:48 INFO juju.worker.uniter awaiting error resolution for "update-status" hook
unit-foo-1: 14:27:48 DEBUG unit.foo/1.juju-log ops 2.14.0 up and running.
unit-foo-1: 14:27:48 DEBUG unit.foo/1.juju-log Emitting Juju event update_status.
unit-foo-1: 14:27:48 INFO unit.foo/1.juju-log Event: <class 'ops.charm.UpdateStatusEvent'>
unit-foo-1: 14:27:48 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-foo-1: 14:27:49 DEBUG unit.foo/1.juju-log ops 2.14.0 up and running.
unit-foo-1: 14:27:49 DEBUG unit.foo/1.juju-log Emitting Juju event leader_settings_changed.
unit-foo-1: 14:27:49 INFO unit.foo/1.juju-log Event: <class 'ops.charm.LeaderSettingsChangedEvent'>
unit-foo-1: 14:27:49 INFO juju.worker.uniter.operation ran "leader-settings-changed" hook (via hook dispatching script: dispatch)
unit-foo-1: 14:27:49 DEBUG unit.foo/1.juju-log ops 2.14.0 up and running.
unit-foo-1: 14:27:49 DEBUG unit.foo/1.juju-log Emitting Juju event config_changed.
unit-foo-1: 14:27:49 INFO unit.foo/1.juju-log Event: <class 'ops.charm.ConfigChangedEvent'>
unit-foo-1: 14:27:49 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
```

Note that unit 1 does not receive upgrade-charm event, and that leader-settings-changed event fired before config-changed event

Revision history for this message
Carl Csaposs (carlcsaposs) wrote :
Revision history for this message
Joseph Phillips (manadart) wrote :

Looks like we can add some debugging to worker/uniter/resolver/loop.go that will help diagnose what's going on here.

Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Alastair Flynn (aflynn50)
milestone: none → 3.4.5
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.