juju unit agent starts trying to install before the charm is downloaded
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical Juju | Triaged | Medium | Heather Lanigan | |
Bug Description
I tried doing a `juju deploy --to 0 postgresql`, and it seems that the machine agent notices the new unit and starts a unit agent before the charm has even finished downloading, i.e. before we know what content it is going to be running.
Specifically:
```
2023-10-26 15:18:13 INFO juju unit_agent.go:289 Starting unit workers for "postgresql/0"
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.logger logger.go:120 logger worker started
2023-10-26 15:18:13 ERROR juju.worker.
2023-10-26 15:18:13 INFO juju.worker.uniter uniter.go:363 unit "postgresql/0" started
2023-10-26 15:18:13 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-10-26 15:18:13 INFO juju.worker.
2023-10-26 15:18:13 INFO juju.worker.uniter uniter.go:347 unit "postgresql/0" shutting down: preparing operation "install ch:amd64/
2023-10-26 15:18:13 ERROR juju.worker.
2023-10-26 15:18:17 INFO juju.worker.uniter uniter.go:363 unit "postgresql/0" started
2023-10-26 15:18:17 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-10-26 15:18:17 INFO juju.worker.
```
I think we don't typically notice this because the machine usually takes long enough to start that the download is complete by the time the agent looks for the charm. But it does seem like we're basing 'do I have a unit that I should start' on the wrong piece of data if we try to start it before the controller actually has that information, and we are depending on the ERROR and restart of the unit agent worker to keep bouncing until the charm has finished downloading.
Note that this isn't high priority: we do eventually bounce until the controller has finished downloading, and then we progress normally.
Similarly, though, 'meter-
tags: added: canonical-data-platform-eng
Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
This also happens when trying to refresh a charm on a LXD cloud, e.g. the data platform charms. In that situation, the unit shows a failed state for several seconds before the upgrade proceeds.
Below are the logs of another `juju refresh` call on LXD, with the easyrsa charm, which also enters a failed state for several seconds before being upgraded.
```
unit-easyrsa-0: 18:09:52 ERROR juju.worker.uniter resolver loop error: preparing operation "upgrade to ch:amd64/jammy/easyrsa-48" for easyrsa/0: failed to download charm "ch:amd64/jammy/easyrsa-48" from API server: Get https://10.28.44.219:17070/model/db3cad23-9bc3-4762-8939-9e774e0451b1/charms?file=%2A&url=ch%3Aamd64%2Fjammy%2Feasyrsa-48: cannot retrieve charm: ch:amd64/jammy/easyrsa-48
unit-easyrsa-0: 18:09:52 INFO juju.worker.uniter unit "easyrsa/0" shutting down: preparing operation "upgrade to ch:amd64/jammy/easyrsa-48"
unit-easyrsa-0: 18:09:52 ERROR juju.worker.dependency "uniter" manifold worker returned unexpected error: preparing operation "upgrade to ch:amd64/jammy/easyrsa-48" for easyrsa/0: failed to download charm "ch:amd64/jammy/easyrsa-48" from API server: Get https://10.28.44.219:17070/model/db3cad23-9bc3-4762-8939-9e774e0451b1/charms?file=%2A&url=ch%3Aamd64%2Fjammy%2Feasyrsa-48: cannot retrieve charm: ch:amd64/jammy/easyrsa-48
```
It's important not to show this failed status, as it may lead users to think that something is broken in the upgrade process.