relation data in unit not updated until _relation_changed event.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Won't Fix | Undecided | Unassigned |
Bug Description
I have tested this situation with Juju 3.1.8 under MicroK8s (it also happens in Charmed Kubernetes on top of OpenStack).
When one unit updates its unit relation data, the units on the other side of the relation do not see the updated data until a `*_relation_changed` event fires.
This contrasts with the application databag, which is updated automatically: units in the other application can see its new contents immediately.
This is problematic, because a unit in error state has to be repeatedly resolved with `--no-retry` until the `*_relation_changed` event occurs.
The full situation is as follows:
Following the tutorial https:/
(the only difference is `juju deploy discourse-k8s --channel edge`).
After a while, all the units will be active:
```
ubuntu@
Model Controller Cloud/Region Version SLA Timestamp
discourse microk8s-localhost microk8s/localhost 3.1.8 unsupported 10:12:39+02:00
App Version Status Scale Charm Channel Rev Address Exposed Message
discourse-k8s 3.2.0 waiting 1 discourse-k8s edge 120 10.152.183.141 no installing agent
postgresql-k8s 14.10 active 1 postgresql-k8s 14/stable 193 10.152.183.49 no
redis-k8s 7.0.4 active 1 redis-k8s latest/edge 27 10.152.183.219 no
Unit Workload Agent Address Ports Message
discourse-k8s/0* active idle 10.1.44.239
postgresql-k8s/0* active idle 10.1.44.197
redis-k8s/0* active idle 10.1.44.221
```
After that I delete both pods for redis-k8s and discourse-k8s:
`kubectl delete pod redis-k8s-0 -n discourse & kubectl delete pod discourse-k8s-0 -n discourse`
The unit address will have changed, and `discourse-k8s/0` will go into error state in the `upgrade-charm` hook, because
the unit IP address in the relation data is stale and will not be updated.
```
ubuntu@
Model Controller Cloud/Region Version SLA Timestamp
discourse microk8s-localhost microk8s/localhost 3.1.8 unsupported 10:14:34+02:00
App Version Status Scale Charm Channel Rev Address Exposed Message
discourse-k8s 3.2.0 waiting 1 discourse-k8s edge 120 10.152.183.141 no installing agent
postgresql-k8s 14.10 active 1 postgresql-k8s 14/stable 193 10.152.183.49 no Primary
redis-k8s 7.0.4 active 1 redis-k8s latest/edge 27 10.152.183.219 no
Unit Workload Agent Address Ports Message
discourse-k8s/0* error idle 10.1.44.206 hook failed: "upgrade-charm"
postgresql-k8s/0* active idle 10.1.44.197 Primary
redis-k8s/0* active idle 10.1.44.200
```
```
ubuntu@
10.1.44.221
```
The correct value of the `hostname` field is, however, present in the relation unit data of `redis-k8s/0`:
```
ubuntu@
discourse-k8s/0:
  workload-version: |
    3.2.0
  opened-ports: []
  charm: ch:amd64/
  leader: true
  life: alive
  relation-info:
  ...
  - relation-id: 5
    endpoint: redis
    related-
    application
    related-units:
      redis-k8s/0:
        in-scope: true
        data:
          hostname: 10.1.44.200
          port: "6379"
  ...
  provider-id: discourse-k8s-0
  address: 10.1.44.206
```
Resolving the unit state with `juju resolve discourse-k8s/0 --no-retry` will not update the
unit relation data until the `redis_relation_changed` event fires, even though the charm passes
through other events such as `upgrade-charm`, `config-changed`, and `start` in the meantime.
This contrasts with setting a field in the application databag like:
```
juju exec --unit redis-k8s/0 "relation-set -r5 --app appfield=field2"
```
This change is seen immediately by the other unit, without even having to resolve the error:
```
ubuntu@
appfield: field2
```
Is this asymmetry between unit and application data intended behavior?
Shouldn't the unit relation data be updated before the `redis_relation_changed` event?
description: updated
tags: added: canonical-is
I've gone over the logic for this. It is quite convoluted, but it does account for the scenario you've described.
We cache relation settings on the agent side, and invalidate/prune the cache selectively based on hook type and arguments.
Application settings happen to be pruned indiscriminately whenever a new context is created, which forces the first fetch to go to the controller.
Because we only invalidate *unit* members at the beginning of a `relation_*` hook, the cache serves the last-fetched data in all other hook types and for `juju exec`.
I will discuss potential avenues with the team.
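As a rough illustration, the pruning rules just described can be modeled in Python. This is a hypothetical sketch under the assumptions stated in the reply above, not Juju's actual Go implementation; all class, method, and parameter names here are invented.

```python
class RelationSettingsCache:
    """Toy model of the agent-side relation-settings cache."""

    def __init__(self, controller):
        self.controller = controller  # authoritative source of truth
        self.app_settings = {}        # app databag cache, keyed by relation id
        self.unit_settings = {}       # unit databag cache, keyed by (relation id, unit)

    def new_hook_context(self, hook_name, relation_id=None, remote_unit=None):
        # Application settings are always pruned when a new context is
        # created, so the first app-databag read goes to the controller.
        self.app_settings.clear()
        # Unit settings are only invalidated at the start of a relation-*
        # hook, and only for the unit named in the event's arguments.
        if "-relation-" in hook_name and relation_id is not None and remote_unit:
            self.unit_settings.pop((relation_id, remote_unit), None)

    def read_unit(self, relation_id, unit):
        key = (relation_id, unit)
        if key not in self.unit_settings:  # miss: fetch fresh from the controller
            self.unit_settings[key] = self.controller.unit_settings(relation_id, unit)
        return self.unit_settings[key]     # hit: possibly stale

    def read_app(self, relation_id, app):
        if relation_id not in self.app_settings:
            self.app_settings[relation_id] = self.controller.app_settings(relation_id, app)
        return self.app_settings[relation_id]
```

Under this model, `read_unit` keeps returning the cached address through `upgrade-charm`, `config-changed`, and `juju exec` contexts, while `read_app` is refreshed on every new context, matching the asymmetry reported above.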