relation-ids is reporting the ids of relations that have already been removed, but other hook tools such as network-get fail when given these ids. This breaks charms that iterate over all relations to update information: the charm tries to update the non-existent relation, the hook tool call fails, and the unit goes into an error state.
$ juju run --unit=postgresql/0 relation-ids db
db:2
db:3
$ juju run --unit=postgresql/0 'network-get db --format yaml -r db:2'
ERROR relation 2 not found (not found)
$ juju run --unit=postgresql/0 'network-get db --format yaml -r db:3'
bind-addresses:
[...]
Seen with Juju 2.7.5 using the stable cs:postgresql charm. The db:2 relation had been destroyed; db:3 is a new relation that is being set up.
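Until the inconsistency is resolved, a charm can defend itself by skipping relation ids that network-get rejects instead of failing the whole hook. The following is a minimal POSIX sh sketch of that workaround, not a fix for the underlying bug. It assumes network-get exits with a non-zero status for a removed relation (consistent with the ERROR shown above), and the function names `update_db_relations` and `update_relation` are hypothetical placeholders for the charm's own code. It only works inside a hook context, where relation-ids and network-get are available.

```shell
# Hypothetical workaround sketch: update every "db" relation, skipping ids
# that relation-ids still lists but network-get rejects.

update_relation() {
    # Hypothetical placeholder for the charm's real per-relation work.
    echo "updating $1"
}

update_db_relations() {
    local rid
    for rid in $(relation-ids db); do
        # Assumption: network-get exits non-zero for a relation that is gone.
        if ! network-get db --format yaml -r "$rid" >/dev/null 2>&1; then
            # Hook stderr ends up in the unit log, so the skip stays visible.
            echo "skipping stale relation $rid" >&2
            continue
        fi
        update_relation "$rid"
    done
}
```

Skipping rather than aborting trades strictness for availability: a genuinely broken relation is now only logged, so the warning should be watched rather than ignored.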
If something is going away, I would certainly expect a possible race condition between one 'juju run' and the next. From the sound of this, though, the issue is that the inconsistency appears within a single hook execution, e.g.:
juju run --unit=postgresql/0 'echo $(relation-ids db); network-get db --format=yaml -r db:2'
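To check the single-hook hypothesis across all relations rather than one hand-picked id, the command above can be generalized into a small diagnostic. This is a hedged POSIX sh sketch: `find_stale_relations` is a hypothetical name, and it again assumes network-get exits non-zero for a removed relation. It prints every "db" relation id that relation-ids reports but network-get rejects within the same hook context, and returns non-zero if it finds any.

```shell
# Hypothetical diagnostic: within one hook context, list "db" relation ids
# that relation-ids reports but network-get cannot resolve.
find_stale_relations() {
    local rid stale=0
    for rid in $(relation-ids db); do
        # Assumption: a non-zero exit means network-get rejected the id.
        if ! network-get db --format yaml -r "$rid" >/dev/null 2>&1; then
            echo "$rid"
            stale=1
        fi
    done
    # Non-zero status signals that at least one inconsistent id was found.
    return "$stale"
}
```

Any output from this function means the two hook tools disagree inside a single execution, which is the case Juju should be able to prevent.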
If it were something like a deferred event that fires in a later hook, I don't think Juju can give guarantees there: if something is removed between hook A and hook B, I would expect errors when you try to access it in hook B.
It is plausible that this is a case of Dying vs Dead, and that 'relation-ids' includes relations that are still Dying while network-get treats them as already gone.
It is also fairly plausible that network-get isn't properly using the hook context, and so runs into problems when the relation exists at the start of the hook but is gone by the end of it. We would want to give a stable view of the world for the duration of a hook execution.