relation-ids reports relation, not found by network-get

Bug #1870013 reported by Stuart Bishop
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Achilleas Anagnostopoulos

Bug Description

relation-ids is reporting relation ids of previously removed relations. However, other tools such as network-get fail when fed these relation ids. This causes failures when the charm needs to iterate over all relations, updating information; it attempts to update information on the non-existent relation, fails, and goes into an error state.

$ juju run --unit=postgresql/0 relation-ids db
db:2
db:3

$ juju run --unit=postgresql/0 'network-get db --format yaml -r db:2'
ERROR relation 2 not found (not found)

$ juju run --unit=postgresql/0 'network-get db --format yaml -r db:3'
bind-addresses:
[...]

Seen with Juju 2.7.5 using the cs:postgresql stable charm. The db:2 relation had been destroyed. db:3 is a new relation that was being set up.
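
For illustration, a charm could guard the loop described above so that a stale id does not put the unit into an error state. This is only a minimal sketch in Python, not the cs:postgresql charm's actual code: the function names and the injectable `run` parameter are hypothetical, and the only hook tool invocations used are the two shown in this report. It assumes the failure surfaces as a non-zero exit with "not found" on stderr, as in the output above.

```python
import subprocess
from typing import Callable, Dict, List


def run_tool(args: List[str]) -> str:
    """Run a hook tool and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def network_info_by_relation(
    endpoint: str,
    run: Callable[[List[str]], str] = run_tool,  # hypothetical seam for testing
) -> Dict[str, str]:
    """Collect raw network-get YAML per relation id, skipping ids Juju no longer knows."""
    info: Dict[str, str] = {}
    for rel_id in run(["relation-ids", endpoint]).split():
        try:
            info[rel_id] = run(
                ["network-get", endpoint, "--format", "yaml", "-r", rel_id]
            )
        except subprocess.CalledProcessError as err:
            # Assumption: a stale relation id fails with "not found" on stderr.
            # Any other failure is re-raised so real errors stay visible.
            if "not found" in (err.stderr or ""):
                continue
            raise
    return info
```

Skipping only the "not found" case keeps the workaround narrow; it hides the symptom in the charm and does nothing about the inconsistency between the two tools.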

Tags: canonical-is
Stuart Bishop (stub)
tags: added: canonical-is
John A Meinel (jameinel) wrote :

If something is going away, I would certainly expect there could be a race condition between one 'juju run' and the next. From the sound of this, the issue is more that it happens within a single hook execution, e.g.:

juju run --unit=postgresql/0 'echo $(relation-ids db); network-get db --format=yaml -r db:2'

If it was something like a deferred event that fires in another hook, that is the sort of thing that I don't think Juju can give guarantees on (I would expect that if something is removed between hook A and hook B, you may get errors if you try to access it in hook B).

It is plausible that this is a case of Dying vs Dead, and 'relation-ids' is including relations that are in Dying but network-get is treating them as gone.

It is also fairly plausible that network-get isn't properly leveraging the hook context and it is then running into problems where at the start of the hook it exists, but at the end of the hook it is gone. We would want to give a stable view of the world during the execution of the hook.

Changed in juju:
milestone: none → 2.8-rc1
Stuart Bishop (stub) wrote :

The problem was first seen with a deployment of cs:postgresql, where the network-get failed. The diagnosis several minutes later with 'juju run' showed relation-ids reporting the dead relation, which is what caused the charm to run network-get on the dead relation (because it was told it still existed).

Stuart Bishop (stub) wrote :

(the cs:postgresql unit remained in an error state if that matters)

Ian Booth (wallyworld)
Changed in juju:
importance: Undecided → High
status: New → Triaged
Ian Booth (wallyworld) wrote :

Looking at the code, it does seem that when we populate the hook context, we do that with all relations, even if they are Dead. Which seems wrong to me.

Ian Booth (wallyworld) wrote :

And the network-get command also is fine with dead relations, but it does error if they are not found.

But the issue is that NetworkInfo does not use data cached on the hook context; it goes to the backend. So we need to look at preloading the hook context with the network info.
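
To make the preloading idea concrete, here is a toy model in Python. It is not Juju's Go implementation; `HookContext`, `fetch` and the method names are hypothetical. It only shows why answering from a snapshot taken at hook start keeps the relation id list and the network info consistent for the duration of one hook, even if the backend changes in the meantime.

```python
from typing import Callable, Dict, Iterable, List


class HookContext:
    """Toy hook context that snapshots network info once, when the hook starts."""

    def __init__(self, relation_ids: Iterable[int], fetch: Callable[[int], dict]):
        # Preload: one backend lookup per relation, done before the hook body runs.
        self._network_info: Dict[int, dict] = {
            rid: fetch(rid) for rid in relation_ids
        }

    def relation_ids(self) -> List[int]:
        # Derived from the same snapshot that network_info() reads.
        return sorted(self._network_info)

    def network_info(self, rid: int) -> dict:
        # Served from the snapshot, never from the backend, so it cannot
        # disagree with relation_ids() while the hook is running.
        return self._network_info[rid]
```

The trade-off is that the hook may act on data for a relation that has since been removed, but it sees a stable view of the world rather than a "not found" error halfway through.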

Tim Penhey (thumper)
Changed in juju:
milestone: 2.8-rc1 → 2.8.1
Changed in juju:
assignee: nobody → Achilleas Anagnostopoulos (achilleasa)
Achilleas Anagnostopoulos (achilleasa) wrote :

PR https://github.com/juju/juju/pull/11790 includes a stopgap fix for 2.8 that filters the 'relation-ids' output to remove dead relations.

Note that this fix does not address the underlying issue (correctly populating the hook context for network-get) but rather ensures that workflows that iterate the output of relation-ids and then query juju for additional relation information (the scenario described in the bug report) will not observe dead relations.
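
The filtering described above can be sketched roughly as follows. This is a Python illustration, not the code from the PR: the `Relation` and `Life` types and the `relation_ids` function are hypothetical stand-ins, and keeping Dying relations in the output is an assumption based on the comment saying only dead relations are removed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Life(Enum):
    ALIVE = "alive"
    DYING = "dying"
    DEAD = "dead"


@dataclass(frozen=True)
class Relation:
    id: int
    endpoint: str
    life: Life


def relation_ids(relations: Iterable[Relation], endpoint: str) -> List[str]:
    """Return 'endpoint:id' strings for the endpoint, omitting Dead relations."""
    return [
        f"{rel.endpoint}:{rel.id}"
        for rel in sorted(relations, key=lambda rel: rel.id)
        # Assumption: only Dead relations are hidden; Dying ones are still listed.
        if rel.endpoint == endpoint and rel.life is not Life.DEAD
    ]
```

With the relations from this report, a Dead db:2 and an Alive db:3, such a filter would list only db:3, so a charm iterating the result would never pass db:2 to network-get.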

We should probably open another bug specifically for fixing the hook context issue in network-get.

Changed in juju:
status: Triaged → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released