JUJU_REMOTE_APP environment variable is not set in relation_broken hook

Bug #1960934 reported by Michele Mancioppi
This bug affects 2 people
Affects: Canonical Juju
Status: Incomplete
Importance: High
Assigned to: Yang Kelvin Liu

Bug Description

Juju apparently does not set the `JUJU_REMOTE_APP` environment variable in relation-broken hooks, which causes hooks to fail with errors like the following:

```
unit-prometheus-k8s-0: 17:27:39 ERROR unit.prometheus-k8s/0.juju-log ingress:3: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1521, in _run
    result = run(args, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-prometheus-k8s-0/relation-get', '-r', '3', '-', '', '--app', '--format=json')' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/charm.py", line 377, in <module>
    main(PrometheusCharm)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/main.py", line 414, in main
    charm = charm_class(framework)
  File "./src/charm.py", line 68, in __init__
    external_url = urlparse(self._external_url)
  File "./src/charm.py", line 364, in _external_url
    if ingress_url := self.ingress.url:
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/traefik_k8s/v0/ingress_per_unit/ingress_per_unit.py", line 246, in url
    if not self.urls:
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/lib/charms/traefik_k8s/v0/ingress_per_unit/ingress_per_unit.py", line 234, in urls
    if not self.is_ready():
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 280, in is_ready
    return any(self.is_ready(relation) for relation in self.relations)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 280, in <genexpr>
    return any(self.is_ready(relation) for relation in self.relations)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 282, in is_ready
    data = self.unwrap(relation)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 357, in unwrap
    version = self._get_version(relation)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 207, in _get_version
    remote_versions_raw = relation.data[relation.app].get(VERSION_KEY)
  File "/usr/lib/python3.8/_collections_abc.py", line 660, in get
    return self[key]
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 430, in __getitem__
    return self._data[key]
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 414, in _data
    data = self._lazy_data = self._load()
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 779, in _load
    return self._backend.relation_get(self.relation.id, self._entity.name, self._is_app)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1588, in relation_get
    return self._run(*args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/ops/model.py", line 1523, in _run
    raise ModelError(e.stderr)
ops.model.ModelError: b'ERROR "" is not a valid unit or application\n'
unit-prometheus-k8s-0: 17:27:39 ERROR juju.worker.uniter.operation hook "ingress-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1
```

This is what I get when I log `os.environ`:

```
unit-prometheus-k8s-0: 17:27:39 DEBUG unit.prometheus-k8s/0.juju-log ingress:3: environ({'JUJU_UNIT_NAME': 'prometheus-k8s/0', 'KUBERNETES_PORT': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT': '443', 'JUJU_VERSION': '2.9.22', 'JUJU_CHARM_HTTP_PROXY': '', 'APT_LISTCHANGES_FRONTEND': 'none', 'JUJU_CONTEXT_ID': 'prometheus-k8s/0-ingress-relation-broken-1960487765391186668', 'JUJU_AGENT_SOCKET_NETWORK': 'unix', 'JUJU_API_ADDRESSES': '10.152.183.250:17070 controller-service.controller-development.svc.cluster.local:17070', 'JUJU_CHARM_HTTPS_PROXY': '', 'JUJU_AGENT_SOCKET_ADDRESS': '@/var/lib/juju/agents/unit-prometheus-k8s-0/agent.socket', 'JUJU_MODEL_NAME': 'cos', 'JUJU_DISPATCH_PATH': 'hooks/ingress-relation-broken', 'JUJU_AVAILABILITY_ZONE': '', 'JUJU_REMOTE_UNIT': '', 'JUJU_CHARM_DIR': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'TERM': 'tmux-256color', 'KUBERNETES_PORT_443_TCP_ADDR': '10.152.183.1', 'JUJU_RELATION': 'ingress', 'PATH': '/var/lib/juju/tools/unit-prometheus-k8s-0:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/charm/bin', 'JUJU_RELATION_ID': 'ingress:3', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'JUJU_METER_STATUS': 'AMBER', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'JUJU_HOOK_NAME': 'ingress-relation-broken', 'LANG': 'C.UTF-8', 'CLOUD_API_VERSION': '1.23.0', 'DEBIAN_FRONTEND': 'noninteractive', 'JUJU_SLA': 'unsupported', 'KUBERNETES_PORT_443_TCP': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'JUJU_MODEL_UUID': '36c17c1c-090f-4ee9-8c45-d1fb5e18d3f6', 'KUBERNETES_SERVICE_HOST': '10.152.183.1', 'JUJU_MACHINE_ID': '', 'JUJU_CHARM_FTP_PROXY': '', 'JUJU_METER_INFO': 'not set', 'PWD': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'JUJU_PRINCIPAL_UNIT': '', 'JUJU_CHARM_NO_PROXY': '127.0.0.1,localhost,::1', 'PYTHONPATH': 'lib:venv', 'CHARM_DIR': '/var/lib/juju/agents/unit-prometheus-k8s-0/charm', 'JUJU_REMOTE_APP': '', 'OPERATOR_DISPATCH': '1'})
```

It seems to me that the problem is that `JUJU_REMOTE_APP` is not set when it should be.
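
To make the failure mode concrete, here is a minimal sketch (a hypothetical charm with an `ingress` relation endpoint, not the actual prometheus-k8s code) of the access pattern that fails: reading the remote application's data bag inside a relation-broken handler.

```
#!/usr/bin/env python3
# Hypothetical minimal charm illustrating the failing access pattern; this is
# not the prometheus-k8s code from the traceback above.
import logging

from ops.charm import CharmBase
from ops.main import main

logger = logging.getLogger(__name__)


class DemoCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        self.framework.observe(
            self.on.ingress_relation_broken, self._on_ingress_relation_broken
        )

    def _on_ingress_relation_broken(self, event):
        # On k8s, event.relation.app has an empty name at this point, so this
        # lookup shells out to `relation-get ... --app` with an empty
        # application name and raises ops.model.ModelError:
        #   'ERROR "" is not a valid unit or application'
        remote_data = event.relation.data[event.relation.app]
        logger.info("remote app data: %s", dict(remote_data))


if __name__ == "__main__":
    main(DemoCharm)
```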

Revision history for this message
John A Meinel (jameinel) wrote :

By the time you get to 'relation-broken' the remote application has been removed from the relation (we are informing you that it has gone away).

If you look at 'relation-list' for that relation, the unit and the associated app have been removed.

source-relation-changed:

```
.../charm# relation-ids source
source:1
.../charm# relation-list -r source:1
dummy-source/0
```

vs

source-relation-broken:

```
.../charm# relation-ids source
source:1
.../charm# relation-list -r source:1
```

(no output)

It is true that 'relation-ids' still lists the relation (as relation-broken is the last step before we remove it).
However, it is a little odd for the charm to decide to check "is the remote app telling me that this relation is happy" when we're telling you that the relation is going away.

I don't know exactly why ingress_per_unit is blindly iterating over all relations rather than handling the case where a relation exists but the remote application is going away. Certainly relation-broken has always operated this way.
I'm guessing that at this point

```
  File "/var/lib/juju/agents/unit-prometheus-k8s-0/charm/venv/sborl/relation.py", line 207, in _get_version
    remote_versions_raw = relation.data[relation.app].get(VERSION_KEY)
```

relation.app is None (or maybe ""?).
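
A defensive guard along those lines could look like this (a sketch only; the function names and the `version_key` parameter are illustrative, not the actual sborl code):

```
# Sketch of a library-side guard (illustrative, not the actual sborl code):
# skip relations whose remote application is already gone before reading
# their application data bag.
def _usable_relations(relations):
    for relation in relations:
        # During relation-broken the remote app may be None or have an
        # empty name; relation-get would fail for such a relation.
        if relation.app is None or not relation.app.name:
            continue
        yield relation


def is_ready(relations, version_key):
    # Readiness in the spirit of sborl's is_ready(): a relation only counts
    # if the remote app has published a version under `version_key`.
    return any(
        relation.data[relation.app].get(version_key)
        for relation in _usable_relations(relations)
    )
```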

Changed in juju:
status: New → Incomplete
Revision history for this message
Michele Mancioppi (michele-mancioppi) wrote :

I am somewhat ambivalent on whether it is semantically consistent to have the remote app there or not when executing the hook. It does, however, seem to me a violation of the principle of least surprise: I can discover the relation by iterating over the model, but if I access some of its fields, things blow up.

Revision history for this message
Cory Johns (johnsca) wrote :

relation.app is an Application instance during the broken hook, but it has a blank name. I believe the application-level relation data is also still available, but because of the way the framework tries to load the data, it ends up passing that empty name into relation-get, leading to the error. I don't think the charm / interface code actually needs that data, since it's really just checking whether the relation is valid or not. Perhaps this should be handled more gracefully in the framework; we can certainly work around it in the library, but I've seen this edge case come up before in other charm code, so it's probably worth handling it in the framework and just returning empty relation data to avoid the issue.
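
For illustration, a framework-side fallback along those lines could look roughly like this (hypothetical class and method names, not the actual ops implementation):

```
# Hypothetical sketch of the fallback described above (not the actual ops
# code): if the entity name is blank, as happens for the remote app during
# relation-broken, return empty data instead of calling relation-get with an
# invalid name.
class LazyRelationData:
    def __init__(self, backend, relation_id, entity_name, is_app):
        self._backend = backend
        self._relation_id = relation_id
        self._entity_name = entity_name
        self._is_app = is_app

    def load(self):
        if not self._entity_name:
            # Avoids 'ERROR "" is not a valid unit or application'.
            return {}
        return self._backend.relation_get(
            self._relation_id, self._entity_name, self._is_app
        )
```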

Revision history for this message
Michele Mancioppi (michele-mancioppi) wrote :

I’ll run it by Pen, let’s see what her take is

Revision history for this message
Pen Gale (pengale) wrote :

I agree that the Operator Framework should not fall over in this case.

I went ahead and filed https://github.com/canonical/operator/issues/693

Revision history for this message
Claudiu Belu (cbelu) wrote :

I want to mention that I've encountered this type of error before [1], and that it still happens for the nginx-ingress-integrator charm when a related application is removed (the charm iterates over its relations in order to establish which Kubernetes Services and Ingress Resources need to be created / removed).

Will mention this in https://github.com/canonical/operator/issues/693 as well.

[1] https://github.com/finos/legend-juju-gitlab-integrator/pull/6

Revision history for this message
Robert Carlsen (rwcarlsen) wrote :

I'm working on some better handling of this from the operator framework side. But I think it would still be helpful to have JUJU_REMOTE_APP set in this case, to facilitate cleanup for a charm, e.g. purging local caches it may have involving the remote app.
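
To illustrate that use case, here is a sketch of such a cleanup handler (a hypothetical hook script; the cache path and names are made up):

```
#!/usr/bin/env python3
# Hypothetical relation-broken hook: purge a local cache keyed by the remote
# app name. The cache path and names are made up for illustration.
import os
import shutil

CACHE_DIR = "/var/lib/demo-charm/remote-app-cache"


def cleanup_remote_app_cache():
    remote_app = os.environ.get("JUJU_REMOTE_APP", "")
    if not remote_app:
        # The situation in this bug: with an empty name there is nothing to
        # key the cleanup on.
        return
    path = os.path.join(CACHE_DIR, remote_app)
    if os.path.isdir(path):
        shutil.rmtree(path)


if __name__ == "__main__":
    cleanup_remote_app_cache()
```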

Revision history for this message
John A Meinel (jameinel) wrote :

Looking at the Juju side of the code, it is trying to set it:

github.com/juju/juju/worker/uniter/relation/resolver.go:235

```
return hook.Info{
        Kind:              hooks.RelationBroken,
        RelationId:        relationId,
        RemoteApplication: r.stateTracker.RemoteApplication(relationId),
}, nil
```

It would seem that by that time the remote application is already gone from the tracking state, so we don't have a value to give.
Earlier in that file we do have a list of locally known app names in the relation; it might be possible for us to look there. I'm not certain, since we have been doing things like 'relation-departed' and cleaning things up from there, and I'm not sure whether localState.ApplicationMembers only holds a value if the application has set a value in the app data bag. But it might be a potential solution.

Revision history for this message
John A Meinel (jameinel) wrote :

The code is clearly trying to set JUJU_REMOTE_APP but fails to find a value when running relation-broken. We need to determine whether we could update the context that gets set up for the Uniter, or whether that data is already gone by then, and if so whether we could pull that information out of the last known state for the unit agent.

Changed in juju:
importance: Undecided → High
milestone: none → 2.9-next
status: Incomplete → Triaged
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → none
Revision history for this message
John A Meinel (jameinel) wrote :

I did some debugging here, and at least for machine charms, every time I hit relation-broken it had JUJU_REMOTE_APP set. IIRC the unit agent that runs on containers runs a different code path, so it still might be a bug there.

Revision history for this message
Tony Meyer (tony-meyer) wrote :

The behaviour is different in k8s and machine on Juju 3.16 at least.

* Added app A and app B, integrated them.
* Scaled app A to 2 units.
* Removed unit from app A

With (micro)k8s, JUJU_REMOTE_APP was the empty string in the relation-broken hook triggered on A/{departing-unit}. With machine (localhost lxd), JUJU_REMOTE_APP was the name of B.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 3.3.2
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Hey Tony, which charm did you use?
Both 3.1.8 and 3.3.2 work for me.
I deployed discourse:

```

# Add the model
juju add-model t1

# Deploy the charms
juju deploy redis-k8s
juju deploy postgresql-k8s --channel 14/stable --trust
juju deploy discourse-k8s

# Enable required PostgreSQL extensions
juju config postgresql-k8s plugin_hstore_enable=True
juju config postgresql-k8s plugin_pg_trgm_enable=True

# Relate redis-k8s and postgresql-k8s to discourse-k8s
juju relate redis-k8s discourse-k8s
juju relate discourse-k8s postgresql-k8s

```

unit-discourse-k8s-1: 11:36:48 INFO unit.discourse-k8s/1.juju-log database:4: Database relation broken os.environ => environ({'JUJU_UNIT_NAME': 'discourse-k8s/1', 'KUBERNETES_PORT': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT': '443', 'JUJU_VERSION': '3.1.8', 'JUJU_CHARM_HTTP_PROXY': '', 'APT_LISTCHANGES_FRONTEND': 'none', 'JUJU_CONTEXT_ID': 'discourse-k8s/1-database-relation-broken-2597332874284841740', 'JUJU_AGENT_SOCKET_NETWORK': 'unix', 'JUJU_API_ADDRESSES': '10.152.183.167:17070 controller-service.controller-k1.svc.cluster.local:17070', 'JUJU_CHARM_HTTPS_PROXY': '', 'JUJU_AGENT_SOCKET_ADDRESS': '@/var/lib/juju/agents/unit-discourse-k8s-1/agent.socket', 'JUJU_MODEL_NAME': 't2', 'JUJU_DISPATCH_PATH': 'hooks/database-relation-broken', 'JUJU_AVAILABILITY_ZONE': '', 'JUJU_REMOTE_UNIT': '', 'JUJU_CHARM_DIR': '/var/lib/juju/agents/unit-discourse-k8s-1/charm', 'TERM': 'tmux-256color', 'KUBERNETES_PORT_443_TCP_ADDR': '10.152.183.1', 'JUJU_RELATION': 'database', 'PATH': '/var/lib/juju/tools/unit-discourse-k8s-1:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/charm/bin', 'JUJU_RELATION_ID': 'database:4', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'JUJU_METER_STATUS': 'AMBER', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'JUJU_HOOK_NAME': 'database-relation-broken', 'LANG': 'C.UTF-8', 'CLOUD_API_VERSION': '1.28.0', 'DEBIAN_FRONTEND': 'noninteractive', 'JUJU_SLA': 'unsupported', 'KUBERNETES_PORT_443_TCP': 'tcp://10.152.183.1:443', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'JUJU_MODEL_UUID': '33d01c6b-3c19-444d-82c6-a2950a0ca9c2', 'KUBERNETES_SERVICE_HOST': '10.152.183.1', 'JUJU_MACHINE_ID': '', 'JUJU_CHARM_FTP_PROXY': '', 'JUJU_METER_INFO': 'not set', 'PWD': '/var/lib/juju/agents/unit-discourse-k8s-1/charm', 'JUJU_PRINCIPAL_UNIT': '', 'JUJU_CHARM_NO_PROXY': '127.0.0.1,localhost,::1', 'PYTHONPATH': 'lib:venv', 'CHARM_DIR': '/var/lib/juju/agents/unit-discourse-k8s-1/charm', 'JUJU_REMOTE_APP': 'postgresql-k8s', 'OPERATOR_DISPATCH': '1'})

Revision history for this message
Tony Meyer (tony-meyer) wrote :

Argh, I could reliably reproduce this yesterday and now I can't, and I don't understand why. I was just using two dummy charms that don't do anything other than log the relation events. But doing the exact same steps with the same code, I get `JUJU_REMOTE_APP` non-empty today.

The only thing that happened is that my Juju broke with:

```
tameyer@tam-canoncial-1:~/scratch/juju-remote-app-blank$ juju status --relations
ERROR cannot connect to k8s api server; try running 'juju update-k8s --client <k8s cloud name>': starting proxy for api connection: connecting k8s proxy: Get "https://127.0.0.1:16443/api/v1/namespaces/controller-microk8s-localhost/services/controller-service": dial tcp 127.0.0.1:16443: connect: connection refused
```

And I did as it suggested:

```
tameyer@tam-canoncial-1:~/scratch/juju-remote-app-blank$ juju update-k8s --client microk8s
k8s cloud "microk8s" updated on this client.
```

I don't know why I suddenly had to do that, or what exactly `update-k8s` does. Is it possible that somehow fixed this, and that we do actually provide `JUJU_REMOTE_APP` all the time except in rare broken situations, so the bug is invalid/outdated?

Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Microk8s is a built-in cloud we support; I don't think you need to run juju add-k8s or update-k8s to use microk8s. You can just run `juju bootstrap microk8s` directly.
Yeah, so both Tony and I cannot reproduce this now. I will mark this bug as incomplete.
Feel free to reopen this bug with reproduction steps once you experience it again.

Changed in juju:
status: In Progress → Incomplete
tags: added: canonical-data-platform-eng
Changed in juju:
milestone: 3.3.2 → 3.3.4
Ian Booth (wallyworld)
Changed in juju:
milestone: 3.3.4 → 3.3.5
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 3.3.5 → 3.3.6
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 3.3.6 → 3.4.4
Revision history for this message
Carl Csaposs (carlcsaposs) wrote :

I was able to reproduce this on k8s with the following steps on Juju 3.4.2:

```
juju add-model foo1
juju deploy mysql-router-k8s
juju deploy mysql-test-app
juju integrate mysql-router-k8s mysql-test-app
```

* Wait for idle.
* Run `juju debug-hooks mysql-router-k8s/0`.
* Run `juju remove-relation mysql-router-k8s mysql-test-app`.
* In the debug-hooks session, dispatch the event (relation-departed) and exit. At the exact same time (in another terminal), run `kubectl -n foo1 delete pod mysql-router-k8s-0 --force`.
* Exit the debug-hooks session and run `juju debug-hooks mysql-router-k8s/0` again (to debug the new pod).
* In the debug-hooks session, dispatch and exit until you get to the relation-broken event.
* Then `echo $JUJU_REMOTE_APP` is empty.
* However, `../../../tools/unit-mysql-router-k8s-0/relation-get -r database:4 - --app mysql-test-app` still shows the remote app's relation data.

I believe this bug happens when the remote app (mysql-test-app) finishes executing relation-broken before the local app (mysql-router-k8s) starts to execute relation-broken. I imagine there's a better way to reproduce this (one that doesn't involve deleting the pod), but I couldn't think of one off the top of my head.

Changed in juju:
milestone: 3.4.4 → 3.4.5