The remote end of a cross-model relation fails intermittently. The workload of that unit continues to function in isolation, but events are no longer propagated to the charm.
As this is an intermittent error, I'm not able to reproduce it in a controlled fashion, but I've attached a dump of my terminal from the last time it happened.
```
❯ juju version
2.9.22-ubuntu-amd64
```
Running on microk8s.
```
❯ juju remove-relation spring-music prometheus
❯ juju debug-log -m spring
unit-spring-music-0: 19:53:22 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:57:41 INFO juju.util.exec run result: exit status 1
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
^C
❯ juju add-relation spring-music prometheus
❯ juju debug-log -m spring
unit-spring-music-0: 19:57:41 INFO juju.util.exec run result: exit status 1
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:11:07 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
^C
❯ juju debug-log -m controller
controller-0: 20:07:47 INFO juju.apiserver.connection agent disconnected: unit-prometheus-1 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:07:48 INFO juju.apiserver.connection agent login: unit-prometheus-1 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:07:50 INFO juju.apiserver.common setting password for "application-prometheus"
controller-0: 20:07:54 ERROR juju.apiserver.uniter resolving "": lookup : no such host
controller-0: 20:07:55 ERROR juju.apiserver.uniter resolving "": lookup : no such host
controller-0: 20:08:02 INFO juju.apiserver.connection agent login: application-prometheus for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:08:02 INFO juju.apiserver.connection agent disconnected: application-prometheus for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:08:03 INFO juju.apiserver.connection agent login: unit-prometheus-0 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:10:07 ERROR juju.apiserver.common error stopping *apiserver.pingTimeout resource: ping timeout
controller-0: 20:10:07 INFO juju.apiserver.connection agent disconnected: unit-prometheus-0 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
^C
❯ juju debug-log -m spring
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:11:07 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
^C
```
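Because each `juju debug-log` invocation replays earlier history, the dump above repeats many lines. A minimal sketch for pulling out only the relation hooks that ran per unit is shown below. It assumes the line format seen in this dump; the names `HOOK_LINE`, `SAMPLE`, and `relation_hooks` are made up for illustration and are not part of Juju.

```python
import re
from collections import defaultdict

# Matches uniter lines in the format seen in the dump above, e.g.
# unit-x-0: 19:58:38 INFO juju.worker.uniter.operation ran "y-relation-broken" hook (...)
HOOK_LINE = re.compile(
    r'^(?P<unit>unit-[\w-]+): (?P<time>\d{2}:\d{2}:\d{2}) INFO '
    r'juju\.worker\.uniter\.operation ran "(?P<hook>[\w-]+)" hook'
)

# A few lines copied from the dump, including one replayed duplicate.
SAMPLE = """\
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
"""


def relation_hooks(log_text):
    """Return {unit: [(time, hook), ...]} for relation hooks only.

    Lines with the same unit, timestamp, and hook are treated as replays
    and counted once. This would also merge two genuine runs of the same
    hook within one second, which is acceptable for a quick overview.
    """
    seen = set()
    hooks = defaultdict(list)
    for line in log_text.splitlines():
        match = HOOK_LINE.match(line.strip())
        if match is None or "-relation-" not in match["hook"]:
            continue  # not a hook line, or not a relation hook
        key = (match["unit"], match["time"], match["hook"])
        if key in seen:
            continue  # replayed line from a later debug-log invocation
        seen.add(key)
        hooks[match["unit"]].append((match["time"], match["hook"]))
    return dict(hooks)


if __name__ == "__main__":
    for unit, entries in relation_hooks(SAMPLE).items():
        for time, hook in entries:
            print(unit, time, hook)
```

Run against the full dump, this kind of filter makes it easier to see which relation hooks fired on each unit and which never appear.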
Once this happens, it may be resolved using the following workaround:
```
juju remove-saas someoffer
juju remove-offer -m lma someoffer --force # not always needed but doesn't hurt
juju offer ...
juju consume
juju relate ...
```
At least in testing, this is almost shockingly easy to reproduce with the following minimal charm:
```
❯ cat metadata.yaml
# Copyright 2021 Canonical
# See LICENSE file for licensing details.
name: breaker
description: |
  This Juju charm breaks Juju
summary: |
  Freee cross model relations
series:
  - focal
provides:
  foo:
    interface: foo
requires:
  bar:
    interface: foo
```
```
❯ cat charmcraft.yaml
type: charm
bases:
  - build-on:
      - name: "ubuntu"
        channel: "20.04"
    run-on:
      - name: "ubuntu"
        channel: "20.04"
parts:
  charm:
    build-packages:
      - "git"
```
```
❯ cat src/charm.py
#!/usr/bin/env python3
# Copyright 2021 Canonical
# See LICENSE file for licensing details.
#
# Learn more at: https://juju.is/docs/sdk

"""Hello, Juju example charm.

This charm is a demonstration of a machine charm written using the Charmed
Operator Framework. It deploys a simple Python Flask web application and
implements a relation to the Grafana charm.
"""

import logging

from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus

logger = logging.getLogger(__name__)


class BreakerCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # Handler registrations (arguments inferred from the handlers below).
        self.framework.observe(self.on.install, self._on_install)
        self.framework.observe(
            self.on.foo_relation_joined, self._foo_relation_joined
        )
        self.framework.observe(
            self.on.bar_relation_joined, self._bar_relation_joined
        )

    def _on_install(self, event):
        self.unit.status = ActiveStatus()

    def _foo_relation_joined(self, event):
        logger.info("Foo joined!")

    def _bar_relation_joined(self, event):
        logger.info("Bar joined!")
        # Unhandled exception: this is what triggers the bug.
        raise Exception("Break it!")


if __name__ == "__main__":  # pragma: no cover
    main(BreakerCharm)
```
Then offer a cross-model relation.
Depending on which side of the relation raises the unhandled exception, this reproduces either a stuck offer or a lack of eventing, or both if both sides raise.
If the offer is force-removed and re-consumed, everything goes back to normal. It certainly feels like Juju is hanging onto an internal reference to some ID for the relation while trying to re-queue the failed event, and until that reference is gone, nothing else can pass through.
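To make that suspicion concrete, here is a toy model of the hypothesised behaviour: an ordered queue in which a failing event is retried at the head and therefore blocks everything behind it. This is only an illustration of the guess above, not Juju's actual implementation. The function `drain`, the `max_attempts` limit, and the `foo-relation-changed` event are invented for the example.

```python
from collections import deque


def drain(events, handler, max_attempts=3):
    """Deliver events in order, retrying a failing event at the head.

    Returns (delivered, stuck): the events that were handled, and the
    events still queued once the head event has used up its attempts.
    """
    queue = deque(events)
    delivered = []
    attempts = 0
    while queue and attempts < max_attempts:
        event = queue[0]
        try:
            handler(event)
        except Exception:
            attempts += 1  # the event stays at the head and is retried
            continue
        queue.popleft()
        delivered.append(event)
        attempts = 0
    return delivered, list(queue)


def handler(event):
    # Mirrors the breaker charm: the bar handler always raises.
    if event == "bar-relation-joined":
        raise Exception("Break it!")


events = ["foo-relation-joined", "bar-relation-joined", "foo-relation-changed"]
delivered, stuck = drain(events, handler)

# Dropping the stuck head event stands in for force-removing the offer.
delivered_after, stuck_after = drain(stuck[1:], handler)

if __name__ == "__main__":
    print("delivered:", delivered)
    print("stuck:", stuck)
    print("delivered after dropping the head:", delivered_after)
```

In this model the later `foo-relation-changed` event is never delivered while the failing event sits at the head, and it goes through as soon as that event is discarded, which matches the observed effect of force-removing and re-consuming the offer.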