The offering end of a cross-model relation fails intermittently

Bug #1955025 reported by Simon Aronsson
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

The remote end of a cross-model relation fails intermittently. The workload of that unit continues to function in isolation, but events are no longer propagated to the charm.

Because the error is intermittent, I can't reproduce it in a controlled fashion, but here is a dump of my terminal from the last time it happened.

```
❯ juju version
2.9.22-ubuntu-amd64

```

Running on microk8s.

```
❯ juju remove-relation spring-music prometheus

❯ juju debug-log -m spring
unit-spring-music-0: 19:53:22 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:57:41 INFO juju.util.exec run result: exit status 1
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
^C

❯ juju add-relation spring-music prometheus

❯ juju debug-log -m spring
unit-spring-music-0: 19:57:41 INFO juju.util.exec run result: exit status 1
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:11:07 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
^C

❯ juju debug-log -m controller
controller-0: 20:07:47 INFO juju.apiserver.connection agent disconnected: unit-prometheus-1 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:07:48 INFO juju.apiserver.connection agent login: unit-prometheus-1 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:07:50 INFO juju.apiserver.common setting password for "application-prometheus"
controller-0: 20:07:54 ERROR juju.apiserver.uniter resolving "": lookup : no such host
controller-0: 20:07:55 ERROR juju.apiserver.uniter resolving "": lookup : no such host
controller-0: 20:08:02 INFO juju.apiserver.connection agent login: application-prometheus for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:08:02 INFO juju.apiserver.connection agent disconnected: application-prometheus for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:08:03 INFO juju.apiserver.connection agent login: unit-prometheus-0 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
controller-0: 20:10:07 ERROR juju.apiserver.common error stopping *apiserver.pingTimeout resource: ping timeout
controller-0: 20:10:07 INFO juju.apiserver.connection agent disconnected: unit-prometheus-0 for bccb7cc3-a40a-474e-83c3-20b8f0b09a80
^C

❯ juju debug-log -m spring
unit-spring-music-scrape-config-0: 19:58:37 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:03:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:04:10 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-scrape-config-0: 20:09:20 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:09:50 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:10:48 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)
unit-spring-music-0: 20:11:07 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)
^C
```
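Each `juju debug-log` invocation in the dump above repeats lines already shown by the previous one, which makes the sequence of relation hooks hard to follow. The following is a minimal sketch, not part of Juju, that extracts the distinct relation hook runs from such a dump. The regular expressions are assumptions based only on the line format visible above, and the names `parse_line` and `relation_hooks` are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed line shape, inferred from the dump above:
#   <agent>: HH:MM:SS LEVEL <module> <message>
LINE_RE = re.compile(
    r"^(?P<agent>\S+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<message>.*)$"
)
HOOK_RE = re.compile(r'ran "(?P<hook>[^"]+)" hook')


class LogEntry(NamedTuple):
    agent: str
    time: str
    level: str
    module: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for prompts, ^C markers, and blank lines."""
    match = LINE_RE.match(line.strip())
    return LogEntry(**match.groupdict()) if match else None


def relation_hooks(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return distinct (agent, time, hook) tuples for relation hooks, in order."""
    seen = set()
    result = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        hook = HOOK_RE.search(entry.message)
        if hook is None or "-relation-" not in hook.group("hook"):
            continue
        key = (entry.agent, entry.time, hook.group("hook"))
        if key in seen:
            continue  # the same line was already shown by an earlier invocation
        seen.add(key)
        result.append(key)
    return result


# A few lines copied from the dump above, including one repeated line.
SAMPLE = [
    'unit-spring-music-0: 19:57:41 INFO juju.util.exec run result: exit status 1',
    'unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)',
    'unit-spring-music-0: 19:58:39 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-created" hook (via hook dispatching script: dispatch)',
    'unit-spring-music-0: 19:58:58 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)',
    '^C',
    'unit-spring-music-0: 19:58:38 INFO juju.worker.uniter.operation ran "metrics-endpoint-relation-broken" hook (via hook dispatching script: dispatch)',
]

for agent, time, hook in relation_hooks(SAMPLE):
    print(time, agent, hook)
```

Filtering on `-relation-` keeps `relation-created` and `relation-broken` while dropping the periodic `update-status` noise, so a relation hook that is expected but never appears is easier to spot.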

Once this happens, it may be resolved using the following workaround:

```
juju remove-saas someoffer
juju remove-offer -m lma someoffer --force # not always needed but doesn't hurt
juju offer ...
juju consume
juju relate ...
```
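If the workaround has to be applied repeatedly, the sequence can be scripted. This is only a sketch: `workaround_commands` and `run_workaround` are hypothetical helpers, and the arguments elided in the listing above (for `juju offer`, `juju consume` and `juju relate`) are not guessed here but must be supplied by the caller. By default nothing is executed and the commands are only printed.

```python
import shlex
import subprocess
from typing import List, Sequence


def workaround_commands(
    saas_name: str,
    offer_name: str,
    offering_model: str,
    offer_args: Sequence[str],
    consume_args: Sequence[str],
    relate_args: Sequence[str],
    force: bool = True,
) -> List[List[str]]:
    """Build the command sequence from the workaround, in the same order."""
    remove_offer = ["juju", "remove-offer", "-m", offering_model, offer_name]
    if force:
        # Not always needed, but harmless according to the report.
        remove_offer.append("--force")
    return [
        ["juju", "remove-saas", saas_name],
        remove_offer,
        ["juju", "offer", *offer_args],      # caller supplies the elided arguments
        ["juju", "consume", *consume_args],  # caller supplies the elided arguments
        ["juju", "relate", *relate_args],    # caller supplies the elided arguments
    ]


def run_workaround(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(list(command), check=True)
```

Keeping the command construction separate from execution makes it possible to review the exact sequence before running anything against a live controller.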

Simon Aronsson (0x12b)
description: updated
Ryan Barry (rbarry) wrote :

In testing, this is almost shockingly easy to reproduce.

❯ cat metadata.yaml
# Copyright 2021 Canonical
# See LICENSE file for licensing details.
name: breaker
description: |
  This Juju charm breaks Juju
summary: |
  Freeze cross-model relations
series:
  - focal
provides:
  foo:
    interface: foo
requires:
  bar:
    interface: foo

❯ cat charmcraft.yaml
type: charm
bases:
  - build-on:
    - name: "ubuntu"
      channel: "20.04"
    run-on:
    - name: "ubuntu"
      channel: "20.04"
parts:
  charm:
    build-packages:
    - "git"

❯ cat src/charm.py
#!/usr/bin/env python3
# Copyright 2021 Canonical
# See LICENSE file for licensing details.
#
# Learn more at: https://juju.is/docs/sdk

"""Breaker charm.

A minimal reproducer for stuck cross-model relations: the handler for
bar-relation-joined raises an unhandled exception so that the hook fails.
"""

import logging

from ops.charm import CharmBase
from ops.main import main
from ops.model import ActiveStatus

logger = logging.getLogger(__name__)

class BreakerCharm(CharmBase):

    def __init__(self, *args):
        super().__init__(*args)

        self.framework.observe(
            self.on.foo_relation_joined,
            self._foo_relation_joined,
        )

        self.framework.observe(
            self.on.bar_relation_joined,
            self._bar_relation_joined,
        )

        self.framework.observe(
            self.on.install,
            self._on_install
        )

    def _on_install(self, event):
        self.unit.status = ActiveStatus()

    def _foo_relation_joined(self, event):
        logger.info("Foo joined!")

    def _bar_relation_joined(self, event):
        logger.info("Bar joined!")
        # Deliberately raise an unhandled exception so that the hook fails.
        raise Exception("Break it!")

if __name__ == "__main__": # pragma: no cover
    main(BreakerCharm)

Then offer a cross-model relation.

Depending on which side of the relation hits the unhandled exception, you can reproduce a stuck offer, a lack of eventing, or both (when both sides throw exceptions).

If the offer is force-removed and re-consumed, everything goes back to normal. It certainly feels like Juju is hanging onto an internal reference to some ID for the relation while trying to re-queue the failed event, and until that reference is gone, nothing else can pass through.
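To make that hypothesis concrete, here is a toy model of it. It is not Juju's implementation and makes no claim about Juju's internals: it assumes a single in-order queue in which a failed event stays at the head and is retried, so every event behind it waits. `Event`, `deliver` and `drop_relation` are hypothetical names invented for this illustration.

```python
from collections import deque
from typing import Deque, List, NamedTuple


class Event(NamedTuple):
    relation_id: int
    name: str
    fails: bool = False  # True models a hook that raises an unhandled exception


def deliver(queue: Deque[Event], max_steps: int = 10) -> List[str]:
    """Deliver events in order; a failing event stays at the head and is retried."""
    delivered = []
    for _ in range(max_steps):
        if not queue:
            break
        if queue[0].fails:
            continue  # retry the same event; nothing behind it can pass
        delivered.append(queue.popleft().name)
    return delivered


def drop_relation(queue: Deque[Event], relation_id: int) -> None:
    """Model force-removing the offer: forget every event tied to that relation."""
    kept = [event for event in queue if event.relation_id != relation_id]
    queue.clear()
    queue.extend(kept)


queue = deque([
    Event(1, "bar-relation-joined", fails=True),
    Event(1, "bar-relation-changed"),
    Event(2, "foo-relation-joined"),
])
print(deliver(queue))    # nothing is delivered while the failing event is at the head
drop_relation(queue, 1)  # analogous to removing the stuck relation
print(deliver(queue))    # events for other relations now get through
```

In this model the blockage clears only once the reference to the failed relation is dropped, which matches the observed effect of force-removing and re-consuming the offer, though the real cause may well differ.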

John A Meinel (jameinel)
Changed in juju:
importance: Undecided → High
milestone: none → 2.9.24
status: New → Triaged
Changed in juju:
milestone: 2.9.24 → 2.9.25
Changed in juju:
milestone: 2.9.25 → 2.9.26
Changed in juju:
milestone: 2.9.26 → 2.9.27
Changed in juju:
milestone: 2.9.27 → 2.9.28
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.30 → 2.9-next
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9-next → none