endpoint.{relation}.joined never set

Bug #1946068 reported by Giuseppe Petralia
This bug affects 2 people
Affects: Canonical Juju
Status: Incomplete
Importance: High
Assigned to: Unassigned

Bug Description

Juju Version 2.8.8

I am trying to set up a CMR (cross-model relation) between the libvirt exporter and prometheus.

prometheus-libvirt-exporter in the "openstack" model provides the following interface:

provides:
  scrape:
    interface: http

prometheus in the "lma" model requires the following:

"requires":
  "target":
    "interface": "http"

In the lma model I run:

juju expose prometheus
juju offer lma.prometheus:target prometheus-target

In the openstack model I run:

juju consume admin/lma.prometheus-target promed
juju add-relation prometheus-libvirt-exporter:scrape promed:target

The flag `endpoint.scrape.joined` is never set, so on each subsequent update-status the `broken` handler is executed instead:
https://github.com/juju-solutions/interface-http/blob/master/provides.py#L15
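
For context, here is a minimal sketch of the kind of provides-side handlers involved (paraphrased, not the verbatim interface-http source; the handler names match the tracer output below, and the flag set by `joined` is an assumption):

# Sketch of an interface-http style provides endpoint (charms.reactive).
# Not the verbatim interface-http source.
from charms.reactive import Endpoint, when, when_not, set_flag, clear_flag

class HttpProvides(Endpoint):

    @when('endpoint.{endpoint_name}.joined')
    def joined(self):
        # Only runs once charms.reactive has set the joined flag,
        # which never happens in this case.
        set_flag(self.expand_name('available'))

    @when_not('endpoint.{endpoint_name}.joined')
    def broken(self):
        # With the joined flag never set, this runs on every hook,
        # including update-status.
        clear_flag(self.expand_name('available'))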

In the prometheus-libvirt-exporter logs I see the following:

2021-10-05 07:28:42 INFO juju.worker.uniter.relation statetracker.go:157 joining relation "promed:target prometheus-libvirt-exporter:scrape"
2021-10-05 07:28:42 INFO juju.worker.uniter.relation statetracker.go:193 joined relation "promed:target prometheus-libvirt-exporter:scrape"
2021-10-05 07:28:42 INFO juju.worker.uniter.operation runhook.go:145 skipped "scrape-relation-created" hook (missing)
2021-10-05 07:32:11 INFO juju-log Reactive main running for hook update-status
2021-10-05 07:32:11 DEBUG juju-log tracer>
tracer: set flag config.default.snapd_refresh
tracer: ++ queue handler hooks/relations/http/provides.py:15:broken:scrape
tracer: ++ queue handler hooks/relations/juju-info/requires.py:19:changed:juju-info
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.set.snap_channel
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.default.snap_channel
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.set.nagios_context
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.default.nagios_context
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.set.nagios_servicegroups
2021-10-05 07:32:11 DEBUG juju-log tracer: set flag config.default.nagios_servicegroups
2021-10-05 07:32:11 INFO juju-log Initializing Snap Layer
2021-10-05 07:32:11 DEBUG update-status none
2021-10-05 07:32:11 INFO juju-log Initializing Leadership Layer (is follower)
2021-10-05 07:32:11 DEBUG juju-log tracer>
tracer: starting handler dispatch, 20 flags set
tracer: set flag config.default.nagios_context
tracer: set flag config.default.nagios_servicegroups
tracer: set flag config.default.snap_channel
tracer: set flag config.default.snapd_refresh
tracer: set flag config.set.nagios_context
tracer: set flag config.set.nagios_servicegroups
tracer: set flag config.set.snap_channel
tracer: set flag endpoint.dashboards.all_responses
tracer: set flag endpoint.juju-info.changed
tracer: set flag endpoint.juju-info.changed.egress-subnets
tracer: set flag endpoint.juju-info.changed.ingress-address
tracer: set flag endpoint.juju-info.changed.private-address
tracer: set flag endpoint.juju-info.joined
tracer: set flag juju-info.available
tracer: set flag juju-info.connected
tracer: set flag libvirt-exporter.installed
tracer: set flag libvirt-exporter.started
tracer: set flag snap.installed.core
tracer: set flag snap.installed.prometheus-libvirt-exporter
tracer: set flag snap.refresh.set
2021-10-05 07:32:11 DEBUG juju-log tracer: hooks phase, 0 handlers queued
2021-10-05 07:32:11 DEBUG juju-log tracer>
tracer: main dispatch loop, 2 handlers queued
tracer: ++ queue handler hooks/relations/http/provides.py:15:broken:scrape
tracer: ++ queue handler hooks/relations/juju-info/requires.py:19:changed:juju-info
2021-10-05 07:32:11 INFO juju-log Invoking reactive handler: hooks/relations/http/provides.py:15:broken:scrape
2021-10-05 07:32:11 INFO juju-log Invoking reactive handler: hooks/relations/juju-info/requires.py:19:changed:juju-info
2021-10-05 07:32:11 INFO juju.worker.uniter.operation runhook.go:142 ran "update-status" hook (via explicit, bespoke hook script)

After the join, the `joined` handler at https://github.com/juju-solutions/interface-http/blob/master/provides.py#L11 is never executed.

Revision history for this message
Cory Johns (johnsca) wrote :

I don't think this is a Juju bug.

The `endpoint.{endpoint_name}.joined` flag is set[1] based on whether there are any units visible on the relation[2], which would be true after the relevant -relation-joined hook was fired. In the log above, I see the line:

2021-10-05 07:28:42 INFO juju.worker.uniter.operation runhook.go:145 skipped "scrape-relation-created" hook (missing)

which indicates that the -relation-created hook was fired (but skipped). However, I don't see any subsequent indication that any -relation-joined hooks were fired. I would generally only expect to see this if the application on the other side of the relation somehow had 0 units / scale, or maybe if it failed during provisioning or one of the early hooks.
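
Roughly, the logic behind [1] and [2] amounts to the following (a simplified sketch, not the actual charms.reactive code):

# Simplified sketch of how charms.reactive manages the joined flag;
# see [1] and [2] below for the real code.
from charms.reactive import toggle_flag

def manage_endpoint_flags(endpoint):
    # The flag tracks whether any remote units are actually visible on the
    # endpoint's relations, not merely whether a relation (or the
    # -relation-created hook) exists.
    is_joined = any(rel.joined_units for rel in endpoint.relations)
    toggle_flag(endpoint.expand_name('joined'), is_joined)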

I'm also a bit confused why this is using the "http" interface protocol for a Prometheus scrape target rather than the "prometheus" interface protocol[3].

[1]: https://github.com/juju-solutions/charms.reactive/blob/master/charms/reactive/endpoints.py#L229

[2]: https://github.com/juju-solutions/charms.reactive/blob/master/charms/reactive/endpoints.py#L182

[3]: https://git.launchpad.net/interface-prometheus

Revision history for this message
Ian Booth (wallyworld) wrote (last edit):

I've looked at the supplied database dumps from the offering and consuming models.

The bug says this was done:

juju consume admin/lma.prometheus-target promed
juju add-relation prometheus-libvirt-exporter:scrape promed:target

But there's no "promed" saas entry in the consuming model. Was it deleted?

There are 2 saas entries for an offer (uuid = e1ab45f8-3c21-4322-8a7e-c42b9e152317, name="prometheus-target") offering the prometheus target endpoint, each with a different alias: "prometheus" and "prometheus-target". These are related to other apps like rabbitmq-server, so I'm guessing they are not relevant here.

The offering model does contain an offer called "prometheus-target-require" which does appear to have previously been related to "prometheus-libvirt-exporter" in the consuming model, but the saas entry for that offer appears to have been deleted.

I think the following is what's needed to remove some left-over data in the offering model.

There's an offer connection to "prometheus-target-require" even though the consuming side has disappeared, so first remove it:

juju remove-relation 23061 --force

(If --force is not supported in 2.8.8, leave it off.)

Hopefully `juju offers` now shows 0 connections to that offer.

Now open a mongo shell and manually delete 2 orphaned records:

db.remoteApplications.deleteOne({"_id":"3e0702b2-0767-4e8e-8cf1-69bd41cf1931:remote-2bd92138b9cc4bb68d8a96a51921d58d"})

db.remoteEntities.deleteOne({"_id":"3e0702b2-0767-4e8e-8cf1-69bd41cf1931:application-remote-2bd92138b9cc4bb68d8a96a51921d58d"})

and, to be sure (these may already be gone):

db.relations.deleteOne({"_id":"3e0702b2-0767-4e8e-8cf1-69bd41cf1931:prometheus:target remote-2bd92138b9cc4bb68d8a96a51921d58d:scrape"})

db.remoteEntities.deleteOne({"_id":"3e0702b2-0767-4e8e-8cf1-69bd41cf1931:relation-prometheus.target#remote-2bd92138b9cc4bb68d8a96a51921d58d.scrape"})
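
To double-check before deleting, something along these lines can be used to confirm the orphaned documents are still there (a hedged sketch using pymongo; the connection URI, credentials, and the "juju" database name are assumptions and will differ on a real controller):

# Hypothetical verification script; the connection URI and credentials are
# placeholders, not values taken from this bug report.
from pymongo import MongoClient

client = MongoClient("mongodb://<user>:<password>@localhost:37017/admin?tls=true")
db = client["juju"]

for coll, doc_id in [
    ("remoteApplications", "3e0702b2-0767-4e8e-8cf1-69bd41cf1931:remote-2bd92138b9cc4bb68d8a96a51921d58d"),
    ("remoteEntities", "3e0702b2-0767-4e8e-8cf1-69bd41cf1931:application-remote-2bd92138b9cc4bb68d8a96a51921d58d"),
]:
    doc = db[coll].find_one({"_id": doc_id})
    print(coll, "present" if doc else "already gone")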

Check the model logs for the model where "prometheus" is deployed. If there are errors associated with unit "prometheus/0", that unit agent may need to be restarted. There's a bug about handling not-found relations that was fixed in 2.9; if all else fails, it may or may not require more db surgery to fix.

Revision history for this message
John A Meinel (jameinel) wrote :

We are unlikely to fix this in the 2.8 series, but if it is still an issue in 2.9 we would certainly want to address it.

Changed in juju:
importance: Undecided → High
status: New → Incomplete