CMR fails to remove from offer model

Bug #1840850 reported by Chris Sanders
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
Christian Muirhead

Bug Description

I was trying to use CMR to let a monitoring model scrape data from ceph-mon. It turns out ceph-mon doesn’t support CMR on the client relation, but now I can’t seem to remove the offer.

Steps I’ve taken so far.

From the Ceph controller

juju offer ceph-mon:client
Application "ceph-mon" endpoints [client] available at "admin/prod.ceph-mon"

From the consuming controller
juju add-relation ceph-exporter prod-maas:admin/prod.ceph-mon
This part worked as expected, and that’s when I found the limitation on the relation handling of ceph-mon.

To reverse this I then:

From the consuming controller
juju remove-relation ceph-exporter ceph-mon
This worked and removed the relation, the charm recognized this and set relation flags (states) as expected. However, after this I tried to re-add it to debug further and that’s when I noticed issues.

The relation never fired a new hooks for the ceph-exporter. The relation was also not added to the list of relations. Checking back on the offer model I see that it still think the connection exists.

From the offer controller
Offer Application Charm Rev Connected Endpoint Interface Role
ceph-mon ceph-mon ceph-mon 42 1/1 client ceph-client provider
I’ve tried removing the relation (from the consumer), then removing the SAAS from the Offer and it says there is still a relation. I then tried removing with --force which doesn’t error but doesn’t remove the relation either.

As far as I can tell the ceph-mon has never had the ‘departed’ hook fired and the offer model doesn’t seem to recognize the removal of the relation at all.

Both controllers are juju 2.6.6 as are the models.

I can haven't re-run this further with debug commands as I was concerned I might make things worse. I'm happy to run the remove-relation or remove-sas commands with --debug if that's helpful and you don't think it will make the situation worse.

Tags: cross-model
Revision history for this message
Ian Booth (wallyworld) wrote :

Did you try removing the relation from the offering side using the relation number you see when you list offers? If there's issues with one side of the relation communicating across models, you typically do need to remove the relation on both sides.

Changed in juju:
milestone: none → 2.7-beta1
status: New → Triaged
importance: Undecided → High
tags: added: cross-model
Revision history for this message
Chris Sanders (chris.sanders) wrote :

Doesn't seem to help.

$ juju list-offers
Offer User Relation id Status Endpoint Interface Role Ingress subnets
ceph-mon admin 6 joined client ceph-client provider 192.168.1.172/32
telegraf admin 7 joined prometheus-client http provider 192.168.1.27/32

$ juju remove-relation 6 --debug
12:29:42 INFO juju.cmd supercommand.go:57 running juju [2.6.6 gc go1.10.4]
12:29:42 DEBUG juju.cmd supercommand.go:58 args: []string{"/snap/juju/8594/bin/juju", "remove-relation", "6", "--debug"}
12:29:42 INFO juju.juju api.go:67 connecting to API addresses: [192.168.0.162:17070]
12:29:42 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://192.168.0.162:17070/model/6218f353-9afa-4eef-822d-549fbb89bcd1/api"
12:29:42 INFO juju.api apiclient.go:624 connection established to "wss://192.168.0.162:17070/model/6218f353-9afa-4eef-822d-549fbb89bcd1/api"
12:29:42 DEBUG juju.api monitor.go:35 RPC connection died
12:29:42 INFO cmd supercommand.go:502 command finished

Revision history for this message
Ian Booth (wallyworld) wrote :

To clarify, you remove relation 6 and list-offers still shows the relation as present? And offer removal fails (even with --force)? Does show-status-log for the ceph-mon unit show the departed/broken hooks running after the relation is removed? Is it possible to turn on "developer-mode" feature flag and get a (sanitised) juju dump-db output?

Revision history for this message
Chris Sanders (chris.sanders) wrote :

To clarify, you remove relation 6 and list-offers still shows the relation as present?
Yes, still present even after remove-relation by name and ID.

And offer removal fails (even with --force)?
It doesn't return anything but the relation remains.

Does show-status-log for the ceph-mon unit show the departed/broken hooks running after the relation is removed?
First time I've tried this but show-status-log on the leader unit does not show any new logging when running the remove-relation (with relation id if that matters). Additionally, the user created for the relation in cephx still exists which is why I was previously guessing the departed hook is not firing.

Is it possible to turn on "developer-mode" feature flag and get a (sanitised) juju dump-db output?
That's just setting the model config 'development' to true from (https://jaas.ai/docs/configuring-models), rerunning the above, and using what to dump the db?

I'm happy to get you what I can. I'll want to share that dump with you privately but other than that sure.

Revision history for this message
Ian Booth (wallyworld) wrote :

To debug this, we really need some controller logs and preferably also a juju dump-db from both offer and consuming models (needs developer-mode feature flag enabled on the client).

Controller logs from both models (from both controllers if the cross model relation is across controllers) would be obtained after turning on extra logging and re-running the remove

juju model-config logging-config="<root>=INFO;juju.apiserver.common.crossmodel=DEBUG;juju.apiserver.crossmodelrelations=DEBUG;juju.worker.remoterelations=DEBUG;juju.worker.uniter=DEBUG;juju.state=DEBUG"

Revision history for this message
Chris Sanders (chris.sanders) wrote :
Download full text (3.9 KiB)

After setting the above and rerunning the remove.

$ juju remove-relation --force 6 --debug
08:51:20 INFO juju.cmd supercommand.go:57 running juju [2.6.8 gc go1.10.4]
08:51:20 DEBUG juju.cmd supercommand.go:58 args: []string{"/snap/juju/8873/bin/juju", "remove-relation", "--force", "6", "--debug"}
08:51:20 INFO juju.juju api.go:67 connecting to API addresses: [192.168.0.162:17070]
08:51:20 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://192.168.0.162:17070/model/6218f353-9afa-4eef-822d-549fbb89bcd1/api"
08:51:20 INFO juju.api apiclient.go:624 connection established to "wss://192.168.0.162:17070/model/6218f353-9afa-4eef-822d-549fbb89bcd1/api"
08:51:20 DEBUG juju.api monitor.go:35 RPC connection died
08:51:20 INFO cmd supercommand.go:502 command finished

juju debug-log still shows nothing at all during the operation.
There is a lot of this, from setting the config and then no updates at all.
machine-4: 08:49:20 DEBUG juju.worker.logger reconfiguring logging from "<root>=DEBUG;unit=DEBUG" to "<root>=INFO;juju.apiserver.common.crossmodel=DEBUG;juju.apiserver.crossmodelrelations=DEBUG;juju.worker.remoterelations=DEBUG;juju.worker.uniter=DEBUG;juju.state=DEBUG;unit=DEBUG"

Grabbing audit.log from the controller, you'll see I ran the remove against the wrong id followed by the correct id (6) which says it doesn't error.

{"conversation":{"who":"admin","what":"/snap/juju/8873/bin/juju model-config logging-config=\u003croot\u003e=INFO;juju.apiserver.common.crossmodel=DEBUG;juju.apiserver.crossmodelrelations=DEBUG;juju.worker.remoterelations=DEBUG;juju.worker.uniter=DEBUG;juju.state=DEBUG","when":"2019-09-03T13:49:19Z","model-name":"prod","model-uuid":"6218f353-9afa-4eef-822d-549fbb89bcd1","conversation-id":"71a103d327a9a494","connection-id":"52"}}
{"request":{"conversation-id":"71a103d327a9a494","connection-id":"52","request-id":2,"when":"2019-09-03T13:49:19Z","facade":"ModelConfig","method":"ModelGet","version":2}}
{"errors":{"conversation-id":"71a103d327a9a494","connection-id":"52","request-id":2,"when":"2019-09-03T13:49:19Z","errors":null}}
{"request":{"conversation-id":"71a103d327a9a494","connection-id":"52","request-id":3,"when":"2019-09-03T13:49:19Z","facade":"ModelConfig","method":"ModelSet","version":2}}
{"errors":{"conversation-id":"71a103d327a9a494","connection-id":"52","request-id":3,"when":"2019-09-03T13:49:19Z","errors":null}}
{"conversation":{"who":"admin","what":"/snap/juju/8873/bin/juju remove-relation --force 42 --debug","when":"2019-09-03T13:50:54Z","model-name":"prod","model-uuid":"6218f353-9afa-4eef-822d-549fbb89bcd1","conversation-id":"819b801a96fd45b2","connection-id":"55"}}
{"request":{"conversation-id":"819b801a96fd45b2","connection-id":"55","request-id":2,"when":"2019-09-03T13:50:54Z","facade":"Application","method":"DestroyRelation","version":9}}
{"errors":{"conversation-id":"819b801a96fd45b2","connection-id":"55","request-id":2,"when":"2019-09-03T13:50:54Z","errors":[{"message":"relation 42 not found","code":"not found"}]}}
{"conversation":{"who":"admin","what":"/snap/juju/8873/bin/juju remove-relation --force 6 --debug","when":"2019-09-03T13:51:20Z","model-name":"prod","model-uuid":"6218f353-9afa-4eef-82...

Read more...

Revision history for this message
Chris Sanders (chris.sanders) wrote :

I've shared the backups from both controllers with Ian directly, if anyone else on the juju team is looking for them he should have them.

Revision history for this message
Ian Booth (wallyworld) wrote :

Looking at the database bson dump files, we think we've fixed this bug in the 2.6.9 release due out next week. You can upgrade the controller to the 2.6.9 candidate snap if you want to try it out sooner.

What is happening is that the relation 6 is being removed, and it gets marked as Dying. It waits for an orderly shutdown, but the offering side of the relation has already been removed because the model went away. So the consuming side is waiting forever. In 2.6.9, "remove-relation --force 6" will handle this case and remove the relation, allowing the app and/or the model to be also deleted.

The root cause was in the remove relation code, which cascaded up to prevent the app and also any offers on the app from being removed etc.

I'll mark this as fix committed for 2.6.9 based on the fact we've think we've identified the root cause and done a fix.

Changed in juju:
milestone: 2.7-beta1 → 2.6.9
assignee: nobody → Christian Muirhead (2-xtian)
status: Triaged → Fix Committed
Revision history for this message
Chris Sanders (chris.sanders) wrote :

Great thanks, I'm not in a particular hurry I'll wait for 2.6.9's release and give this a shot.

Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.