Stale CMR offers causing model to not destroy properly
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Canonical Juju | Fix Released | High | Ian Booth | |
Bug Description
Juju 2.7.5
Testing CMR: one model in OpenStack, with its own controller, making offers; one model on bare metal via a MAAS-based controller.
Offers are created in the openstack model and join successfully:
---
$ juju offers
Offer User Relation id Status Endpoint Interface Role Ingress subnets
easyrsa admin 24 joined client tls-certificates provider 172.16.7.0/24
etcd admin 23 joined db etcd provider 172.16.7.0/24
kubeapi-
kubernetes-master admin 25 joined kube-control kube-control provider 172.16.7.0/24
---
Relations join fine in the consuming model:
---
$ juju status --relations
Model Controller Cloud/Region Version SLA Timestamp
k8s-worker jhillman-maas jhillman-maas 2.7.5 unsupported 14:27:22-04:00
SAAS Status Store URL
easyrsa active openstack-regionone admin/default.
etcd active openstack-regionone admin/default.etcd
kubeapi-
kubernetes-master active openstack-regionone admin/default.
App Version Status Scale Charm Store Rev OS Notes
containerd active 1 containerd jujucharms 61 ubuntu
flannel 0.11.0 active 1 flannel jujucharms 468 ubuntu
kubernetes-
Unit Workload Agent Machine Public address Ports Message
kubernetes-
containerd/0* active idle 172.16.7.254 Container runtime available
flannel/0* active idle 172.16.7.254 Flannel subnet 10.1.76.1/24
Machine State DNS Inst id Series AZ Message
0 started 172.16.7.254 agrippa bionic default Deployed
Relation provider Requirer Interface Type Message
easyrsa:client kubernetes-
etcd:db flannel:etcd etcd regular
kubeapi-
kubernetes-
kubernetes-
kubernetes-
kubernetes-
---
Whether the consuming model is destroyed completely or the relations are removed by hand, the offering model still shows the offers as being consumed and will not destroy. The destroy eventually times out.
---
ubuntu@
ubuntu@
ubuntu@
ubuntu@
$ juju status --relations
Model Controller Cloud/Region Version SLA Timestamp
k8s-worker jhillman-maas jhillman-maas 2.7.5 unsupported 14:30:54-04:00
SAAS Status Store URL
easyrsa active openstack-regionone admin/default.
etcd active openstack-regionone admin/default.etcd
kubeapi-
kubernetes-master active openstack-regionone admin/default.
App Version Status Scale Charm Store Rev OS Notes
containerd active 1 containerd jujucharms 61 ubuntu
flannel 0.11.0 blocked 1 flannel jujucharms 468 ubuntu
kubernetes-
Unit Workload Agent Machine Public address Ports Message
kubernetes-
containerd/0* active idle 172.16.7.254 Container runtime available
flannel/0* blocked idle 172.16.7.254 Waiting for etcd relation.
Machine State DNS Inst id Series AZ Message
0 started 172.16.7.254 agrippa bionic default Deployed
Relation provider Requirer Interface Type Message
kubernetes-
kubernetes-
kubernetes-
---
Even after removing saas cleanly:
---
ubuntu@
ubuntu@
ubuntu@
ubuntu@
ubuntu@
Model Controller Cloud/Region Version SLA Timestamp
k8s-worker jhillman-maas jhillman-maas 2.7.5 unsupported 14:31:37-04:00
App Version Status Scale Charm Store Rev OS Notes
containerd active 1 containerd jujucharms 61 ubuntu
flannel 0.11.0 blocked 1 flannel jujucharms 468 ubuntu
kubernetes-
Unit Workload Agent Machine Public address Ports Message
kubernetes-
containerd/0* active idle 172.16.7.254 Container runtime available
flannel/0* blocked idle 172.16.7.254 Waiting for etcd relation.
Machine State DNS Inst id Series AZ Message
0 started 172.16.7.254 agrippa bionic default Deployed
---
Some offers still show as joined. Which ones stay is not consistent; it is different each time:
---
$ juju offers
Offer User Relation id Status Endpoint Interface Role Ingress subnets
easyrsa admin 24 joined client tls-certificates provider 172.16.7.0/24
etcd admin 23 joined db etcd provider 172.16.7.0/24
kubeapi-
kubernetes-master -
---
This prevents the offering model from being removed cleanly. In fact, the controller has to be manually removed and re-bootstrapped to resolve this.
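To spot which offers are still reported as joined after the consuming side has gone, the tabular output can be filtered. This is only a rough sketch: `joined_offers` is a hypothetical helper name, and it assumes the whitespace-separated column layout shown in the `juju offers` output above (offer name first, relation id third, status fourth).

```shell
# Print "<offer> <relation-id>" for every row whose Status column is "joined".
# Reads the text output of `juju offers` on stdin.
# Rows that are truncated or have no connection (fewer than 4 fields) are skipped.
joined_offers() {
    awk 'NR > 1 && $4 == "joined" { print $1, $3 }'
}

# Intended use, run against the offering model:
#   juju offers | joined_offers
```

Any line this prints after the consuming model is gone is a stale connection of the kind described here.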
description: | updated |
description: | updated |
Changed in juju: | |
assignee: | nobody → Ian Booth (wallyworld) |
status: | New → Triaged |
importance: | Undecided → Medium |
summary: |
- [2.7.5] Stale CMR offers causing model to not destroy properly
+ Stale CMR offers causing model to not destroy properly |
Changed in juju: | |
milestone: | 2.8.1 → 2.8-next |
Changed in juju: | |
status: | Fix Committed → Fix Released |
It's interesting that the cross model relations join ok but won't get removed. The fact that they join ok suggests it's not an issue with communication between controllers.
It would be interesting to see if --force applied to remove-saas and/or destroy-model and/or remove-relation causes cleanup to complete. The --force option will eventually clean up the model on which it is run even if the other side is not responding.
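As a rough sketch of what trying --force could look like, the helper below only prints the three commands rather than running them. `force_cleanup_plan` is a hypothetical name; the relation endpoints and SAAS name are taken from the status output in the description, and the offering model name `openstack-regionone:default` is an assumption inferred from the offer URL. The order shown is just one possibility, since the suggestion is "and/or".

```shell
# Dry run: print the --force variants of the commands, do not execute them.
# $1 = consuming model, $2 = offering model (both supplied by the caller).
force_cleanup_plan() {
    consuming_model="$1"
    offering_model="$2"
    # Remove one cross-model relation from the consuming side.
    echo "juju remove-relation -m $consuming_model --force flannel:etcd etcd:db"
    # Remove the consumed SAAS entry from the consuming side.
    echo "juju remove-saas -m $consuming_model --force etcd"
    # Destroy the offering model even if the other side does not respond.
    echo "juju destroy-model --force $offering_model"
}

force_cleanup_plan k8s-worker openstack-regionone:default
```

Printing first makes it easy to review the exact commands before running any of them by hand.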
Before doing that though, we need more info to diagnose the issue. We need logs for both models and their controllers, after turning on extra debug logging on both models:
juju model-config logging-config="juju.apiserver.common.crossmodel=DEBUG;juju.apiserver.crossmodelrelations=DEBUG;juju.worker.uniter.remotestate=DEBUG;<root>=INFO;UNIT=DEBUG;"
Turn on the logging, set up the deployment, and then remove the relations etc. We need to see what CMR-related events get published between the models to understand why things are getting stuck.
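Because the logging-config value is long and easy to mangle when copied, one option is to assemble it from its parts. The sketch below does that and prints the resulting command for each model; `cmr_logging_config` and `print_logging_commands` are hypothetical helper names, and the model names in the example call are assumptions based on the status output in the description.

```shell
# Build the logging-config value piece by piece; each entry ends with ';'.
cmr_logging_config() {
    printf '%s;' \
        'juju.apiserver.common.crossmodel=DEBUG' \
        'juju.apiserver.crossmodelrelations=DEBUG' \
        'juju.worker.uniter.remotestate=DEBUG' \
        '<root>=INFO' \
        'UNIT=DEBUG'
}

# Print (not run) the model-config command for every model given as an argument.
print_logging_commands() {
    for model in "$@"; do
        echo "juju model-config -m $model logging-config=\"$(cmr_logging_config)\""
    done
}

print_logging_commands openstack-regionone:default jhillman-maas:k8s-worker
```

The printed commands can then be run on both sides before the deployment is set up, so that the relation teardown is captured in the logs.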