We've experienced an issue with the consistency and stability of our Juju controllers, and are struggling to pinpoint what's actually happening.
We're operating an HA controller set, running Juju 2.9.42, deployed in an OpenStack cloud.
The symptoms we've observed are:
* Unstable relation hooks in deployed models (we have seen problems when relations are created, updated, and departed)
* Controllers returning inconsistent "juju status" results
We ran "juju status --debug" so we could see which controller answered each request and collect a result from each one. At least one controller consistently returns a different result from the others.
For example, this paste shows both secondary controllers reporting the primary controller as "agent-lost", while the primary disagrees: https://pastebin.ubuntu.com/p/VHrwFbm79Z/
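To make this comparison repeatable, a rough sketch like the one below could diff saved status snapshots. It assumes three things:

* Each snapshot is the output of `juju status -m controller --format=json`, saved while `--debug` showed a different controller serving the request.
* Each machine entry carries a `juju-status` object with a `current` field.
* The helper names and file names are hypothetical and are not part of Juju.

```python
import json
import sys
from typing import Dict


def machine_statuses(status: dict) -> Dict[str, str]:
    """Map machine ID -> agent status, assuming a "juju-status" -> "current" layout."""
    return {
        machine_id: machine.get("juju-status", {}).get("current", "unknown")
        for machine_id, machine in status.get("machines", {}).items()
    }


def find_disagreements(snapshots: Dict[str, dict]) -> Dict[str, Dict[str, str]]:
    """Return the machines whose reported status differs between snapshots.

    `snapshots` maps a label for the controller that served the request
    (e.g. the API address shown by --debug) to its parsed JSON status.
    """
    per_controller = {label: machine_statuses(s) for label, s in snapshots.items()}
    all_machines = set()
    for statuses in per_controller.values():
        all_machines.update(statuses)  # collect machine IDs seen by any controller
    disagreements = {}
    for machine_id in sorted(all_machines):
        seen = {
            label: statuses.get(machine_id, "missing")
            for label, statuses in per_controller.items()
        }
        if len(set(seen.values())) > 1:  # controllers disagree about this machine
            disagreements[machine_id] = seen
    return disagreements


if __name__ == "__main__":
    # Hypothetical usage: python compare_status.py ctrl0.json ctrl1.json ctrl2.json
    loaded = {}
    for path in sys.argv[1:]:
        with open(path) as f:
            loaded[path] = json.load(f)
    for machine_id, seen in find_disagreements(loaded).items():
        print(f"machine {machine_id}: {seen}")
```

A machine that one controller reports as started and another reports as down or lost would show up in the output.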
Controller logs from the period in question are available via the secure portal: https://juju-controller-reports.admin.canonical.com/ps5-prodstack-is/
Logs for the model in which we observed the relation hook issues are in the "special-request" subdirectory of that location.
Please let us know if we should supply any additional logs or metrics from that period, or anything else that would help.
Thanks!