juju fails to upgrade HA controllers for (at least) lxd controllers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | Undecided | Jack Shaw |
Bug Description
Observed when upgrading Juju 2.9.25 to 2.9.26, so far only on lxd controllers, but it may also affect other machine controllers.
Steps to reproduce:
1) Bootstrap an lxd controller and enable HA. (It is currently unknown whether this affects other providers or not)
2) Upgrade your controllers (optionally with --build-agent)
Expected result: the controllers upgrade to the new version and I am able to proceed with normal operations.
Observed result: `juju status` and other juju commands hang forever. The upgrade is not performed.
Note: I have been able to reproduce this inside a multipass VM and on (both x86- and arm-based) AWS Ubuntu instances, but not using lxd on my Ubuntu desktop.
For further clarity in the logs, I made a small patch to the Juju code base before building juju and the agent for the upgrade, as follows:
```
diff --git a/mongo/mongo.go b/mongo/mongo.go
index 146d8664e9.
--- a/mongo/mongo.go
+++ b/mongo/mongo.go
@@ -260,6 +260,7 @@ func isMaster(session *mgo.Session, obj WithAddresses) (bool, error) {
 	addrs := obj.Addresses()
+	logger.Infof("Master HostPort: %v", masterHostPort)
 	// If the replica set has not been configured, then we
 	// can have only one master and the caller must
@@ -272,11 +273,15 @@ func isMaster(session *mgo.Session, obj WithAddresses) (bool, error) {
 	}
 	masterAddr, _, err := net.SplitHostPort(masterHostPort)
+	logger.Infof("Master Address: %v", masterAddr)
 	if err != nil {
 	}
+	logger.Infof("Checking master host against %d addrs", len(addrs))
 	for _, addr := range addrs {
+		logger.Infof("Checking against %v:%v", addr.Scope, addr.Value)
+		logger.Infof("Result: %t", addr.Value == masterAddr)
 		if addr.Value == masterAddr {
 	}
diff --git a/worker/
index 35e27fec21.
--- a/worker/
+++ b/worker/
@@ -184,6 +184,7 @@ func (w *upgradeDB) run() error {
 	// If we are the primary we need to run the upgrade steps.
 	// Otherwise we watch state and unlock once the primary has run the steps.
+	w.logger.Infof("Is Primary? %t", isPrimary)
 	if isPrimary {
 	} else {
```
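The failing check above is easy to model in isolation: isMaster extracts the host from the replica set's reported master host:port and string-compares it against each of the machine's own address values. A minimal, self-contained Go sketch of that comparison follows; the type and function names here are illustrative, not juju's actual API:

```go
package main

import (
	"fmt"
	"net"
)

// addr stands in for juju's network address type: a Scope plus a
// Value (the host). Illustrative names only, not the real juju type.
type addr struct {
	Scope string
	Value string
}

// isMasterHost mirrors the patched loop: split the reported master
// host:port, then string-compare the host against each of this
// machine's own address values.
func isMasterHost(masterHostPort string, addrs []addr) (bool, error) {
	masterAddr, _, err := net.SplitHostPort(masterHostPort)
	if err != nil {
		return false, err
	}
	for _, a := range addrs {
		if a.Value == masterAddr {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Machine 2's situation from the logs: its own address is
	// 10.73.49.189, but the replica set reports 10.73.49.98:37017
	// as master, so every comparison answers false.
	ok, _ := isMasterHost("10.73.49.98:37017", []addr{
		{Scope: "local-cloud", Value: "10.73.49.189"},
		{Scope: "local-machine", Value: "::1"},
	})
	fmt.Println(ok) // false: no local address matches the reported master
}
```

If every controller is handed a master host that matches none of its own addresses, each one concludes it is not the primary, which is exactly what the logs below show.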
Important information of note (particularly the IP addresses of the controllers):
```
$ juju status -m controller
Model Controller Cloud/Region Version SLA Timestamp
controller c localhost/localhost 2.9.25 unsupported 14:00:13Z
Machine State DNS Inst id Series AZ Message
0 started 10.73.49.98 juju-371cb8-0 focal Running
1 started 10.73.49.251 juju-371cb8-1 focal Running
2 started 10.73.49.189 juju-371cb8-2 focal Running
```
And the resulting logs on each machine controller:
Machine 0:
INFO juju.mongo mongo.go:263 Master HostPort: 10.73.49.189:37017
INFO juju.mongo mongo.go:276 Master Address: 10.73.49.189
INFO juju.mongo mongo.go:281 Checking master host against 3 addrs
INFO juju.mongo mongo.go:283 Checking against local-cloud:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:::1
INFO juju.mongo mongo.go:284 Result: false
INFO juju.worker.
INFO juju.worker.
ERROR juju.worker.
ERROR juju.worker.
ERROR juju.worker.
ERROR juju.worker.
ERROR juju.worker.
Machine 1:
INFO juju.mongo mongo.go:263 Master HostPort: 10.73.49.189:37017
INFO juju.mongo mongo.go:276 Master Address: 10.73.49.189
INFO juju.mongo mongo.go:281 Checking master host against 3 addrs
INFO juju.mongo mongo.go:283 Checking against local-cloud:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:::1
INFO juju.mongo mongo.go:284 Result: false
INFO juju.worker.
INFO juju.worker.
Machine 2:
INFO juju.mongo mongo.go:263 Master HostPort: 10.73.49.98:37017
INFO juju.mongo mongo.go:276 Master Address: 10.73.49.98
INFO juju.mongo mongo.go:281 Checking master host against 3 addrs
INFO juju.mongo mongo.go:283 Checking against local-cloud:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:
INFO juju.mongo mongo.go:284 Result: false
INFO juju.mongo mongo.go:283 Checking against local-machine:::1
INFO juju.mongo mongo.go:284 Result: false
INFO juju.worker.
INFO juju.worker.
Clearly what has happened is that none of the controllers believes it is the mongo primary, so they all begin waiting for the primary to perform the DB upgrade, which of course never happens, leading to a stalemate.
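The stalemate follows directly from the branch shown in the worker patch: the primary runs the upgrade steps, everyone else blocks until the primary finishes. A toy Go sketch of that decision (names assumed for illustration, not the real upgradeDB worker):

```go
package main

import "fmt"

// decide models the branch in the upgrade worker's run(): the
// primary performs the DB upgrade steps, while secondaries wait
// for the primary to finish and unlock them. Illustrative only.
func decide(isPrimary bool) string {
	if isPrimary {
		return "run upgrade steps"
	}
	return "wait for primary"
}

func main() {
	// With the broken master check, every controller computes
	// isPrimary == false, so all three wait and nothing ever
	// unlocks them.
	for machine := 0; machine < 3; machine++ {
		fmt.Printf("machine %d: %s\n", machine, decide(false))
	}
}
```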
It seems this is caused by replicaset.
Changed in juju:
milestone: none → 2.9.27
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
I just encountered this issue! My HA controllers (2) are KVM VMs and I was trying to upgrade from 2.9.22 to 2.9.25.
Did you manage to find a solution / fix?