juju upgrade from 2.3.7 to 2.3.8 failed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
Bug Description
Hi,
I tried upgrading a large Juju controller from 2.3.7 to 2.3.8 today, and it failed. The controllers are HA-enabled (3 machines). It looks like it took machine 1 around 10 minutes to notice that there was an update, and 16 minutes to start running 2.3.8. After that, it took ~43 minutes to start the upgrade steps.
Logs of machine 0 (OK) : https:/
Logs of machine 1 (FAIL) : https:/
Logs of machine 2 (OK) : https:/
This also led to the creation of weird documents in the upgradeInfo collection:
juju:PRIMARY> db.upgradeInfo.
{
    "_id" : "ObjectIdHex(
    "status" : "aborted",
    "started" : ISODate(
        "2"
    ],
    "txn-revno" : NumberLong(2),
    "txn-queue" : [ ]
}
{
    "_id" : "current",
    "status" : "pending",
    "started" : ISODate(
        "1"
    ],
    "txn-revno" : NumberLong(5),
    "txn-queue" : [ ]
}
All this generated a lot of churn on the MongoDB server (simple queries taking 5 to 10 seconds), which made the controller noticeably slow to interact with overall.
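For anyone triaging similar leftovers, here is a minimal sketch of the inconsistency the two documents above represent: an "aborted" upgrade record lingering alongside a "pending" `_id: "current"` record. This is purely illustrative Python, not Juju's actual cleanup logic; the document shape is assumed from the shell output in this report, and `find_stale_upgrade_docs` is a hypothetical helper.

```python
def find_stale_upgrade_docs(docs):
    """Return upgradeInfo documents that should not linger: any document
    with status 'aborted', and a 'pending' _id:'current' document left
    over alongside an aborted one (the state seen in this bug)."""
    aborted = [d for d in docs if d.get("status") == "aborted"]
    stale = list(aborted)
    if aborted:
        # A pending "current" doc coexisting with an aborted one suggests
        # the previous upgrade attempt was never fully cleaned up.
        stale += [d for d in docs
                  if d.get("_id") == "current" and d.get("status") == "pending"]
    return stale

# Hypothetical documents mirroring the (truncated) shell output above.
docs = [
    {"_id": "aborted-attempt", "status": "aborted", "txn-revno": 2},
    {"_id": "current", "status": "pending", "txn-revno": 5},
]
print([d["_id"] for d in find_stale_upgrade_docs(docs)])
```

Running this against the two documents flags both, which matches the "weird" state observed: the aborted record and the still-pending "current" record coexist instead of the aborted attempt being cleaned up.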
Thanks
I wonder if it is related to bug #1778614...
Could you please give us more information about the composition of the controllers? i.e. what was deployed on the controller model, what relations, any subordinates?...
We'd need to know more to reproduce this, as I am sure we do test some upgrade scenarios.