At the moment, if you try to upgrade to 2.3.6 you'll end up with a transaction that cannot be applied, which makes it very hard to get out of this situation. (Transactions don't naturally go to aborted if they can't be applied, only if their assertions fail, because they don't have enough information to implement rollback.)
These are the steps I took to reproduce and recover:
$ juju bootstrap lxd --debug --agent-version=2.3.5
$ juju upgrade-juju --build-agent #using a source tree at juju-2.3.6
$ juju status
# shows that machine 0 is in an error state
# reading /var/log/juju/machine-0.log shows
2018-04-23 05:33:28 INFO juju.upgrade upgrade.go:138 running upgrade step: ensure container-image-stream config defaults to released
2018-04-23 05:33:29 ERROR juju.upgrade upgrade.go:140 upgrade step "ensure container-image-stream config defaults to released" failed: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
2018-04-23 05:33:29 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6.1 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
# get mgopurge and juju-force-upgrade onto the system
$ juju scp -m controller mgopurge juju-force-upgrade 0:.
# stop the agent
$ juju ssh -m controller 0
$$ sudo su -
$$ systemctl stop jujud-machine-0
# get the right credentials to connect to mongo
$$ agent=$(cd /var/lib/juju/agents; echo machine-*)
$$ pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
$$ /usr/lib/juju/mongo3.2/bin/mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju
# state 2 is prepared, state 4 is 'applying' which is the one we're currently broken on
> db.txns.find({"o.c": "settings", "s": {"$in": [2, 4]}}, {"_id": 1}).pretty()
{ "_id" : ObjectId("5add702921919b059b1a2c36") }
{ "_id" : ObjectId("5add70a121919b059b1a2c3c") }
{ "_id" : ObjectId("5add711921919b059b1a2c42") }
{ "_id" : ObjectId("5add719121919b059b1a2c48") }
{ "_id" : ObjectId("5add720b21919b06bc12210b") }
{ "_id" : ObjectId("5add728321919b06bc122111") }
# forcibly remove these transactions
> db.txns.remove({"o.c": "settings", "s": {"$in": [2, 4]}})
WriteResult({ "nRemoved" : 6 })
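For anyone reading the "s" values in db.txns, here is a small lookup sketch. Only 2 (prepared) and 4 (applying) are stated above; the other names are what I believe the mgo txn package uses, so treat them as an assumption. The txn_state_name helper is made up for illustration and is not part of juju or mgopurge.

```shell
#!/bin/sh
# txn_state_name: hypothetical helper mapping the numeric "s" field of a
# txns document to a state name. States 2 and 4 come from the notes above;
# the rest are assumed to follow the mgo txn package numbering.
txn_state_name() {
    case "$1" in
        1) echo "preparing" ;;
        2) echo "prepared" ;;
        3) echo "aborting" ;;
        4) echo "applying" ;;
        5) echo "aborted" ;;
        6) echo "applied" ;;
        *) echo "unknown" ;;
    esac
}

# The two states matched by the find/remove queries above.
for s in 2 4; do
    printf '%s=%s\n' "$s" "$(txn_state_name "$s")"
done
```

Transactions in states 5 and 6 are finished, which is why the queries only look at the in-flight ones.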
# fix up the transactions in the db using mgopurge
$$ ~ubuntu/mgopurge -username ${agent} -password ${pw} -ssl
# find the controller UUID. Could also have been "juju models --uuid"
> db.models.find({"name": "controller"}, {"_id": 1})
{ "_id" : "bcc296d8-d6e5-427c-8339-d5560848e6f5" }
# Remove the "upgrade in progress" document
> db.upgradeInfo.remove({"_id": "current"})
# update the desired agent version
$$ ~ubuntu/juju-force-upgrade "bcc296d8-d6e5-427c-8339-d5560848e6f5" 2.3.5
# restart the controller
$$ systemctl start jujud-machine-0
# check that we're properly back to 2.3.5:
$ juju status
Model       Controller  Cloud/Region  Version  Notes                     SLA
controller  lxd         lxd           2.3.5    upgrade available: 2.3.6  unsupported
$$ grep "running jujud" /var/log/juju/machine-0.log
...
2018-04-23 06:04:51 INFO juju.cmd supercommand.go:56 running jujud [2.3.6.1 gc go1.10.1]
2018-04-23 06:04:56 INFO juju.cmd supercommand.go:56 running jujud [2.3.5 gc go1.10]
# grab the latest 2.3 and upgrade to that
$ juju upgrade-juju --debug -m controller
10:09:46 INFO juju.cmd supercommand.go:56 running juju [2.3.7 gc go1.10.1]
...
$ juju status
Model       Controller  Cloud/Region  Version  SLA
controller  lxd         lxd           2.3.7.1  unsupported
...
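If you want a quick check for whether a controller is stuck in this state before going through the recovery, something like the following might help. It is only a sketch: upgrade_failures is a made-up helper name, and the pattern is based on the "failed (will retry)" line from the upgradesteps worker quoted above. The log path default is the one from the transcript.

```shell
#!/bin/sh
# upgrade_failures: hypothetical helper, not a juju command.
# Prints upgrade-step failures from a machine log; the exit status is
# grep's, so it is non-zero when no failure lines are found.
upgrade_failures() {
    grep -E 'ERROR juju\.worker\.upgradesteps .* failed \(will retry\)' "$1"
}

# Assumed default path, taken from the transcript above.
log=${1:-/var/log/juju/machine-0.log}
if [ -r "$log" ]; then
    if upgrade_failures "$log"; then
        echo "upgrade is stuck retrying" >&2
    fi
fi
```

Run it on the controller machine (as root if the log is not world-readable), optionally passing a different log path as the first argument.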