Comment 11 for bug 1765722

John A Meinel (jameinel) wrote :

At the moment, if you try to upgrade to 2.3.6 you end up with a transaction that cannot be applied, which makes it very hard to get out of this situation. Transactions don't naturally move to aborted when they can't be applied, only when their assertions fail, because they don't carry enough information to implement rollback.

These are the steps I did to reproduce and recover:

$ juju bootstrap lxd --debug --agent-version=2.3.5
$ juju upgrade-juju --build-agent  # using a source tree at juju-2.3.6
$ juju status
# shows that machine 0 is in an error state
# reading /var/log/juju/machine-0.log shows
2018-04-23 05:33:28 INFO juju.upgrade upgrade.go:138 running upgrade step: ensure container-image-stream config defaults to released
2018-04-23 05:33:29 ERROR juju.upgrade upgrade.go:140 upgrade step "ensure container-image-stream config defaults to released" failed: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
2018-04-23 05:33:29 ERROR juju.worker.upgradesteps worker.go:379 upgrade from 2.3.5 to 2.3.6.1 for "machine-0" failed (will retry): ensure container-image-stream config defaults to released: The dotted field 'zfs.pool_name' in 'settings.zfs.pool_name' is not valid for storage.
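
If the log is long, a small helper can pull out just the failing step names. This is only a sketch: failed_upgrade_steps is a made-up name, and it assumes the 'upgrade step "<name>" failed' wording shown in the log lines above.

```shell
# Hypothetical helper (not part of juju): list the distinct upgrade steps
# that failed, based on the 'upgrade step "<name>" failed' log wording.
failed_upgrade_steps() {
    # Default path matches the machine-0 log used in this comment.
    log_file="${1:-/var/log/juju/machine-0.log}"
    # -o prints only the matching part of each line; sort -u drops the retries.
    grep -o 'upgrade step "[^"]*" failed' "$log_file" | sort -u
}
```

Run on the controller machine as "failed_upgrade_steps /var/log/juju/machine-0.log"; for the log above it should print a single line naming the container-image-stream step.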

# get mgopurge and juju-force-upgrade onto the system
$ juju scp -m controller mgopurge juju-force-upgrade 0:.
# stop the agent
$ juju ssh -m controller 0
$$ sudo su -
$$ systemctl stop jujud-machine-0
# get the right credentials to connect to mongo
$$ agent=$(cd /var/lib/juju/agents; echo machine-*)
$$ pw=$(sudo grep statepassword /var/lib/juju/agents/${agent}/agent.conf | cut '-d ' -sf2)
$$ /usr/lib/juju/mongo3.2/bin/mongo --ssl -u ${agent} -p $pw --authenticationDatabase admin --sslAllowInvalidHostnames --sslAllowInvalidCertificates localhost:37017/juju

# state 2 is 'prepared'; state 4 is 'applying', which is where the broken transactions are currently stuck
> db.txns.find({"o.c": "settings", "s": {"$in": [2, 4]}}, {"_id": 1}).pretty()
{ "_id" : ObjectId("5add702921919b059b1a2c36") }
{ "_id" : ObjectId("5add70a121919b059b1a2c3c") }
{ "_id" : ObjectId("5add711921919b059b1a2c42") }
{ "_id" : ObjectId("5add719121919b059b1a2c48") }
{ "_id" : ObjectId("5add720b21919b06bc12210b") }
{ "_id" : ObjectId("5add728321919b06bc122111") }

# forcibly remove these transactions
> db.txns.remove({"o.c": "settings", "s": {"$in": [2, 4]}})
WriteResult({ "nRemoved" : 6 })
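
For reference, the numbers in the "s" field are the transaction states from mgo's txn package. The mapping below is a sketch from my reading of that package, so check it against the mgo version your controller was built with; txn_state_name is a made-up helper name.

```shell
# Hypothetical helper: translate the numeric "s" field of a txns document
# into the state name used by mgo's txn package (assumed mapping).
txn_state_name() {
    case "$1" in
        1) echo "preparing" ;;  # one or more documents not yet prepared
        2) echo "prepared" ;;   # prepared but not yet ready to run
        3) echo "aborting" ;;   # assertions failed, cleaning up
        4) echo "applying" ;;   # changes are in progress
        5) echo "aborted" ;;    # pre-conditions failed, nothing done
        6) echo "applied" ;;    # all changes applied
        *) echo "unknown" ;;
    esac
}
```

So the queries above only touch transactions that are still in flight (2 and 4), and leave the finished ones (5 and 6) alone.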

# fix up the transactions in the db using mgopurge
$$ ~ubuntu/mgopurge -username ${agent} -password ${pw} -ssl

# find the controller model UUID (could also have used "juju models --uuid")
> db.models.find({"name": "controller"}, {"_id": 1})
{ "_id" : "bcc296d8-d6e5-427c-8339-d5560848e6f5" }

# Remove the "upgrade in progress" document
> db.upgradeInfo.remove({"_id": "current"})

# update the desired agent version
$$ ~ubuntu/juju-force-upgrade "bcc296d8-d6e5-427c-8339-d5560848e6f5" 2.3.5

# restart the controller
$$ systemctl start jujud-machine-0

# check that we're properly back to 2.3.5:
$ juju status
Model       Controller  Cloud/Region  Version  Notes                     SLA
controller  lxd         lxd           2.3.5    upgrade available: 2.3.6  unsupported

$$ grep "running jujud" /var/log/juju/machine-0.log
...
2018-04-23 06:04:51 INFO juju.cmd supercommand.go:56 running jujud [2.3.6.1 gc go1.10.1]
2018-04-23 06:04:56 INFO juju.cmd supercommand.go:56 running jujud [2.3.5 gc go1.10]
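
To check this without reading the whole grep output, something like the following prints only the version from the most recent start. It is a sketch that assumes the "running jujud [<version> ...]" format shown above; last_jujud_version is a made-up name.

```shell
# Hypothetical helper: print the agent version from the most recent
# "running jujud [<version> ...]" line in a machine log.
last_jujud_version() {
    log_file="${1:-/var/log/juju/machine-0.log}"
    grep 'running jujud \[' "$log_file" \
        | tail -n 1 \
        | sed 's/.*running jujud \[\([^ ]*\) .*/\1/'
}
```

For the two log lines above this prints 2.3.5, which is what we want after the forced downgrade.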

# grab the latest 2.3 and upgrade to that
$ juju upgrade-juju --debug -m controller
10:09:46 INFO juju.cmd supercommand.go:56 running juju [2.3.7 gc go1.10.1]
...

$ juju status
Model       Controller  Cloud/Region  Version  SLA
controller  lxd         lxd           2.3.7.1  unsupported
...