Comment 10 for bug 1746265

Tim Penhey (thumper) wrote:

Hmm... based on the information you were able to provide to me earlier today, it seems that the database somehow got messed up during the upgrade. Unfortunately, the logs don't show exactly how it got out of sync.

The database contains what looks like a half-applied transaction. Clearly, this shouldn't happen.

So... how to get out of this situation...

The only way out without redeploying is some database surgery. I would recommend taking a backup first, but since Juju is in a half-broken state, this isn't easy. First, shut down each of the controller API servers: SSH into each controller machine and run the following:
  sudo service jujud-machine-x stop    (where x is the machine id)
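If you are unsure of the machine id on a given controller, it can be read from the agent directory name. The helper below is a hypothetical sketch (not part of Juju) that assumes the default /var/lib/juju/agents layout, with one machine-<id> directory per machine agent; you may need a root shell for the glob to see that directory.

```shell
# Hypothetical helper (not part of Juju): print the jujud service name for each
# machine agent directory, e.g. machine-0 -> jujud-machine-0.
juju_machine_services() {
  local agents_dir=${1:-/var/lib/juju/agents}   # assumed default agents location
  local dir
  for dir in "$agents_dir"/machine-*; do
    [ -d "$dir" ] || continue                   # skip the literal pattern if nothing matched
    echo "jujud-${dir##*/}"                     # strip the path, keep machine-<id>
  done
}
```

Usage on each controller machine: for svc in $(juju_machine_services); do sudo service "$svc" stop; done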

Then take a mongo dump of the juju database (its data files are in /var/lib/juju/db).
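A minimal sketch of that dump, assuming the same machine-agent credentials used for the mongo shell connection in the next step, and assuming a mongodump binary ships next to Juju's mongo build (for example under /usr/lib/juju/mongo3.2/bin/; set the hypothetical MONGODUMP variable if yours lives elsewhere). The function name juju_mongodump is illustrative only. Stopping jujud should leave mongod itself running, which the dump needs.

```shell
# Sketch: dump the "juju" database with the machine agent's credentials.
# ASSUMPTION: mongodump is installed under /usr/lib/juju/mongo*/bin/;
# override with MONGODUMP=/path/to/mongodump if it is not.
juju_mongodump() {
  local out_dir=$1 user=$2 password=$3
  local mongodump=${MONGODUMP:-$(ls /usr/lib/juju/mongo*/bin/mongodump 2>/dev/null | head -n 1)}
  if [ -z "$mongodump" ]; then
    echo "mongodump not found; set MONGODUMP" >&2
    return 1
  fi
  # Same host, port, TLS and auth options as the mongo shell command
  "$mongodump" --host 127.0.0.1 --port 37017 \
    --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin --username "$user" --password "$password" \
    --db juju --out "$out_dir"
}
```

Usage, once $user and $password are set from agent.conf: juju_mongodump "$HOME/juju-db-dump" "$user" "$password"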

Then connect to the database using something similar to the following (from https://pastebin.ubuntu.com/26502452/):

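  # Read the machine agent's credentials from its agent.conf
  conf=/var/lib/juju/agents/machine-*/agent.conf   # the glob expands where $conf is used unquoted below
  user=$(sudo grep '^tag:' $conf | cut -d' ' -f2)
  password=$(sudo grep '^statepassword:' $conf | cut -d' ' -f2)
  # Connect to the controller's MongoDB over TLS as that agent
  /usr/lib/juju/mongo*/bin/mongo 127.0.0.1:37017/juju --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password"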

Once connected, run the following commands:

  db.leases.remove({"_id": "ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3:application-leadership#nova-cloud-controller#"})
  db.leases.remove({"_id": "ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3:application-leadership#cinder-hacluster#"})
  db.leases.remove({"_id": "ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3:application-leadership#neutron-gateway#"})
  db.leases.remove({"_id": "ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3:application-leadership#ceph-osd#"})
  db.leases.remove({"_id": "ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3:application-leadership#ntp#"})
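Each lease _id follows the pattern "<model-uuid>:application-leadership#<application>#", as the commands above show. If the set of affected applications differs on another deployment, a small generator like this hypothetical helper (lease_remove_js is an illustrative name) avoids typing the ids by hand; it only prints the statements, it does not touch the database.

```shell
# Hypothetical helper: print one db.leases.remove() statement per application,
# using the "<model-uuid>:application-leadership#<application>#" _id format.
lease_remove_js() {
  local model_uuid=$1
  shift
  local app
  for app in "$@"; do
    printf 'db.leases.remove({"_id": "%s:application-leadership#%s#"})\n' "$model_uuid" "$app"
  done
}
```

Usage: lease_remove_js ef61dcef-2fb3-4b58-8ec6-2a9a0b2410c3 nova-cloud-controller cinder-hacluster neutron-gateway ceph-osd ntp > fix-leases.js, review the file, then pass fix-leases.js as the last argument to the mongo command used to connect.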

Before restarting the agents, make sure they will rerun the upgrade steps. An agent decides this by checking its agent.conf file for the version it was last running. The file is in the /var/lib/juju/agents/machine-x directory (where x is the machine id). It is a YAML file, and the key to look for is "upgradedToVersion". It probably says "2.3.2"; change it to "2.2.9" so the agent reruns the upgrade steps (they are idempotent, so running them again is safe).
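The edit can be scripted as below. This is a sketch, assuming the key sits at the start of a line as "upgradedToVersion: <version>" and that GNU sed (for -i) is available, as on Ubuntu. The function name is hypothetical, and the backup is written next to the original as agent.conf.bak.

```shell
# Hypothetical helper: back up an agent.conf, then rewrite its top-level
# upgradedToVersion key. Prints the new line; returns non-zero if the key is missing.
set_upgraded_to_version() {
  local conf=$1 version=$2
  cp -p "$conf" "$conf.bak" || return 1     # keep the original for rollback
  sed -i "s/^upgradedToVersion: .*/upgradedToVersion: $version/" "$conf"
  grep '^upgradedToVersion:' "$conf"        # show the result
}
```

Usage, from a root shell on each controller: set_upgraded_to_version /var/lib/juju/agents/machine-x/agent.conf 2.2.9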

Then restart the agents (sudo service jujud-machine-x start on each controller). They should start up, wait for each other, run the upgrade steps, and then carry on normally.