state: availability zone upgrade fails if containers are present
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | High | Jesse Meek | |
| 1.22 | | Critical | Jesse Meek | |
| 1.24 | | High | Jesse Meek | |
Bug Description
Reported by Joshua Randall:
-------
I ran `juju upgrade-juju` today to upgrade a MAAS environment to Juju version 1.22.0 and now `juju status` says that the upgrade has failed (and thus we have limited access to the state server since an upgrade is in progress). What can I do to manually complete the upgrade?
`juju status` shows the following error:
> environment: maas
> machines:
> "0":
> agent-state: error
> agent-state-info: 'upgrade to 1.22.0 failed (giving up): set AvailZone in instanceData:
> no instances found'
> agent-version: 1.22.0
> ...
While '/var/log/
> 2015-04-08 00:03:16 DEBUG juju.provider.maas environprovider
> 2015-04-08 00:03:16 ERROR juju.upgrade upgrade.go:134 upgrade step "set AvailZone in instanceData" failed: no instances found
> 2015-04-08 00:03:16 ERROR juju.cmd.jujud upgrade.go:360 upgrade from 1.21.1 to 1.22.0 for "machine-0" failed (will retry): set AvailZone in instanceData: no instances found
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:265 <- [3] machine-0 {"RequestId"
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:272 -> [3] machine-0 410.699us {"RequestId"
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:265 <- [3] machine-0 {"RequestId"
> error",
-------
| Andrew Wilkins (axwalk) wrote : | #1 |
| Joshua Randall (jcrandall) wrote : | #2 |
I had this issue and have used the workaround suggested on the mailing list (some manual mongodb surgery to add the availzone fields).
For my case (only one MAAS availability zone called "default"), I was able to do the following to get the bootstrap agent upgraded.
$ juju ssh 0
$ sudo apt-get install mongodb-clients
$ sudo -i
$ mongo --ssl -u admin -p $(grep oldpassword /var/lib/
db = db.getSiblingDB
db.instanceData
db.instanceData
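(The mongo commands above are truncated in the report. Schematically, the "surgery" they perform just adds a default availability zone to every instanceData document that lacks one. A minimal sketch of that operation, assuming a single MAAS zone named "default" as in Joshua's case, and modelling documents as plain maps purely for illustration:)

```go
package main

import "fmt"

// addAvailZone mimics what the mongo update does: for every
// instanceData document missing an "availzone" field, set it to the
// given zone. Documents are modelled as maps for illustration only;
// the real data lives in Juju's MongoDB instance.
func addAvailZone(docs []map[string]string, zone string) int {
	patched := 0
	for _, doc := range docs {
		if _, ok := doc["availzone"]; !ok {
			doc["availzone"] = zone
			patched++
		}
	}
	return patched
}

func main() {
	docs := []map[string]string{
		{"instanceid": "node-0"},                         // missing the field
		{"instanceid": "node-1", "availzone": "default"}, // already set
	}
	n := addAvailZone(docs, "default")
	fmt.Println(n, docs[0]["availzone"])
}
```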
Unfortunately I've also run afoul of bug 1416928 (https:/
| Andrew Wilkins (axwalk) wrote : | #3 |
Joshua, just wanted to say thank you very much for providing the steps to work around the issue.
| Alexander List (alexlist) wrote : | #4 |
I tried the workaround as well, and retrying the upgrade didn't change things. Bouncing jujud on machine 0 did, though that may just have been a delay in `juju status` updating.
| Changed in juju-core: | |
| milestone: | none → 1.24-alpha1 |
| Joshua Randall (jcrandall) wrote : | #5 |
Alexander, in fact I also restarted juju on machine 0 (`juju ssh 0 service jujud-machine-0 restart`) after I made the change, and that did force it to retry the upgrade immediately. My suspicion is that it would have eventually done that itself, as I think it had been periodically retrying the upgrade on its own, but I probably should have mentioned that above. Apologies if that was confusing.
| Changed in juju-core: | |
| milestone: | 1.24-alpha1 → 1.24.0 |
| Changed in juju-core: | |
| milestone: | 1.24.0 → 1.25.0 |
| Changed in juju-core: | |
| assignee: | nobody → Jesse Meek (waigani) |
| status: | Triaged → In Progress |
| Changed in juju-core: | |
| status: | In Progress → Fix Committed |
| tags: | added: canonical-bootstack |
| Changed in juju-core: | |
| status: | Fix Committed → Fix Released |


The upgrade code iterates through all instances in state, adding an availzone field where one doesn't exist. There are two problems:
- it attempts to do this for containers; it should only consider environ-level machines
- it bails out if any of the instances cannot be found; I think we should ignore not-found instances, in case they were removed out of band
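The fixed behaviour described above can be sketched as follows. This is a minimal illustration, not Juju's actual code: the `machine` type, `lookupZone` helper, and zone map are hypothetical stand-ins for Juju's real state and provider APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// machine is a hypothetical stand-in for a Juju state machine entity.
type machine struct {
	id          string
	containerID string // non-empty when this machine is a container
}

var errNotFound = errors.New("instance not found")

// lookupZone stands in for the provider call that resolves an
// instance's availability zone; it fails for removed instances.
func lookupZone(m machine, zones map[string]string) (string, error) {
	z, ok := zones[m.id]
	if !ok {
		return "", errNotFound
	}
	return z, nil
}

// setAvailZones applies the two fixes: containers are skipped
// (only environ-level machines are considered), and machines whose
// instances can no longer be found are ignored rather than aborting
// the whole upgrade step.
func setAvailZones(machines []machine, zones map[string]string) (map[string]string, error) {
	result := make(map[string]string)
	for _, m := range machines {
		if m.containerID != "" {
			continue // containers have no availability zone of their own
		}
		z, err := lookupZone(m, zones)
		if errors.Is(err, errNotFound) {
			continue // instance removed out of band; ignore
		}
		if err != nil {
			return nil, err
		}
		result[m.id] = z
	}
	return result, nil
}

func main() {
	machines := []machine{
		{id: "0"},
		{id: "0/lxc/0", containerID: "0"}, // container: skipped
		{id: "1"},                         // removed OOB: skipped
	}
	zones := map[string]string{"0": "default"}
	got, err := setAvailZones(machines, zones)
	fmt.Println(got, err)
}
```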