state: availability zone upgrade fails if containers are present

Bug #1441478 reported by Andrew Wilkins on 2015-04-08
This bug affects 5 people
Affects     Importance  Assigned to
juju-core   High        Jesse Meek
1.22        Critical    Jesse Meek
1.24        High        Jesse Meek

Bug Description

Reported by Joshua Randall:
---------------------------------------------

I ran `juju upgrade-juju` today to upgrade a MAAS environment to Juju version 1.22.0 and now `juju status` says that the upgrade has failed (and thus we have limited access to the state server since an upgrade is in progress). What can I do to manually complete the upgrade?

`juju status` shows the following error:
> environment: maas
> machines:
> "0":
> agent-state: error
> agent-state-info: 'upgrade to 1.22.0 failed (giving up): set AvailZone in instanceData:
> no instances found'
> agent-version: 1.22.0
> ...

Meanwhile, '/var/log/juju/machine-0.log' on machine 0 has:
> 2015-04-08 00:03:16 DEBUG juju.provider.maas environprovider.go:32 opening environment "maas".
> 2015-04-08 00:03:16 ERROR juju.upgrade upgrade.go:134 upgrade step "set AvailZone in instanceData" failed: no instances found
> 2015-04-08 00:03:16 ERROR juju.cmd.jujud upgrade.go:360 upgrade from 1.21.1 to 1.22.0 for "machine-0" failed (will retry): set AvailZone in instanceData: no instances found
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:265 <- [3] machine-0 {"RequestId":22,"Type":"Machiner","Request":"Life","Params":{"Entities":[{"Tag":"machine-0"}]}}
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:272 -> [3] machine-0 410.699us {"RequestId":22,"Response":{"Results":[{"Life":"alive","Error":null}]}} Machiner[""].Life
> 2015-04-08 00:03:16 DEBUG juju.apiserver apiserver.go:265 <- [3] machine-0 {"RequestId":23,"Type":"Machiner","Request":"SetStatus","Params":{"Entities":[{"Tag":"machine-0","Status":"error","Info":"upgrade to 1.22.0 failed (will retry): set AvailZone in instanceData: no instances found","Data":null}]}}

---------------------------------------------

Andrew Wilkins (axwalk) wrote :

The upgrade code iterates through all instances in state and adds an availzone field to each one that lacks it. There are two problems:
 - it attempts to do this for containers; it should only consider environ-level machines
 - it bails if any of the instances cannot be found; I think we should ignore not-found instances, in case they were removed OOB

Joshua Randall (jcrandall) wrote :

I had this issue and have used the workaround suggested on the mailing list (some manual mongodb surgery to add the availzone fields).

For my case (only one MAAS availability zone called "default"), I was able to do the following to get the bootstrap agent upgraded.

$ juju ssh 0
$ sudo apt-get install mongodb-clients
$ sudo -i
$ mongo --ssl -u admin -p $(grep oldpassword /var/lib/juju/agents/machine-0/agent.conf | awk '{print $2}') localhost:37017/admin
db = db.getSiblingDB("juju")
db.instanceData.update({machineid: { $nin: [/lxc/]}}, {$set: {availzone: "default"}}, {multi: true})
db.instanceData.update({machineid: { $in: [/lxc/]}}, {$set: {availzone: ""}}, {multi: true})
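To make the effect of the two update calls explicit, here is a small in-memory Go sketch of how they partition machine ids. It does not touch MongoDB; the `workaroundZone` helper and the sample ids are hypothetical, and the only thing taken from the commands above is the unanchored `lxc` pattern and the zone values "default" and "".

```go
package main

import (
	"fmt"
	"regexp"
)

// lxcPattern mirrors the /lxc/ regular expression used in the mongo filters:
// an unanchored match for "lxc" anywhere in the machine id.
var lxcPattern = regexp.MustCompile(`lxc`)

// workaroundZone returns the availzone value the two updates would write for
// a given machine id, assuming a single zone named hostZone.
func workaroundZone(machineID, hostZone string) string {
	if lxcPattern.MatchString(machineID) {
		return "" // second update: LXC containers get an empty zone
	}
	return hostZone // first update: every other machine gets the host zone
}

func main() {
	for _, id := range []string{"0", "1", "1/lxc/0", "2/lxc/3"} {
		fmt.Printf("machineid %-8s -> availzone %q\n", id, workaroundZone(id, "default"))
	}
}
```

Because the filter only looks for "lxc", any other kind of nested machine id would fall into the first update and receive the host zone, so an environment with other container types would need an adjusted filter.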

Unfortunately I've also run afoul of bug 1416928 (https://bugs.launchpad.net/juju-core/+bug/1416928) so none of my other agents upgraded successfully, but I'm now working on fixing that using the workaround suggested there.

Andrew Wilkins (axwalk) wrote :

Joshua, just wanted to say thank you very much for providing the steps to work around the issue.

Alexander List (alexlist) wrote :

I tried the workaround as well, and retrying the upgrade didn't change things. Bouncing jujud on machine 0 did, though. This may just have been a delay in updating juju status.

Curtis Hovey (sinzui) on 2015-04-10
Changed in juju-core:
milestone: none → 1.24-alpha1
Joshua Randall (jcrandall) wrote :

Alexander, in fact I also restarted juju on machine 0 (`juju ssh 0 service jujud-machine-0 restart`) after I made the change, and that did force it to retry the upgrade immediately. My suspicion is that it would have eventually done that itself, as I think it had been periodically retrying the upgrade on its own, but I probably should have mentioned that above. Apologies if that was confusing.

Curtis Hovey (sinzui) on 2015-04-27
Changed in juju-core:
milestone: 1.24-alpha1 → 1.24.0
Curtis Hovey (sinzui) on 2015-04-27
Changed in juju-core:
milestone: 1.24.0 → 1.25.0
Jesse Meek (waigani) on 2015-06-05
Changed in juju-core:
assignee: nobody → Jesse Meek (waigani)
status: Triaged → In Progress
Jesse Meek (waigani) on 2015-06-07
Changed in juju-core:
status: In Progress → Fix Committed
tags: added: canonical-bootstack
Curtis Hovey (sinzui) on 2015-06-11
Changed in juju-core:
status: Fix Committed → Fix Released