Upgrade-juju is broken on most/all substrates

Bug #1430791 reported by Curtis Hovey
This bug affects 1 person
Affects    | Status       | Importance | Assigned to | Milestone
juju-core  | Fix Released | Critical   | Jesse Meek  |

Bug Description

These upgrade tests failed. In many cases, the environments were not destroyed, which caused a resource shortage for other tests:

aws-upgrade-precise-amd64 build #2449 http://juju-ci.vapour.ws:8080/job/aws-upgrade-precise-amd64/2449/console
aws-upgrade-trusty-amd64 build #1788 http://juju-ci.vapour.ws:8080/job/aws-upgrade-trusty-amd64/1788/console
hp-upgrade-trusty-amd64 build #2648 http://juju-ci.vapour.ws:8080/job/hp-upgrade-trusty-amd64/2648/console
kvm-upgrade-trusty-amd64 build #569 http://juju-ci.vapour.ws:8080/job/kvm-upgrade-trusty-amd64/569/console
local-upgrade-trusty-amd64 build #919 http://juju-ci.vapour.ws:8080/job/local-upgrade-trusty-amd64/919/console
maas-1_7-upgrade-trusty-amd64 build #426 http://juju-ci.vapour.ws:8080/job/maas-1_7-upgrade-trusty-amd64/426/console
maas-1_8-upgrade-trusty-amd64 build #139 http://juju-ci.vapour.ws:8080/job/maas-1_8-upgrade-trusty-amd64/139/console
maas-upgrade-trusty-amd64 build #612 http://juju-ci.vapour.ws:8080/job/maas-upgrade-trusty-amd64/612/console
manual-upgrade-precise-amd64 build #1185 http://juju-ci.vapour.ws:8080/job/manual-upgrade-precise-amd64/1185/console

Curtis Hovey (sinzui)
summary: - Juju Upgrade is broken on most/all substrates
+ Upgrade-juju is broken on most/all substrates
Curtis Hovey (sinzui) wrote:

We have logs from the aws-upgrade-trusty-amd64 environment that was left running. We can see:

2015-03-11 08:41:50 INFO juju.worker.upgrader upgrader.go:152 upgrade requested from 1.21.3-trusty-amd64 to 1.23-alpha1
...
<panics start after 2015-03-11 08:42:04>
...
2015-03-11 08:48:14 INFO juju.cmd.jujud upgrade.go:329 starting upgrade from 1.21.3 to 1.23-alpha1 for "machine-0"
...
2015-03-11 08:48:15 INFO juju.cmd.jujud upgrade.go:199 upgrade to 1.23-alpha1 completed successfully.

The upgrade of the state server did complete, taking about 7 minutes from the request (08:41:50) to completion (08:48:15).

Curtis Hovey (sinzui) wrote:

Attached is the all-machines.log with upgrade information about the deployed services.

Dimiter Naydenov (dimitern) wrote:

Looking at the logs, there are at least two issues:
1) The TLS certificate is regenerated after the upgrade, which is a bit surprising but probably expected. What is not expected is the panic that occurs while this happens.
2) After machine 0 is upgraded and restarts, the other machines and units try to reconnect by calling Login with a tag that the upgraded state server deems invalid, and the server panics (most likely due to the missing envUUID). This is definitely wrong: the apiserver should be more resilient both to older clients (without an envUUID) trying to connect and to random DoS-type attacks with invalid credentials or tags.
Another possible cause of the second panic is a missing or not-yet-run upgrade step.

Jesse Meek (waigani)
Changed in juju-core:
assignee: nobody → Jesse Meek (waigani)
status: Triaged → In Progress
Jesse Meek (waigani)
Changed in juju-core:
status: In Progress → Fix Committed
Tim Penhey (thumper)
Changed in juju-core:
status: Fix Committed → Fix Released