Deployer fails because juju thinks it is upgrading
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
juju-ci-tools | Fix Released | Critical | Curtis Hovey |
juju-core | Fix Released | High | Ian Booth |
python-jujuclient | Fix Released | High | Ian Booth |
Bug Description
The MAAS deployer job is failing. Deployer is blocked/
http://
Juju cannot be upgrading, or at least, it is not possible to upgrade in this case because the version in test is 1.24-beta6, which is the newest version in the test streams. We see a download of
https:/
and there is no greater version in the streams created and confirmed in the log output.
This regression may be caused by...which would be ironic
Commit 2b71c0d Merge pull request #2441 from wallyworld/
Related branches
- Adam Collard (community): Needs Fixing
- Kapil Thangavelu: Approve
- John A Meinel (community): Approve
Diff: 101 lines (+47/-8), 2 files modified
jujuclient.py (+27/-7)
test_jujuclient.py (+20/-1)
Changed in python-jujuclient:
status: In Progress → Fix Committed
Changed in python-jujuclient:
importance: Undecided → High
Changed in juju-core:
importance: Critical → High
tags: added: tech-debt
Changed in juju-core:
status: Fix Committed → Fix Released
Changed in python-jujuclient:
status: Fix Committed → Fix Released
The term "upgrade" may mean one of two things:
1. Juju running upgrade steps to upgrade an older environment
2. Juju upgrading agent tools
The message in this bug refers to item 1 but, given the change in behaviour of Juju bootstrap, it is now poorly worded.
What happens now is:
1. Juju bootstraps and starts the machine agent on the bootstrap node
2. the machine agent delays activating the full api until:
a. any upgrade steps are run
b. it has determined that no agent upgrades are needed <-- this is new
3. once all upgrade (agent or steps) related tasks are finished, the full api is enabled
So if a deploy is attempted before the full api is enabled, the "upgrade in progress" error is returned.
Before the above change, the deployer would connect immediately after bootstrap and, if an implicit upgrade was performed, it would be disconnected partway through its deployment process.
Now what happens is more correct - any attempt to do work with Juju while the state server is not ready is rejected up front, rather than accepting a connection and then disconnecting.
The same response would occur if a user typed quickly and ran a deploy immediately after bootstrap: they would be told to try again in a short time.
Ideally here the deployer would "do the right thing" and retry.
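A minimal sketch of what such a retry could look like is below. It is not the actual python-jujuclient or deployer change. `UpgradeInProgress`, `retry_while_upgrading` and `FakeStateServer` are hypothetical names invented for illustration, and how a real client recognises the "upgrade in progress" rejection is an assumption.

```python
import time


class UpgradeInProgress(Exception):
    """Hypothetical error standing in for the 'upgrade in progress' rejection."""


def retry_while_upgrading(action, attempts=10, delay=1.0, backoff=2.0,
                          max_delay=30.0, sleep=time.sleep):
    """Call action(), retrying with growing delays while the server is not ready.

    Only UpgradeInProgress is retried; any other exception propagates at once.
    After the final attempt the UpgradeInProgress error is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except UpgradeInProgress:
            if attempt == attempts:
                raise
            sleep(delay)
            # Grow the wait, but cap it so a long upgrade does not stall forever.
            delay = min(delay * backoff, max_delay)


class FakeStateServer:
    """Stand-in for a state server that rejects work until the full api is enabled."""

    def __init__(self, rejections):
        self.rejections = rejections  # number of calls to reject before becoming ready
        self.deployed = []

    def deploy(self, charm):
        if self.rejections > 0:
            self.rejections -= 1
            raise UpgradeInProgress("upgrade in progress")
        self.deployed.append(charm)
        return charm
```

With this shape, a caller would wrap each deploy call, for example `retry_while_upgrading(lambda: server.deploy("mysql"))`. Retrying only on the specific rejection keeps genuine failures visible instead of masking them behind repeated attempts.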