Failed to upgrade from 1.22.1 to 1.22.3 during deployment

Bug #1452680 reported by Alberto Donato
This bug affects 2 people
Affects    Status      Importance  Assigned to      Milestone
juju-core  Incomplete  High        Cheryl Jennings

Bug Description

During a Landscape Autopilot deployment juju got stuck with agents trying to upgrade to 1.22.3 (the deploy was being performed with 1.22.1).

Last lines from juju debug-log report agents requesting a restart:

machine-0: 2015-05-07 10:04:57 DEBUG juju.worker.logger logger.go:45 reconfiguring logging from "<root>=DEBUG" to "<root>=WARNING;unit=DEBUG"
machine-0: 2015-05-07 10:04:58 ERROR juju.worker runner.go:219 exited "firewaller": machine 3 not provisioned
machine-0: 2015-05-07 10:05:01 ERROR juju.worker runner.go:219 exited "firewaller": machine 3 not provisioned
machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.worker runner.go:208 fatal "upgrader": must restart: an agent upgrade is available
machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.worker runner.go:208 fatal "api": must restart: an agent upgrade is available
machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.cmd supercommand.go:430 must restart: an agent upgrade is available
machine-2[21793]: 2015-05-07 10:12:54 ERROR juju.worker runner.go:208 fatal "upgrader": must restart: an agent upgrade is available
machine-2[21793]: 2015-05-07 10:12:54 ERROR juju.worker runner.go:208 fatal "api": must restart: an agent upgrade is available
machine-2[21793]: 2015-05-07 10:12:54 ERROR juju.cmd supercommand.go:430 must restart: an agent upgrade is available
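
When a debug-log is long, it can help to list which agents have asked for a restart and when they last did so. The following is a minimal sketch, assuming log lines shaped like the ones quoted above; the function name and the regular expression are illustrative and not part of juju.

```python
import re

# Matches lines such as:
#   machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.worker runner.go:208 ...
# The "[pid]" part is optional because some lines above omit it.
LINE = re.compile(
    r"^(?P<agent>machine-\d+)(?:\[\d+\])?: "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<rest>.*)$"
)
MARKER = "must restart: an agent upgrade is available"


def agents_requesting_restart(lines):
    """Map each agent name to the timestamp of its last restart request."""
    last_seen = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match and MARKER in match.group("rest"):
            last_seen[match.group("agent")] = match.group("ts")
    return last_seen


sample_log = [
    'machine-0: 2015-05-07 10:04:58 ERROR juju.worker runner.go:219 exited "firewaller": machine 3 not provisioned',
    'machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.worker runner.go:208 fatal "upgrader": must restart: an agent upgrade is available',
    'machine-1[21776]: 2015-05-07 10:12:34 ERROR juju.cmd supercommand.go:430 must restart: an agent upgrade is available',
    'machine-2[21793]: 2015-05-07 10:12:54 ERROR juju.worker runner.go:208 fatal "upgrader": must restart: an agent upgrade is available',
]

restarts = agents_requesting_restart(sample_log)
for agent, timestamp in sorted(restarts.items()):
    print(agent, "last requested a restart at", timestamp)
```

An agent that appears here but never logs anything afterwards is a candidate for the "stuck" state described in this report.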

Revision history for this message
Alberto Donato (ack) wrote :
tags: added: landscape
tags: added: cloud-installer
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.0
tags: added: upgrade-juju
Revision history for this message
Cheryl Jennings (cherylj) wrote :

Confirmed that this was on trusty. Still looking...

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Juju being deployed was 1.22.1-0ubuntu1~14.04.1~juju1 from the landscape PPA, which was copied from the juju stable packages PPA

Revision history for this message
Curtis Hovey (sinzui) wrote :

Where did 1.22.1 come from? The Juju stable PPA before it was superseded? From Ubuntu trusty proposed?

I cannot reproduce this case using either the certification tests or manual attempts (6 separate attempts).
A. When I bootstrap with juju 1.22.1 and deploy a stack, the agent versions are 1.22.1.
B. If I ask for an upgrade, Juju correctly selects 1.23.2 (the current stable) and the upgrades complete.

Is an upgrade to 1.22.3 being explicitly requested?
    juju upgrade-juju --version=1.22.3

Is the env using private streams that are missing the current stable?

Changed in juju-core:
status: Triaged → Incomplete
Revision history for this message
Adam Collard (adam-collard) wrote : Re: [Bug 1452680] Re: Failed to upgrade from 1.22.1 to 1.22.3 during deployment

On 7 May 2015 at 17:34, Curtis Hovey <email address hidden> wrote:

> Where did 1.22.1 come from. the Juju stable PPA before it was
> superseded? From Ubuntu trusty proposed?
>

Juju stable PPA (before it was superseded). We copy the package so that we
pin the version of Juju that we use.

> I cannot reproduce this case using both the certification tests and
> manually (6 separate attempts).
> A. When I bootstrap with juju 1.22.1 and deploy a stack. the agent
> versions are 1.22.1
>
> B. If I ask for an upgrade, Juju correctly selects 1.23.2 (the current
> stable) and upgrades complete.
>
> Is an upgrade to 1.22.3 being explicitly requested?
> juju upgrade-juju --version=1.22.3
>

No, just bootstrap.

> Is the env using private streams that are missing the current stable?
>

No, just the default (streams.canonical.com ?), nothing pointing at
proposed or devel or the like.

Revision history for this message
Curtis Hovey (sinzui) wrote :

Oops. My personal tests were tainted. I was using devel streams.

When I bootstrap with 1.22.1, I do see juju automatically upgrade to 1.22.3. My deployments are *never* 1.22.1; they start as 1.22.3. Everything works.

Maybe there is a timing issue where juju is allowing deployments when it knows it is going to upgrade.
/me tries to instrument this case.

Revision history for this message
Adam Collard (adam-collard) wrote :

I can reproduce the agent being upgraded quite trivially

⟫ juju version
1.22.1-vivid-amd64
⟫ juju bootstrap --to my.machine
...
⟫ juju status --format=tabular
[Machines]
ID STATE VERSION DNS INS-ID SERIES HARDWARE
0 started 1.22.3 my.machine /MAAS/api/1.0/nodes/node-36b4d226-4b4a-11e4-a091-a0b3cce4ecca/ trusty arch=amd64 cpu-cores=4 mem=16384M

Revision history for this message
Curtis Hovey (sinzui) wrote :

I cannot reproduce this by pasting these commands to start deploys before upgrades start:
    juju bootstrap
    juju deploy -n 2 ubuntu
    juju status

I do see that the state-server is at 1.22.1. I see the other machines and units come up, and I see them all upgrade to 1.22.3. The env is usable. On a second try, I saw the second ubuntu machine and unit in pending and allocating, then arrive at 1.22.3. I never saw it at 1.22.1. The env was usable.

This is interesting from IRC:
sparkiegeek> sinzui: looking at the log a bit closer it looks like it falls over when add-machine'ing

I will try
    juju bootstrap
    juju add-machine -n 2
    juju status

Revision history for this message
Curtis Hovey (sinzui) wrote :

Again, I cannot make juju fail by pasting:
    juju bootstrap
    juju add-machine -n 2
    juju status

I can instrument deployments before the 1.22.1 upgrade to 1.22.3. Everything does upgrade. The env is usable. There is about an extra 1 minute delay until I see everything at 1.22.3.

Could a specific charm be changing networking on a machine to prevent an upgrade? If so we have a larger problem than micro-versions of juju.

Starting with 1.21.x+, the rules for where agents are downloaded from changed. First, the state-server upgrades using the agents found in streams. Once it completes, it tells each machine and unit to upgrade using the cached agent on the state-server. Agents are not downloaded from streams, but from the state-server, which every unit is assumed to be able to reach. Each unit can call home, so each unit should be able to download an agent.

Revision history for this message
Curtis Hovey (sinzui) wrote :

This also just works
    juju bootstrap --to ubuntu@165.225.131.241
    juju deploy -n 2 ubuntu
    juju status

I think we need to know more about the network or the charms being deployed.

Curtis Hovey (sinzui)
Changed in juju-core:
importance: Critical → High
no longer affects: juju-core/1.24
no longer affects: juju-core/1.23
no longer affects: juju-core/1.22
Revision history for this message
David Britton (dpb) wrote :

Much like:

https://bugs.launchpad.net/juju-core/+bug/1247232

Setting "agent-version: 1.22.1" seems to work around *whatever* bug we are hitting.

I think the best way to proceed forward will be for someone with more juju-debugging chops (like juju-qa) to use the autopilot where this issue is reproducible.

In autopilot terms, it's simple. When I deploy with 10 or more machines, I hit the bug.

Revision history for this message
Curtis Hovey (sinzui) wrote :

While we cannot reproduce this bug without autopilot, David and I discussed the deeper issue: enterprise parties working with certification need to control the exact version juju selects for the state-server. Juju CI does this by adding an option to the env in environments.yaml, like so:

    agent-version: 1.22.1

This technique is not a hack. Any party or tool can set this to ensure repeatability of a process. We may want to publicise this technique more widely. In the case of tools like OIL, autopilot, and Juju CI, this ensures unwanted upgrades are avoided.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

It looks like the all-machines.log might be incomplete as it stops right as the agents are restarted with the new version. Could you collect the machine-N.log files from each host?

Changed in juju-core:
assignee: nobody → Cheryl Jennings (cherylj)
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I was wondering if --upload-tools would also lock the agent version to whatever was uploaded.

Here are some tests where I used juju 1.22.1 locally:

- with --upload-tools: https://pastebin.canonical.com/131040/
- without: https://pastebin.canonical.com/131041/

When I did NOT use --upload-tools, the agent was quickly upgraded from 1.22.1 to 1.22.3. When upload-tools was used, it stayed put at 1.22.1. Is this expected, and, more importantly, guaranteed?

Revision history for this message
Curtis Hovey (sinzui) wrote :

--upload-tools is a developer hack that has proven to be very dangerous in production environments. While it gives repeatability during bootstrap, it does so by increasing the unknowns for future upgrades.

1. The agent deployed is not in streams; it is treated as a development version without an upgrade path.
2. Users must use --upload-tools when upgrading, but that is based on the client.
   We have seen that enterprises are running more than one version of the client, and some are
   development versions, which make unpredictable choices for an upgrade to a production env.
3. --upload-tools disables streams.
   Users need to configure streams to get back to "just works":
       juju set-env agent-metadata-url=https://streams.canonical.com/juju/tools

By setting agent-version in the config, users can guarantee repeatability in bootstraps and upgrades.

Since juju is designed to upgrade to fix important issues in clouds and in the state-server, anything that automates it needs to choose to state the version required, or wait for upgrades to complete.

Revision history for this message
Curtis Hovey (sinzui) wrote :

Sorry, incomplete sentence...

Since juju is designed to upgrade to fix important issues in clouds and in the state-server, anything that automates it needs to choose to state the version required, or wait for upgrades to complete on the state-server *before* choosing to deploy.
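
For tools that prefer to wait rather than pin agent-version, the "wait for upgrades to complete" option could look like the check below. This is a minimal sketch, not juju code. It assumes the status has already been parsed from `juju status --format=json`, and that each machine entry carries "agent-state" and "agent-version" keys. Those key names are an assumption about the 1.x JSON output and should be verified against the client in use. The function name and sample data are hypothetical.

```python
import json


def agents_not_at(status, target):
    """Return ids of machines whose agent is not started at version `target`.

    Only machine agents are checked here; unit agents would need the same
    test applied to the units listed under each service.
    """
    pending = []
    for machine_id, info in sorted(status.get("machines", {}).items()):
        started = info.get("agent-state") == "started"
        at_target = info.get("agent-version") == target
        if not (started and at_target):
            pending.append(machine_id)
    return pending


# Hypothetical status snapshot: machine 0 has upgraded, machine 1 has not,
# and machine 2 is still pending and reports no version yet.
sample_status = json.loads("""
{
  "machines": {
    "0": {"agent-state": "started", "agent-version": "1.22.3"},
    "1": {"agent-state": "started", "agent-version": "1.22.1"},
    "2": {"agent-state": "pending"}
  }
}
""")

waiting_on = agents_not_at(sample_status, "1.22.3")
if waiting_on:
    print("hold deployment; still waiting on machines:", ", ".join(waiting_on))
else:
    print("all machine agents are at the target version")
```

A deployment tool would poll this check after bootstrap and only start deploying once the list is empty.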

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Yes, that is expected and guaranteed. When you specify --upload-tools, it will override the environment's agent-version and only use the uploaded local tools. It will not go look for newer versions.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

If this environment is still set up, can you attach the machine-N.log files from each host?

Revision history for this message
Dean Henrichsmeyer (dean) wrote :

Cheryl, we no longer have this environment up. We'll work on reproducing and let you know.
