upgrading 2.8.10 to 2.9.0 does a 'double upgrade'

Bug #1927793 reported by John A Meinel
This bug affects 1 person
Affects: Canonical Juju
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I was testing upgrading from 2.8.10 to 2.9.0 in HA with a lot of units. I saw this in the debug logs:

2021-05-07 20:48:30 INFO juju.worker.upgrader error.go:33 upgraded from 2.8.10-bionic-amd64 to 2.9.0-bionic-amd64 ("https://10.5.24.177:17070/model/aef0a36b-3371-4b1c-8ad7
ERROR must restart: an agent upgrade is available
2021-05-07 20:48:31 INFO juju.cmd supercommand.go:56 running jujud [2.9.0 0 ac860f7db4296273ea2cf213115ec2c229d57a07 gc go1.14.15]

and then later

2021-05-07 20:49:58 INFO juju.cmd.jujud errors.go:45 upgraded from 2.9.0-bionic-amd64 to 2.9.0-ubuntu-amd64 ("https://10.5.24.177:17070/model/aef0a36b-3371-4b1c-8ad7-23aca
ERROR must restart: an agent upgrade is available
2021-05-07 20:49:58 INFO juju.cmd supercommand.go:56 running jujud [2.9.0 0 ac860f7db4296273ea2cf213115ec2c229d57a07 gc go1.14.15]

(note that both are running the exact same hash).

This is because 2.8 thinks in terms of series and tells the agent to upgrade to a particular series, while 2.9 thinks in terms of OS and so wants the agent to match a known OS.
But taking downtime twice during the upgrade path is probably not ideal.
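
To make the distinction between the two restarts concrete, here is a minimal Python sketch that classifies "upgraded from X to Y" log lines like the ones above. It assumes the agent binary tag has the shape number-release-arch, as the log excerpts suggest; `AgentBinary`, `parse_binary` and `classify_upgrade` are hypothetical helper names for illustration, not part of Juju.

```python
import re
from typing import NamedTuple, Optional


class AgentBinary(NamedTuple):
    number: str   # e.g. "2.9.0"
    release: str  # a series ("bionic") under 2.8, an OS ("ubuntu") under 2.9
    arch: str     # e.g. "amd64"


# Matches the "upgraded from X to Y" fragment seen in the debug log.
UPGRADE_RE = re.compile(r"upgraded from (\S+) to (\S+)")


def parse_binary(tag: str) -> AgentBinary:
    # Split from the right so dotted version numbers stay intact.
    number, release, arch = tag.rsplit("-", 2)
    return AgentBinary(number, release, arch)


def classify_upgrade(log_line: str) -> Optional[str]:
    """Return a short label for an upgrade log line, or None if it is not one."""
    match = UPGRADE_RE.search(log_line)
    if match is None:
        return None
    old, new = parse_binary(match.group(1)), parse_binary(match.group(2))
    if old.number != new.number:
        return "version change"
    if old.release != new.release:
        return "release label change only"
    return "no change"
```

Under these assumptions, the first log line above would be labelled a version change (2.8.10 to 2.9.0), and the second a release label change only (bionic to ubuntu at 2.9.0).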

John A Meinel (jameinel) wrote :

Note that during this time it was bouncing unit agents:
2021-05-07 20:49:41 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-ul18-1 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef
2021-05-07 20:49:41 INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: unit-ul18-1 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef
2021-05-07 20:49:42 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-ul19-1 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef
2021-05-07 20:49:42 INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: unit-ul19-1 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef
2021-05-07 20:49:42 INFO juju.apiserver.connection request_notifier.go:96 agent login: unit-ul51-0 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef
2021-05-07 20:49:42 INFO juju.apiserver.connection request_notifier.go:125 agent disconnected: unit-ul51-0 for 7c9cdc76-95a7-4eae-80e0-6051207a6fef

It may be that we were just denying them because we were in the process of upgrading.
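
One way to check how widespread the bouncing is would be to tally login and disconnect events per agent. The following is a rough Python sketch, assuming the `agent login:` / `agent disconnected:` message shapes shown in the excerpt above; `count_agent_events` and `bouncing_agents` are hypothetical names, not Juju tooling.

```python
import re
from collections import Counter

# Matches e.g. "agent login: unit-ul18-1 for <model-uuid>".
EVENT_RE = re.compile(r"agent (login|disconnected): (\S+) for")


def count_agent_events(lines):
    """Count (agent, event) pairs found in apiserver connection log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(2), match.group(1))] += 1
    return counts


def bouncing_agents(lines):
    """Return agents that both logged in and disconnected at least once."""
    counts = count_agent_events(lines)
    agents = {agent for agent, _ in counts}
    return sorted(
        agent
        for agent in agents
        if counts[(agent, "login")] and counts[(agent, "disconnected")]
    )
```

This only counts events; it does not look at timestamps, so it cannot tell a quick bounce from a long-lived session that ended normally.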

Joseph Phillips (manadart) wrote :

Is this a model upgrade, or just a controller upgrade?

I upgraded an HA LXD controller (only a few units) and only one upgrade was performed, but as we observed in production there is a failure before the coordination sorts itself out and the upgrade succeeds.

machine-2: 11:22:37 INFO juju.worker.upgradesteps checking that upgrade can proceed
machine-2: 11:22:39 INFO juju.worker.upgradesteps signalling that this controller is ready for upgrade
machine-1: 11:22:45 WARNING juju.worker.upgradesteps stopped waiting for other controllers: tomb: dying
machine-1: 11:22:45 ERROR juju.worker.upgradesteps upgrade from 2.8.11.1 to 2.9.0 for "machine-1" failed (giving up): tomb: dying
machine-2: 11:22:50 INFO juju.worker.upgradesteps waiting for other controllers to be ready for upgrade
machine-2: 11:22:50 WARNING juju.worker.upgradesteps stopped waiting for other controllers: tomb: dying
machine-2: 11:22:50 ERROR juju.worker.upgradesteps upgrade from 2.8.11.1 to 2.9.0 for "machine-2" failed (giving up): tomb: dying

John A Meinel (jameinel) wrote :

This might have been upgrading the model; it was just something I noticed
in passing, not something I was focusing on.

Ian Booth (wallyworld) wrote :

This is due to the cutover to OS-based agent binaries. It only happens the first time a 2.8 *controller* agent is upgraded to 2.9. The 2.8 controller agent is what initially writes the new agent symlinks, but 2.8 doesn't know about the OS-based agents, so it uses the series-based ones. To get everything in sync, with *all* agents using the new binaries, the 2.9 controller checks whether the older series-based agent is in use; if so, it sets up the newer OS-based agent and does one more upgrade restart.

This can be seen from the message

2021-05-07 20:49:58 INFO juju.cmd.jujud errors.go:45 upgraded from 2.9.0-bionic-amd64 to 2.9.0-ubuntu-amd64 ("https://10.5.24.177:17070/

Notice:
bionic -> ubuntu
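
The check described above can be sketched roughly as follows. This is an illustration only, not Juju's actual code: `UBUNTU_SERIES` is a small assumed subset of series names, and `os_for_release` and `next_agent_binary` are hypothetical helpers.

```python
from typing import Tuple

# Assumption: a small illustrative subset of Ubuntu series names,
# not the table Juju itself uses.
UBUNTU_SERIES = {"xenial", "bionic", "focal"}


def os_for_release(release: str) -> str:
    """Map a series name to its OS name; leave OS names unchanged."""
    return "ubuntu" if release in UBUNTU_SERIES else release


def next_agent_binary(current: str) -> Tuple[str, bool]:
    """Return the OS-based binary tag and whether a further restart is needed."""
    number, release, arch = current.rsplit("-", 2)
    wanted = f"{number}-{os_for_release(release)}-{arch}"
    return wanted, wanted != current
```

With these assumptions, an agent still on `2.9.0-bionic-amd64` would be pointed at `2.9.0-ubuntu-amd64` and restarted once more, while an agent already on the OS-based tag would need no further restart.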

Changed in juju:
status: Triaged → Invalid