Agents fail to upgrade to 2.2.7, unable to 'juju run'

Bug #1738728 reported by Paul Gear
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
Canonical Juju     Fix Released  High        Tim Penhey
  2.2              Fix Released  High        Tim Penhey
  2.3              Fix Released  Critical    Tim Penhey

Bug Description

After an apparently successful upgrade of a controller from 2.2.6 to 2.2.7 (https://pastebin.canonical.com/205786/) and an upgrade request for all hosted models, all machine and unit agents in the hosted model fail to upgrade to 2.2.7. 'juju run' also no longer works; after around 5 minutes it reports:

ERROR timed out waiting for results from: machine 0, machine 1, machine 2, machine 3, machine 4, machine 5, machine 6

Paul Gear (paulgear) wrote:

Diagnostics:

- juju-goroutines from controller machine agent: https://pastebin.canonical.com/205795/

- juju-engine-report from controller machine agent: https://pastebin.canonical.com/205796/

- Complete logs from controller: https://private-fileshare.canonical.com/~paulgear/lp1738728/

- juju-goroutines from hosted model machine agent: https://pastebin.canonical.com/205790/

- juju-engine-report from hosted model machine agent: https://pastebin.canonical.com/205791/

- Complete logs from hosted model machine agent: https://pastebin.canonical.com/205794/

Changed in juju:
assignee: nobody → Tim Penhey (thumper)
William Grant (wgrant) wrote:

I experimented a little on my affected controller. Two rounds of restarts of all model agents got things upgraded: the first round upgraded the machine agents and the second upgraded the unit agents, even though every agent was restarted both times. However, the agents still wouldn't react to anything, even after a further controller restart.

I investigated the odd behaviour further with "juju run". An action targeting any agent managed by the controller normally times out, but can be made to succeed if the target agent is restarted while the action is pending. It seems that the 2.2.7 controller can always deliver up-to-date state to its agents when they start, but it can't deliver subsequent events.

Joel Sing (jsing)
Changed in juju:
status: New → Confirmed
importance: Undecided → Critical
William Grant (wgrant) wrote:

We downgraded my model and controller to 2.2.6. The model is easy; the controller is a bit harder:

For the model:

 - In a mongo shell on the controller:
   db.settings.update({"_id": "$MODEL_UUID:e"}, {$set: {"settings.agent-version": "2.2.6"}})

 - Restart all model machine agents.

 - Wait a few seconds and restart all model unit agents.

 - "juju status --format yaml | grep version" should now report 2.2.6 across the board.
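The final check in the model steps can be made slightly stricter than reading grep output by eye. The sketch below is an illustration, not part of the reported procedure: it assumes agent versions appear as bare "version: X.Y.Z" lines in the output of juju status --format yaml, and the function names are hypothetical. Other keys named exactly "version" could also match, so treat a mismatch as a prompt to look closer rather than a verdict.

```python
import re

# Matches lines whose key is exactly "version", the same lines that
# `grep version` would surface. Assumption: agent versions are printed
# as "version: X.Y.Z" in `juju status --format yaml` output.
_VERSION_LINE = re.compile(r"^\s*version:\s*(\S+)\s*$")


def reported_versions(status_yaml: str) -> list[str]:
    """Return every value of a bare `version:` key, in output order."""
    found = []
    for line in status_yaml.splitlines():
        match = _VERSION_LINE.match(line)
        if match:
            found.append(match.group(1))
    return found


def all_at_version(status_yaml: str, expected: str) -> bool:
    """True only if some version line exists and every one equals `expected`."""
    versions = reported_versions(status_yaml)
    return bool(versions) and all(v == expected for v in versions)
```

For example, passing the captured status output and "2.2.6" to all_at_version would return False as long as any agent still reports 2.2.7, and reported_versions shows which values remain.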

For the controller:

 - In a mongo shell on the controller:
   db.settings.update({"_id": "$CONTROLLER_MODEL_UUID:e"}, {$set: {"settings.agent-version": "2.2.6"}})

 - In each controller machine's agent.conf, change upgradedToVersion to 2.2.6.

 - Restart all controller machine agents.

 - Restart any controller unit agents.
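The agent.conf step can be scripted when there are several controller machines. This is a minimal sketch, not taken from the report: it assumes agent.conf contains a top-level "upgradedToVersion: X.Y.Z" line and that the file lives under /var/lib/juju/agents/machine-<N>/ (an assumed path; verify it on your controllers). The helper names are hypothetical. It refuses to write unless exactly one such line is found, so a file with an unexpected layout is reported instead of being silently left unchanged.

```python
import re

# Assumption: agent.conf is YAML with a top-level line such as
# "upgradedToVersion: 2.2.7". [ \t]* is used instead of \s* so the
# pattern never swallows the newline that ends the line.
_UPGRADED_TO = re.compile(r"^(upgradedToVersion:[ \t]*)\S+[ \t]*$", re.MULTILINE)


def set_upgraded_to_version(conf_text: str, version: str) -> str:
    """Return conf_text with the upgradedToVersion value replaced.

    Raises ValueError unless exactly one matching line is present.
    """
    new_text, count = _UPGRADED_TO.subn(lambda m: m.group(1) + version, conf_text)
    if count != 1:
        raise ValueError(f"expected one upgradedToVersion line, found {count}")
    return new_text


def rewrite_agent_conf(path: str, version: str) -> None:
    """Edit one agent.conf in place; do this before restarting that agent."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    updated = set_upgraded_to_version(original, version)
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)
```

A plain regex substitution is used rather than a YAML round trip so the rest of the file, including ordering and formatting, is left byte-for-byte as it was.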

And you should be back in business.

It may also be desirable to purge the 2.2.7 upgradeInfo, but that wasn't necessary for the downgrade.
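Both mongo updates above share one shape, and only the model UUID differs. $MODEL_UUID and $CONTROLLER_MODEL_UUID are placeholders to replace by hand: inside a quoted string the mongo shell treats them as literal text. The hypothetical helper below builds the same filter and update documents, assuming the settings collection and "<uuid>:e" document ID used in the commands. The pymongo call and the "juju" database name in the trailing comment are assumptions, not details from the report.

```python
# Hypothetical helper mirroring the db.settings.update(...) commands:
# it returns the (filter, update) pair that pins a model's agent-version.


def agent_version_update(model_uuid: str, version: str) -> tuple[dict, dict]:
    """Return (filter, update) documents for a model's settings entry."""
    if not model_uuid:
        raise ValueError("model_uuid must be non-empty")
    selector = {"_id": f"{model_uuid}:e"}
    update = {"$set": {"settings.agent-version": version}}
    return selector, update


# With pymongo and a connection to the controller's mongo (credentials and
# database name assumed), the pair could be applied as:
#   selector, update = agent_version_update(uuid, "2.2.6")
#   client["juju"]["settings"].update_one(selector, update)
```

Building the documents in one place makes it harder to update the hosted model's entry with the controller model's UUID, or vice versa, when repeating the procedure.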

Tim Penhey (thumper)
Changed in juju:
status: Confirmed → In Progress
milestone: none → 2.2.8
Tim Penhey (thumper) wrote:
Tim Penhey (thumper) wrote:
Changed in juju:
milestone: 2.2.8 → 2.4-beta1
importance: Critical → High
Anastasia (anastasia-macmood) wrote:

I am sure that the above PRs have been merged into develop (2.4+) as part of a bigger development merge. Hence, I am marking this report as Fix Committed.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released