controller migration is very hard when dealing with large deployments

Bug #1918680 reported by Aymen Frikha
This bug affects 2 people
Affects         Status        Importance  Assigned to      Milestone
Canonical Juju  Fix Released  High        Joseph Phillips
2.8             Fix Released  High        Joseph Phillips

Bug Description

Hello, I am trying to migrate a large OpenStack environment from a xenial controller to a bionic one.
The migration keeps failing because some units are in the executing state.

Is it possible to add a feature to Juju that stops all agents from executing, so that the migration can proceed smoothly?

We are using Juju 2.8.9.

Joseph Phillips (manadart) wrote :

There is something odd happening here. We should be allowing migration to proceed if the agent status is "idle" or "executing".

It is only the other statuses that block migration: allocating, rebooting, failed, and lost.
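As an illustration only, the intended check could be sketched like this. The helper name `canMigrate` and the status strings are assumptions taken from the wording of this comment, not Juju's actual implementation. Any status not listed as allowed is treated as blocking.

```go
package main

import "fmt"

// migratable lists the agent statuses that should let a migration proceed.
// The spellings follow this thread; they are not copied from Juju's source.
var migratable = map[string]bool{
	"idle":      true,
	"executing": true,
}

// canMigrate is a hypothetical helper: it reports whether an agent in the
// given status should allow the migration to continue.
func canMigrate(agentStatus string) bool {
	return migratable[agentStatus]
}

func main() {
	statuses := []string{"idle", "executing", "allocating", "rebooting", "failed", "lost"}
	for _, s := range statuses {
		fmt.Printf("%-10s migrate=%v\n", s, canMigrate(s))
	}
}
```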

If we encounter this again, I would very much like to see what the status actually is.

Changed in juju:
status: New → Triaged
importance: Undecided → High
Joseph Phillips (manadart) wrote :

See here:
https://pastebin.ubuntu.com/p/YPqrVzrgwB/

When the migration-inactive-flag drops, a graceful shut-down should ensue.

Instead, we are throwing an error, changing the status to "failed" as seen here:
https://pastebin.ubuntu.com/p/d7XSfmQycN/

This means the QUIESCE phase fails and the migration is aborted.

Fixing this will mean that we don't have to keep retrying migrations until we get lucky and pass the first validation step.

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
milestone: none → 2.8.11
Joseph Phillips (manadart) wrote :

The juju/mutex package has an error type defined, ErrCancelled.

We just need to handle this error specifically instead of returning it as a failure.
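A rough sketch of that idea follows; it is not the actual patch. To keep it free of dependencies, `errCancelled` is a local stand-in for the `ErrCancelled` value in juju/mutex, and `acquireLock` and `runOperation` are hypothetical names. It also assumes standard-library error wrapping, which may differ from how Juju wraps errors.

```go
package main

import (
	"errors"
	"fmt"
)

// errCancelled stands in for juju/mutex's ErrCancelled so that this sketch
// compiles without the real package.
var errCancelled = errors.New("cancelled acquiring mutex")

// acquireLock is a hypothetical stand-in for taking the machine lock.
// It fails with errCancelled if the cancel channel is already closed.
func acquireLock(cancel <-chan struct{}) error {
	select {
	case <-cancel:
		return fmt.Errorf("acquiring lock: %w", errCancelled)
	default:
		return nil
	}
}

// runOperation runs op under the lock. A cancelled acquisition is treated
// as a request to stop gracefully: op is skipped and no error is returned,
// so the caller has no reason to mark the agent as "failed".
func runOperation(cancel <-chan struct{}, op func() error) (ran bool, err error) {
	if err := acquireLock(cancel); err != nil {
		if errors.Is(err, errCancelled) {
			return false, nil
		}
		return false, err
	}
	return true, op()
}

func main() {
	cancel := make(chan struct{})
	close(cancel) // simulate the shut-down signal arriving first
	ran, err := runOperation(cancel, func() error { return nil })
	fmt.Println(ran, err)
}
```

The point of the sketch is that only the cancellation case is swallowed; any other error from lock acquisition or from the operation itself is still returned to the caller.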

Changed in juju:
status: Triaged → In Progress
John A Meinel (jameinel)
Changed in juju:
milestone: 2.8.11 → 2.9-next
Joseph Phillips (manadart) wrote :
Changed in juju:
milestone: 2.9-next → 2.9.6
Joseph Phillips (manadart) wrote :
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.9.6 → 2.9.5
Changed in juju:
status: Fix Committed → Fix Released