in devstack, "nova migrate <uuid>" will try to migrate to the same host (and then fail)

Bug #1819216 reported by Chris Friesen
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Triaged | Medium | Unassigned |

Bug Description

In multinode devstack I had an instance running on one node and tried running "nova migrate <uuid>". The operation started, but then the instance went into an error state with the following fault:

{"message": "Unable to migrate instance (2bbdab8e-3a83-43a4-8c47-ce57b653e43e) to current host (fedora-1.novalocal).", "code": 400, "created": "2019-03-08T19:59:09Z"}

Logically, I think that even if "resize to same host" is enabled, a "migrate" operation should remove the current host from consideration. We know it's going to fail, and it doesn't make sense anyway.

Also, it would probably make sense to make "migrate" work like "live migration", which removes the current host from consideration.

Tags: compute
Chris Friesen (cbf123)
summary: - in devstack, "nova migrate <uuid>" can try to migrate to the same host
+ in devstack, "nova migrate <uuid>" will try to migrate to the same host
+ (and then fail)
description: updated
Matt Riedemann (mriedem) wrote :

I was going to mark this as a duplicate of bug 1811235, since it's definitely related, but it's a somewhat different issue. In this case you have allow_resize_to_same_host=True (the default in devstack but not in nova) and two nodes. The scheduler picked the host that the instance is already on, for whatever reason. Then, unless you're using the vcenter driver (you're using libvirt by default), you hit this in the compute service and it blows up:

https://github.com/openstack/nova/blob/d3254af0fe2b15caff3990c965194133625b681d/nova/compute/manager.py#L4287

So we definitely have some weirdness around this because the control plane services don't know about the compute configuration but the API is allowing resize (and cold migrate) to the same host based on configuration there.

Options:

1. Always ignore the source host during cold migrate, similar to how live migrate and evacuate work, but that would break the vcenter case for cold migrating to the same compute service host which is just managing a vcenter cluster of esxi hosts.

2. Somehow communicate to the scheduler that we're doing a cold migration and we can or can't pick the source host. Now that we report driver capabilities as traits to placement, we could potentially rely on that to pass something along to the scheduler and placement about this case. That's probably the more flexible way to fix this.
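To make the difference between the two options concrete, here is a minimal sketch in plain Python. It is not nova or placement code: the Host class, both functions, the trait name, and every host name other than fedora-1.novalocal are hypothetical and exist only for illustration. Option 1 always drops the source host; option 2 keeps it only when that host advertises a capability trait saying same-host cold migration is supported.

```python
from dataclasses import dataclass, field

# Hypothetical trait name, used only for this illustration.
SAME_HOST_TRAIT = "COMPUTE_SAME_HOST_COLD_MIGRATE"


@dataclass(frozen=True)
class Host:
    """Toy stand-in for a compute host as seen by the scheduler."""
    name: str
    traits: frozenset = field(default_factory=frozenset)


def candidates_option_1(hosts, source_host):
    """Option 1: always exclude the source host from the candidates."""
    return [h for h in hosts if h.name != source_host]


def candidates_option_2(hosts, source_host):
    """Option 2: keep the source host only if it advertises the trait."""
    return [
        h for h in hosts
        if h.name != source_host or SAME_HOST_TRAIT in h.traits
    ]


# Hypothetical deployments: two libvirt hosts without the trait, and
# two vcenter-style hosts that do advertise it.
libvirt_hosts = [Host("fedora-1.novalocal"), Host("fedora-2.novalocal")]
vcenter_hosts = [
    Host("vc-1", frozenset({SAME_HOST_TRAIT})),
    Host("vc-2", frozenset({SAME_HOST_TRAIT})),
]

for h in candidates_option_2(libvirt_hosts, "fedora-1.novalocal"):
    print("libvirt candidate:", h.name)
for h in candidates_option_2(vcenter_hosts, "vc-1"):
    print("vcenter candidate:", h.name)
```

In this sketch both options exclude fedora-1.novalocal for the libvirt case, but only option 2 still lets the vcenter-style source host be selected, which is the flexibility described in item 2.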

Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

The change I'm referencing in comment 1 item 2 is this: https://review.openstack.org/#/c/538498/

Matt Riedemann (mriedem) wrote :

Note that this is also extremely latent behavior, not a regression as far as I know.

Matt Riedemann (mriedem) wrote :

Here is an os-traits patch to add a compute capability trait for a potential fix mentioned in option 2 in comment 1:

https://review.opendev.org/#/c/666604/
