OpenStack Compute (nova)

in devstack, "nova migrate <uuid>" will try to migrate to the same host (and then fail)

Bug #1819216 reported by Chris Friesen on 2019-03-08

This bug report is a duplicate of: Bug #1748697: Cold migration fails when the filter only returns the host where the vm is located and the vm status is set to ERROR.

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Triaged	Medium	Unassigned	

Bug Description

In a multinode devstack, I had an instance running on one node and ran "nova migrate <uuid>". The operation started, but then the instance went into an error state with the following fault:

{"message": "Unable to migrate instance (2bbdab8e-3a83-43a4-8c47-ce57b653e43e) to current host (fedora-1.novalocal).", "code": 400, "created": "2019-03-08T19:59:09Z"}

Logically, I think that even if "resize to same host" is enabled, a "migrate" operation should remove the current host from consideration. We know it's going to fail, and migrating to the same host doesn't make sense anyway.

Also, it would probably make sense to make "migrate" work like "live migration", which removes the current host from consideration.


Chris Friesen (cbf123) on 2019-03-08

summary:	- in devstack, "nova migrate <uuid>" can try to migrate to the same host
	+ in devstack, "nova migrate <uuid>" will try to migrate to the same host (and then fail)
description:	updated

Matt Riedemann (mriedem) wrote on 2019-03-08:

I was going to mark this as a duplicate of bug 1811235 since it's definitely related, but it's a somewhat different issue. In this case you have allow_resize_to_same_host=True (the default in devstack but not in nova) and two nodes. The scheduler picked the host that the instance is already on, for whatever reason, and then if you're not using the vcenter driver (you're using libvirt by default), you hit this check in the compute service and it blows up:

https://github.com/openstack/nova/blob/d3254af0fe2b15caff3990c965194133625b681d/nova/compute/manager.py#L4287

So we definitely have some weirdness around this because the control plane services don't know about the compute configuration but the API is allowing resize (and cold migrate) to the same host based on configuration there.
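To make the failure mode concrete, here is a minimal sketch of the kind of compute-side guard described above. It is not nova's code: the function name, the exception class, and the `driver_supports_same_host` flag are hypothetical stand-ins, and only the fault message format is taken from the bug description.

```python
class UnableToMigrateToSelf(Exception):
    """Hypothetical stand-in for the error surfaced in the instance fault."""


def check_cold_migrate_destination(instance_uuid, source_host, dest_host,
                                   driver_supports_same_host):
    """Reject a cold migration whose destination is the source host.

    driver_supports_same_host is an assumed flag: True for a driver such
    as vcenter, False for libvirt.
    """
    if dest_host == source_host and not driver_supports_same_host:
        # The scheduler had no way to know this destination was invalid.
        raise UnableToMigrateToSelf(
            "Unable to migrate instance (%s) to current host (%s)."
            % (instance_uuid, source_host))
    return dest_host
```

The point of the sketch is that the check only runs after scheduling, so a scheduler that is allowed to pick the source host can hand the compute service a destination it is guaranteed to reject.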

Options:

1. Always ignore the source host during cold migrate, similar to how live migrate and evacuate work. That would break the vcenter case, though, where cold migrating to the same compute service host is valid because that host is just managing a vcenter cluster of ESXi hosts.

2. Somehow communicate to the scheduler that we're doing a cold migration and we can or can't pick the source host. Now that we report driver capabilities as traits to placement, we could potentially rely on that to pass something along to the scheduler and placement about this case. That's probably the more flexible way to fix this.
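The two options can be compared with a small sketch of destination filtering. This is only an illustration under assumptions: the trait name `HYPOTHETICAL_SAME_HOST_COLD_MIGRATE`, the function name, and the host-to-traits mapping are invented for this example and are not nova or placement APIs.

```python
# Hypothetical trait name; a real fix would use whatever trait is agreed on.
SAME_HOST_TRAIT = "HYPOTHETICAL_SAME_HOST_COLD_MIGRATE"


def cold_migrate_candidates(source_host, host_traits):
    """Return the hosts that may be chosen as a cold migration destination.

    host_traits maps a host name to the set of capability traits it reports.
    The source host is kept only if it advertises SAME_HOST_TRAIT (option 2).
    With no host reporting the trait, this reduces to option 1.
    """
    return [
        host for host, traits in host_traits.items()
        if host != source_host or SAME_HOST_TRAIT in traits
    ]
```

With two libvirt-style hosts that report no such trait, the source host is dropped and only the other node remains. A vcenter-style host that reports the trait stays eligible as its own destination, which is the flexibility option 2 is after.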

Changed in nova:
status:	New → Triaged
importance:	Undecided → Medium

Matt Riedemann (mriedem) wrote on 2019-03-08:

The change I'm referencing in comment 1 item 2 is this: https://review.openstack.org/#/c/538498/

Matt Riedemann (mriedem) wrote on 2019-03-08:

Note that this is also extremely latent behavior, not a regression as far as I know.

Matt Riedemann (mriedem) wrote on 2019-06-20:

Here is an os-traits patch to add a compute capability trait for a potential fix mentioned in option 2 in comment 1:

https://review.opendev.org/#/c/666604/
