OpenStack Compute (nova)

Migration does not take account of Neutron routed pods

Bug #1967314 reported by Andrew Bonney on 2022-03-31

This bug affects 1 person

Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned

Bug Description

Description
===========
Nova does not appear to take account of Neutron-defined host aggregates when migrating instances which have interfaces attached to routed provider networks. If the operator does not manually select an appropriate hypervisor to migrate to (within the same pod), the automated selection is likely to choose a host which does not have connectivity to the required VLAN.

I'm not certain to what extent this issue is on the Nova side versus Neutron, but a solution to avoid accidental migration to inappropriate hosts in the simplest case would be appreciated.

Steps to reproduce
==================
* Deploy a system which makes use of Neutron routed provider networks, with a single logical network using a separate VLAN/segment per rack (https://docs.openstack.org/neutron/latest/admin/config-routed-networks.html)
* Launch a VM and attach a port to the instance which lives within the above network. It will be allocated an appropriate IP address based upon the segment it has landed in.
* Perform a live migration without specifying a destination host. If a hypervisor outside the current segment is selected, the instance loses connectivity to the network.

Expected result
===============
Either:
* The live migration proceeds, but only to a host within the same rack/segment.
* The live migration fails because the only available hosts are outside the current rack/segment.
* The IP address associated with the attached port is changed to match the rack/segment which the instance has been migrated to.
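
The first two outcomes above can be illustrated with a toy model. This is only a sketch, not Nova or Neutron code: the host names, segment names and the `candidate_destinations` helper are hypothetical, and the real scheduler works from Placement aggregates rather than a plain dictionary.

```python
from typing import Dict, List, Optional


def candidate_destinations(
    source_host: str,
    host_segment: Dict[str, str],
    explicit_host: Optional[str] = None,
) -> List[str]:
    """Return hosts the instance could move to without leaving its segment."""
    segment = host_segment[source_host]
    same_segment = sorted(
        host
        for host, seg in host_segment.items()
        if seg == segment and host != source_host
    )
    if explicit_host is not None:
        # An explicitly requested host is only valid if it shares the segment.
        return [explicit_host] if explicit_host in same_segment else []
    return same_segment


# Hypothetical two-rack layout: one segment (VLAN) per rack.
HOSTS = {
    "rack1-hv1": "segment-rack1",
    "rack1-hv2": "segment-rack1",
    "rack2-hv1": "segment-rack2",
}
```

An empty result corresponds to the "migration fails" outcome: for example, an instance on `rack2-hv1` has no valid destination in this layout.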

Actual result
=============
* Where a migration results in the instance moving to a host outside the original rack/segment, connectivity is lost.

Environment
===========
OpenStack Xena release
Libvirt+KVM
Nova f766db261634c8f95f874ba132159f148de9e8bf
Neutron with Linux bridge networking e6953e217c731559724971a371e81d2b6f9837e0

Thanks!

Balazs Gibizer (balazs-gibizer) wrote on 2022-04-22:

In Wallaby we added support for scheduling with routed networks [1], but it has to be enabled through a config option [2]. Do you have the following config enabled in your nova-scheduler services?

[scheduler]
query_placement_for_routed_network_aggregates = True

I set this to Incomplete. Please set it back to New if the issue is still present when the above config is set.
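
A quick way to confirm the setting on a scheduler node is to parse the config text. The snippet below is a minimal sketch using Python's standard `configparser` rather than oslo.config, so it does not handle config directories or overrides; the helper name is hypothetical, and the fallback assumes the documented default of False.

```python
import configparser


def routed_network_scheduling_enabled(conf_text: str) -> bool:
    """Report whether the routed-network scheduling option is set to true."""
    # interpolation=None: treat '%' literally; strict=False: tolerate
    # repeated sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getboolean(
        "scheduler",
        "query_placement_for_routed_network_aggregates",
        fallback=False,  # assumed default when the option is absent
    )
```

Pass it the contents of the nova.conf used by nova-scheduler, for example `routed_network_scheduling_enabled(open(path).read())`, where `path` depends on the deployment.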

[1] https://docs.openstack.org/releasenotes/nova/wallaby.html#relnotes-23-0-0-stable-wallaby-new-features
[2] https://docs.openstack.org/nova/latest/configuration/config.html#scheduler.query_placement_for_routed_network_aggregates

tags:	added: scheduler
tags:	added: config
Changed in nova:
status:	New → Incomplete

Andrew Bonney (andrewbonney) wrote on 2022-04-25:

Thanks for the pointer. Setting this does indeed prevent automated scheduling to the wrong hosts during live migration.

I note that if a live migration destination host is chosen explicitly it is still possible to migrate to an invalid host. Am I correct in assuming this is by design? Please feel free to mark this bug as complete/invalid if that's the case.

Changed in nova:
status:	Incomplete → New

Balazs Gibizer (balazs-gibizer) wrote on 2022-04-25 (last edit on 2022-04-25):

It depends on the exact API call that is used to trigger the live migration. See [1] for the full API ref. But in short:

a) if ``host`` is provided in the API call with API microversion < 2.30, scheduling is skipped, so the VM can be moved to an invalid host

b) if ``host`` is provided in the API call with API microversion >= 2.30 but < 2.68 and the ``force`` flag is set to True in the API request, scheduling is skipped, so the VM can be moved to an invalid host

c) with API microversion >= 2.30 but < 2.68 and ``force`` set to False, the scheduler should prevent the move to an invalid host

d) with API microversion >= 2.68 the ``force`` flag has been removed, so the scheduler should always prevent the move to an invalid host.
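
The four cases can be written down as a small decision function. This is an illustrative sketch, not Nova code: `scheduler_is_bypassed` and `FORCE_REMOVED` are hypothetical names. The compute API reference lists ``force`` as available from 2.30 until 2.67, so the sketch uses 2.68 as the cut-off; adjust the constant if your reference differs.

```python
from typing import Optional, Tuple

FORCE_ADDED = (2, 30)    # first microversion that accepts ``force``
FORCE_REMOVED = (2, 68)  # assumed first microversion without ``force``


def scheduler_is_bypassed(
    microversion: Tuple[int, int],
    host: Optional[str],
    force: bool = False,
) -> bool:
    """Return True if the request would skip scheduler validation."""
    if host is None:
        # No destination requested: the scheduler picks the host.
        return False
    if microversion < FORCE_ADDED:
        # Case a): an explicit host is used as-is.
        return True
    if microversion < FORCE_REMOVED:
        # Cases b) and c): bypass only when ``force`` is True.
        return force
    # Case d): ``force`` no longer exists; the sketch simply ignores it.
    return False
```

For example, `scheduler_is_bypassed((2, 30), "hv1", force=True)` is True, while the same call with `force=False` is False.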

I set this bug to Invalid as I assume that it was only a configuration issue. But feel free to set it back to New if you see scheduling problems in case of c) or d).

[1] https://docs.openstack.org/api-ref/compute/?expanded=live-migrate-server-os-migratelive-action-detail#live-migrate-server-os-migratelive-action