migration_interface breaks cold migrations
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla-ansible | Fix Released | Undecided | Gaël THEROND |
Bug Description
**Bug Report**
What happened:
Configuring a migration_interface switches nova_ssh to listen on the new interface, but cold migrations (resize, etc.) still use the API interface to move xml, so they fail.
Live migrations use the right interface and still work as intended, so this issue was hard to notice at first and made it into our production.
What you expected to happen:
As nova seems to have no option for this, nova_ssh should listen on both interfaces when an alternative migration_interface is configured.
This is a simplified version of the fix I saw when checking TripleO for the same issue (it happened there too).
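To illustrate the expected behaviour, here is a minimal sketch that assumes nova_ssh is driven by an sshd_config file with one `ListenAddress` line per address. The helper name, the addresses and the port are illustrative and not taken from the kolla-ansible templates.

```python
def listen_addresses(api_address, migration_address, port):
    """Return sshd_config ListenAddress lines for nova_ssh.

    Hypothetical helper: always listens on the API address and adds the
    migration address only when it is set and different.
    """
    addresses = [api_address]
    if migration_address and migration_address != api_address:
        addresses.append(migration_address)
    # IPv4 form "address:port"; IPv6 addresses would need brackets.
    return [f"ListenAddress {addr}:{port}" for addr in addresses]


# Illustrative values only (assumed addresses and port).
for line in listen_addresses("10.0.0.11", "10.0.1.11", 8022):
    print(line)
```

With both lines present, cold migrations arriving on the API address and live migrations arriving on the migration address would both reach sshd.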
How to reproduce it (minimal and precise):
configure migration_interface to a different network
openstack server resize
nova will log a connection refused on <api_interface_
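One way to confirm the symptom from another compute host is to probe the nova_ssh port on both addresses of the destination. This is only a sketch: the addresses and the port in the commented usage are assumptions and must be replaced with the values of your deployment.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


# Usage (assumed values, adjust to your environment):
# port_is_open("10.0.0.11", 8022)  # address on the api_interface
# port_is_open("10.0.1.11", 8022)  # address on the migration_interface
```

If the bug is present, the probe on the API address should fail while the one on the migration address succeeds.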
**Environment**:
* Kolla-Ansible version: stable/ussuri (looks to affect all versions)
Changed in kolla-ansible:
assignee: nobody → Gaël THEROND (fl1nt)
status: New → Confirmed
Changed in kolla-ansible:
status: In Progress → Fix Released
Hi Alexander, just to give you more insight on this one.
This is due to nova-compute (manager.py) calling this function on cold migration: https://opendev.org/openstack/nova/src/commit/f5f7c2540150c7ee7640c834d5caec31b3f5a7ab/nova/utils.py#L109
Because you probably don't get any DNS resolution within your underlying infrastructure, nova is actually using the /etc/hosts file to resolve your host node name.
This in turn wrongly points it at your internal API subnet, because the /etc/hosts file is populated using the api_interface value here: https://opendev.org/openstack/kolla-ansible/src/commit/e744b9d510ba183d5a80b3e467d0e764eb5c9e02/ansible/roles/baremetal/tasks/pre-install.yml#L25
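The resolution path described above can be sketched as follows. This is not nova's code, just a small /etc/hosts-style lookup; the host names and addresses are made up for illustration.

```python
def resolve_from_hosts(hosts_text, hostname):
    """Return the first address mapped to hostname in /etc/hosts-style text."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            return address
    return None


# Illustrative content: entries carry the api_interface addresses (assumed values).
HOSTS = """\
127.0.0.1 localhost
# generated from api_interface
10.0.0.11 compute01
10.0.0.12 compute02
"""

print(resolve_from_hosts(HOSTS, "compute02"))  # prints 10.0.0.12
```

Since the only entry for the destination host is its API address, anything that connects by host name ends up on the API network, regardless of migration_interface.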
I'm having the same issue. However, before starting to fix it, I'll need to run some tests and have a few discussions with the team to validate that nova is the only service relying on these IPs.
For instance, changing the task to use the migration_interface variable instead of the current api_interface MIGHT have an impact on RabbitMQ.