Error live migration vm with disabled port

Bug #1951623 reported by Alexander Shishebarov
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | In Progress | Low | Unassigned |

Bug Description

We use neutron (stable/stein) with the ml2/ovs plugin and nova (stable/stein).
An error occurs during server live migration when at least one port is administratively disabled (admin_state_up DOWN).
live_migration_wait_for_vif_plug is enabled by default.
Here is the port configuration of the VM:

+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------+--------+
| e09dca39-f62f-4a3b-a0f6-4d98edcd037e | | fa:16:3e:83:73:13 | ip_address='192.168.0.3', subnet_id='0bb936ed-c4a4-4a5d-be18-2794a73aea79' | DOWN |
| e331b3d3-cf59-49a0-b531-590433523f6f | | fa:16:3e:53:b4:3c | ip_address='10.10.10.1', subnet_id='a6786536-b67b-40a4-9470-e3b158a71dbc' | ACTIVE |
+--------------------------------------+------+-------------------+-----------------------------------------------------------------------------+--------+

When we try to migrate that VM:
openstack server migrate b4743fab-17e0-48af-8ad3-3b81fd05a968 --live cmp1
An error occurs in the pre live migration process.
The error occurs on the server from which the migration is performed.
"
2021-11-18 17:34:28,910.910 2173136 WARNING nova.compute.manager [-] [instance: b4743fab-17e0-48af-8ad3-3b81fd05a968] Timed out waiting for events: [('network-vif-plugged', u'e09dca39-f62f-4a3b-a0f6-4d98edcd037e'), ('network-vif-plugged', u'e331b3d3-cf59-49a0-b531-590433523f6f')]. If these timeouts are a persistent issue it could mean the networking backend on host cmp1 does not support sending these events unless there are port binding host changes which does not happen at this point in the live migration process. You may need to disable the live_migration_wait_for_vif_plug option on host cmp1.: Timeout: 300 seconds
"

This happens because nova-compute waits for the network-vif-plugged event for each port of the migrating VM, regardless of the port's initial state (up or down).
https://github.com/openstack/nova/blob/stable/stein/nova/compute/manager.py#L6767
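The waiting behaviour described above can be modelled roughly as follows. This is a simplified sketch, not nova's actual code: the function name `events_to_wait_for` and the dict-based port records are hypothetical, and only the event name and port IDs are taken from the log above.

```python
# Simplified model (assumption): one expected event is built per port,
# without looking at the port's admin state or status.

def events_to_wait_for(ports):
    """Return one ('network-vif-plugged', port_id) tuple per port."""
    return [("network-vif-plugged", port["id"]) for port in ports]


# Port IDs and statuses as listed in the table above.
ports = [
    {"id": "e09dca39-f62f-4a3b-a0f6-4d98edcd037e", "status": "DOWN"},
    {"id": "e331b3d3-cf59-49a0-b531-590433523f6f", "status": "ACTIVE"},
]

# Both ports end up in the wait list, including the disabled one.
expected = events_to_wait_for(ports)
```

In this model the disabled port is in the wait list, so the wait can only complete if an event arrives for it as well.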

But the neutron server does not send the RPC message with "network-vif-plugged" if the port of the migrating VM is disabled.
https://github.com/openstack/neutron/blob/stable/stein/neutron/notifiers/nova.py#L207
"
2021-11-18 16:09:03,642.642 2916237 DEBUG neutron.notifiers.nova [req-98b397eb-db00-4370-9510-b071c501a12e b6ba9c75146a49829a7427a3e8cc3c10 192796e61c174f718d6147b129f3f2ff - default default] Ignoring state change previous_port_status: DOWN current_port_status: DOWN port_id e09dca39-f62f-4a3b-a0f6-4d98edcd037e record_port_status_changed /usr/lib/python2.7/dist-packages/neutron/notifiers/nova.py
"
As a result, the migration process is stopped.
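The notifier behaviour seen in the debug log can be sketched like this. It is a simplified model under stated assumptions, not the neutron implementation: the function name `event_for_status_change` is hypothetical, and only the DOWN to DOWN case is confirmed by the log above. The other transitions are assumptions added to make the example complete.

```python
# Simplified model (assumption) of the notifier decision; the real
# neutron code handles more states and conditions than shown here.

def event_for_status_change(previous, current):
    """Return the event name to send to nova, or None to ignore the change."""
    if previous == "ACTIVE" and current == "DOWN":
        return "network-vif-unplugged"
    if previous in ("DOWN", "BUILD") and current == "ACTIVE":
        return "network-vif-plugged"
    # Any other transition is ignored, e.g. DOWN -> DOWN as in the log above.
    return None


# The disabled port stays DOWN, so no event is produced for it.
disabled_port_event = event_for_status_change("DOWN", "DOWN")
# A port that comes up produces the event nova is waiting for.
active_port_event = event_for_status_change("DOWN", "ACTIVE")
```

Under this model the disabled port never produces "network-vif-plugged", which matches the timeout on the nova side.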

Tags: neutron
description: updated
Michal Arbet (michalarbet) wrote :

I also filed a bug in the nova LP related to vif-plugged here -> https://bugs.launchpad.net/nova/+bug/1951720

Do you think it could be connected to the behaviour reported in this bug report?

Sylvain Bauza (sylvain-bauza) wrote :

No, re: #1, I think bug 1951720 isn't related to this one.

The problem is that Nova doesn't verify whether the port is disabled before waiting for the event.
We already verify the port for other methods, but not this one, so it just looks like a confirmed bug report.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
tags: added: neutron
sean mooney (sean-k-mooney) wrote :

nova stores the port status in the active field of the vif object
https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L3182-L3185

and during the first boot/spawn code path we check this
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7221-L7228

however the live migration events do not
https://github.com/openstack/nova/blob/master/nova/network/model.py#L488-L508
https://github.com/openstack/nova/blob/master/nova/network/model.py#L567-L586

so we just need to check if each vif is active and skip it if not
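A minimal sketch of that idea, assuming each cached vif is a dict with an `active` flag and that the flag is False for the disabled port. The function name `plug_events_for_active_vifs` is hypothetical and this is not the proposed patch.

```python
# Sketch (assumption): only wait for plug events from vifs marked active.

def plug_events_for_active_vifs(vifs):
    """Return ('network-vif-plugged', id) tuples for active vifs only."""
    return [
        ("network-vif-plugged", vif["id"])
        for vif in vifs
        if vif.get("active")
    ]


# Assumed cache contents: the disabled port is recorded as inactive.
vifs = [
    {"id": "e09dca39-f62f-4a3b-a0f6-4d98edcd037e", "active": False},
    {"id": "e331b3d3-cf59-49a0-b531-590433523f6f", "active": True},
]

# Only the active port remains in the wait list.
events = plug_events_for_active_vifs(vifs)
```

The filter only helps if the `active` flag really reflects the disabled state of the port.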

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/819494

Changed in nova:
status: Confirmed → In Progress
Alexander Shishebarov (ashishebarov) wrote :

RE #3.
I added a check that the port is in the active state, and it did not work.
Nova stores the state of the port in the cache as "active" even if the port is disabled in neutron.
https://github.com/openstack/nova/blob/master/nova/network/neutron.py#L3183-L3185
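To illustrate why such a check has no effect, here is a small sketch. It assumes the cached flag is derived so that an administratively disabled port also counts as active. The function name `cached_vif_active` and the dict layout are hypothetical, and the exact condition should be checked against the linked lines.

```python
# Sketch (assumption): the cached "active" flag is True when the port is
# administratively disabled OR when its status is ACTIVE.

def cached_vif_active(port):
    """Return the 'active' value that would be stored in the cache."""
    return port["admin_state_up"] is False or port["status"] == "ACTIVE"


disabled_port = {"admin_state_up": False, "status": "DOWN"}
enabled_port = {"admin_state_up": True, "status": "ACTIVE"}

# Both ports are cached as active, so filtering on the flag skips nothing.
flags = [cached_vif_active(disabled_port), cached_vif_active(enabled_port)]
```

With a flag derived this way, a check on the cached state cannot tell a disabled port apart from a working one, so the port's admin state would have to be consulted separately.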

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/827314
