Nova live migration fails because of duplicate records in Neutron's ml2_port_bindings table

Bug #1900843 reported by changzhi
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Environment: Ussuri, with 1 controller node and 2 compute nodes (comp and comp2).

Steps to reproduce:
1. Create a VM on comp.
2. Live-migrate the VM from comp to comp2.
3. The live migration succeeds.
4. Live-migrate the VM from comp2 back to comp.
5. The live migration fails with the error "No valid host was found."

Detailed error message:
2020-10-21 03:37:59.622 984895 ERROR nova.network.neutron [req-87a70f81-bdda-463d-8669-6a89fb33596e 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 c89a4f7c36c248678addd9e07518cce3 - default default] [instance: 2aa8ff96-e04f-4897-af28-de4fb115ed65] Binding failed for port 71c3a3c7-d853-4b26-9975-50b99c449371 and host comp. Error: (409 {"NeutronError": {"type": "PortBindingAlreadyExists", "message": "Binding for port 71c3a3c7-d853-4b26-9975-50b99c449371 on host comp already exists.", "detail": ""}})

I found duplicate records for this port in Neutron's "ml2_port_bindings" table:

MariaDB [neutron]> select * from ml2_port_bindings where port_id="71c3a3c7-d853-4b26-9975-50b99c449371";
+--------------------------------------+-------+----------+-----------+---------------------------+---------------------------------------------------------------------------------------------------------------------------+----------+
| port_id                              | host  | vif_type | vnic_type | profile                   | vif_details                                                                                                               | status   |
+--------------------------------------+-------+----------+-----------+---------------------------+---------------------------------------------------------------------------------------------------------------------------+----------+
| 71c3a3c7-d853-4b26-9975-50b99c449371 | comp  | unbound  | normal    | {"migrating_to": "comp2"} |                                                                                                                           | INACTIVE |
| 71c3a3c7-d853-4b26-9975-50b99c449371 | comp2 | ovs      | normal    | {}                        | {"connectivity": "l2", "port_filter": true, "ovs_hybrid_plug": false, "datapath_type": "system", "bridge_name": "br-int"} | ACTIVE   |
+--------------------------------------+-------+----------+-----------+---------------------------+---------------------------------------------------------------------------------------------------------------------------+----------+
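A minimal way to spot this state is to look for INACTIVE bindings of ports that already have an ACTIVE binding on another host. The sketch below runs that check against an in-memory SQLite table loaded with the two rows above. SQLite is only a stand-in for MariaDB, just a subset of the columns is modeled, and the (port_id, host) primary key and the function name find_stale_bindings are assumptions for illustration. The SELECT uses plain SQL that should carry over to MariaDB, but that is untested here. An INACTIVE binding is expected while a migration is in flight, so a hit is only suspicious when no migration is running for that instance.

```python
import sqlite3

PORT_ID = "71c3a3c7-d853-4b26-9975-50b99c449371"


def find_stale_bindings(conn):
    """Return (port_id, host, status) for INACTIVE bindings of ports that
    already have an ACTIVE binding on a different host."""
    return conn.execute(
        "SELECT b.port_id, b.host, b.status FROM ml2_port_bindings AS b "
        "WHERE b.status = 'INACTIVE' AND EXISTS ("
        "  SELECT 1 FROM ml2_port_bindings AS a"
        "  WHERE a.port_id = b.port_id AND a.status = 'ACTIVE'"
        "  AND a.host <> b.host)"
        " ORDER BY b.port_id, b.host"
    ).fetchall()


# Simplified stand-in for the MariaDB table: only some columns are modeled,
# and the (port_id, host) primary key is an assumption of this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ml2_port_bindings ("
    " port_id TEXT NOT NULL, host TEXT NOT NULL, vif_type TEXT,"
    " profile TEXT, status TEXT, PRIMARY KEY (port_id, host))"
)
# The two rows from the query output in the report.
conn.executemany(
    "INSERT INTO ml2_port_bindings VALUES (?, ?, ?, ?, ?)",
    [
        (PORT_ID, "comp", "unbound", '{"migrating_to": "comp2"}', "INACTIVE"),
        (PORT_ID, "comp2", "ovs", "{}", "ACTIVE"),
    ],
)

stale = find_stale_bindings(conn)
print(stale)  # [('71c3a3c7-d853-4b26-9975-50b99c449371', 'comp', 'INACTIVE')]
```

The check only reports rows; it deliberately does not delete anything.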

After removing the "INACTIVE" record, step 4 (live-migrating the VM from comp2 back to comp) succeeded.

I found a similar bug at https://bugs.launchpad.net/nova/+bug/1822884, but it seems to be different from this one.

changzhi (changzhi1990)
description: updated
tags: added: live-migration neutron
sean mooney (sean-k-mooney) wrote :

As part of step 3, when the post-live-migration task runs, it activates the destination port binding, which deactivates all other port bindings (e.g. the one for the source compute node). It then later deletes the source port binding.
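The cleanup sequence described above can be illustrated with a toy, in-memory model. It does not talk to Neutron or Nova; Port, live_migrate, and the cleanup_source flag are hypothetical names, and creating the destination binding at the start of each migration is an assumption of the model. With cleanup, the VM can be migrated back and forth; if the delete step is skipped, the stale source binding makes the migration back fail with the same kind of "already exists" conflict as the 409 in the report.

```python
class PortBindingAlreadyExists(Exception):
    """Toy counterpart of the 409 PortBindingAlreadyExists error in the log."""


class Port:
    """Hypothetical in-memory model of one port's bindings (host -> status)."""

    def __init__(self, host):
        # A freshly booted VM has a single ACTIVE binding on its host.
        self.bindings = {host: "ACTIVE"}

    def create_binding(self, host):
        # A second binding for the same host is rejected.
        if host in self.bindings:
            raise PortBindingAlreadyExists(f"Binding on host {host} already exists.")
        self.bindings[host] = "INACTIVE"

    def activate_binding(self, host):
        # Activating one binding deactivates every other binding of the port.
        for h in self.bindings:
            self.bindings[h] = "ACTIVE" if h == host else "INACTIVE"

    def delete_binding(self, host):
        del self.bindings[host]


def live_migrate(port, source, dest, cleanup_source=True):
    port.create_binding(dest)        # assumed: dest binding created up front
    port.activate_binding(dest)      # post-live-migration: dest becomes ACTIVE
    if cleanup_source:
        port.delete_binding(source)  # ...and the source binding is deleted


healthy = Port("comp")
live_migrate(healthy, "comp", "comp2")
live_migrate(healthy, "comp2", "comp")  # the round trip works
print(healthy.bindings)  # {'comp': 'ACTIVE'}

broken = Port("comp")
live_migrate(broken, "comp", "comp2", cleanup_source=False)
print(broken.bindings)  # {'comp': 'INACTIVE', 'comp2': 'ACTIVE'}
try:
    live_migrate(broken, "comp2", "comp")
except PortBindingAlreadyExists as exc:
    print("migration back fails:", exc)
```

The "broken" state matches the two rows in the report: an INACTIVE leftover on comp next to the ACTIVE binding on comp2.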

From what you describe, it looks like the source node binding is not being deleted for some reason.
Can you check the nova-compute logs on both the source and destination hosts for any exceptions related to this and provide them here?

We have a tempest test in the migration job that migrates a VM back and forth between the source and destination hosts, so we validate that this works in the CI on every proposed patch. This therefore seems more likely to be an issue in your deployment than a fundamental issue in the code, but without logs it's hard to tell.

I am setting this to Incomplete for now, but if you can provide logs of the first live migration so we can confirm whether it correctly cleaned up the source port binding or failed to do so, we can take a closer look.

Changed in nova:
status: New → Incomplete
changzhi (changzhi1990) wrote :

Sorry, this bug report was a mistake.

changzhi (changzhi1990) wrote :

Please close this bug. Thanks.

changzhi (changzhi1990)
tags: added: invalid
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired