L2pop add_fdb_entries concurrency issue

Bug #1611308 reported by Hong Hui Xiao
This bug affects 1 person
Affects: neutron
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

This was observed during live migration in a large-scale environment that uses ovs-agent with l2pop.

The observed issue is:
If multiple VMs live-migrate at the same time, some hosts end up with stale unicast information in table 20 that still points a VM to its old host.

After checking the code, there appears to be a potential issue with [1] when it is called concurrently.

Assume there are 3 hosts: A, B, and C. VMs are being migrated from A to B and C. The VMs are in the same neutron network, and host B doesn't have any port of that neutron network before the migration.

The scenario might be:
1) VM1 migrates from host A to host B.
2) When the port of VM1 comes up on host B, the neutron server is informed, and all the fdb_entries of that neutron network are generated and sent to host B. The code at [2] is hit. Let's assume the neutron network has lots of ports in it, so the call at [2] is expected to take a long time.
3) In the middle of 2), another VM, VM2, migrates from host A to host C.
4) Let's assume host C already has ports in the neutron network of VM2. So the code does not hit [2] and just goes to [3]. [3] is a lightweight fanout RPC request. The ovs-agent on host B might get this request while still processing 2).
5) 4) finishes, but 2) is still ongoing.

At this point, host B has the new unicast information for VM2. However, the data sent in 2) contains stale information that still places VM2 on host A.

6) When 2) finishes, the stale information for VM2 might overwrite the new information, which leads to the reported issue.
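
The interleaving in steps 1) to 6) can be replayed deterministically with a small sketch. This is not neutron code: the forwarding table is modeled as a plain dict from a VM MAC to the host that owns it, and the names used here (the local add_fdb_entries function, server_view, full_sync_for_b, fanout_update, host_b_table) are hypothetical stand-ins chosen for illustration. The real agent handles per-network fdb entries and programs OVS flows, which this sketch leaves out.

```python
# Simplified model (assumption): a forwarding table maps a VM MAC to the
# host (tunnel endpoint) that currently owns it. Not the real l2pop format.


def add_fdb_entries(table, entries):
    """Apply entries unconditionally, overwriting whatever is present."""
    table.update(entries)


# Server-side view when VM1's port comes up on host B: VM2 is still on A.
server_view = {"mac-vm1": "host-B", "mac-vm2": "host-A"}

# Step 2: the server builds the full snapshot for host B. Delivery and
# processing of this large message are slow.
full_sync_for_b = dict(server_view)

# Steps 3-4: VM2 moves to host C; a small fanout update goes out at once.
server_view["mac-vm2"] = "host-C"
fanout_update = {"mac-vm2": "host-C"}

host_b_table = {}

# Step 5: host B finishes the lightweight fanout update first.
add_fdb_entries(host_b_table, fanout_update)
table_after_fanout = dict(host_b_table)

# Step 6: the slow full sync completes and overwrites the fresh entry.
add_fdb_entries(host_b_table, full_sync_for_b)

print("after fanout:   ", table_after_fanout)
print("after full sync:", host_b_table)
print("server view:    ", server_view)
```

In this model host B ends up pointing VM2 at host A while the server already places it on host C, which matches the stale table 20 entry described above. The sketch only shows that an unconditional overwrite is order-sensitive; it does not suggest how the actual fix should look.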

[1] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/rpc_manager/l2population_rpc.py#L38
[2] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L240
[3] https://github.com/openstack/neutron/blob/fd401fe0a052a7103cb19d7385a1c702de05577f/neutron/plugins/ml2/drivers/l2pop/mech_driver.py#L247

Tags: l2-pop ovs
Hong Hui Xiao (xiaohhui)
Changed in neutron:
assignee: nobody → Hong Hui Xiao (xiaohhui)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/352827

Changed in neutron:
status: New → In Progress
Nate Johnston (nate-johnston) wrote :

In order to ascertain the proper severity for this, can you tell me if you have seen this in the wild, and if so how frequently?

tags: added: l2-pop ovs
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/353550

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Hong Hui Xiao (<email address hidden>) on branch: master
Review: https://review.openstack.org/352827
Reason: https://review.openstack.org/#/c/353550/ is a better alternative.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/353550
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: In Progress → Incomplete
importance: Undecided → High
assignee: Hong Hui Xiao (xiaohhui) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired