concurrent calls to _bind_port_if_needed result in port stuck in DOWN status
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Undecided | Kailun Qin |
Bug Description
There is a concurrency issue in _bind_port_if_needed [1]: if it is invoked concurrently for the same port by two different callers, the port can be left stuck in the DOWN status even though it was successfully bound.
For example, if get_device_details [4] runs concurrently with update_port, the RPC handler will call get_bound_port_context, which attempts to bind the port at the same time as the update_port path, so two threads end up running _bind_port_if_needed for the same port.
A port stuck in the DOWN status has negative effects on consumers of the L2Population functionality because the L2Population mechanism driver will not be triggered to publish that a port is UP on a given compute node.
The issue coincides with the occurrence of this log:
2018-03-14 11:16:00.429 19987 INFO neutron.
On the first iteration through _bind_port_if_needed the binding completes as expected; the problem arises when the concurrent update forces a retry, and it is that retry which leaves the port status set to DOWN.
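To make the failure mode concrete, here is a small, self-contained sketch. This is plain Python, not neutron code; FakePortDB, bind_port_if_needed and the ACTIVE/DOWN rule are all invented for illustration. Two workers bind the same port with an optimistic commit; the worker that loses the race retries against a view of the port that already looks bound, and its recomputed status ends up as DOWN even though the port is actually bound.

```python
# Toy illustration only -- none of these names exist in neutron.
import threading
import time


class FakePortDB:
    """Stand-in for the port table, with a revision counter for optimistic commits."""

    def __init__(self):
        self._lock = threading.Lock()
        self.revision = 0
        self.status = "DOWN"
        self.bound_host = None

    def snapshot(self):
        with self._lock:
            return {"revision": self.revision,
                    "status": self.status,
                    "bound_host": self.bound_host}

    def commit(self, expected_revision, status, bound_host):
        """Apply the update only if nobody else committed since the snapshot."""
        with self._lock:
            if self.revision != expected_revision:
                return False  # concurrent update -> caller must retry
            self.status = status
            self.bound_host = bound_host
            self.revision += 1
            return True


def bind_port_if_needed(db, host, max_tries=2):
    for attempt in range(1, max_tries + 1):
        snap = db.snapshot()
        time.sleep(0.05)  # simulated binding work; widens the race window
        # Naive rule: only a worker that still sees an unbound port marks it ACTIVE.
        new_status = "ACTIVE" if snap["bound_host"] is None else "DOWN"
        if db.commit(snap["revision"], new_status, host):
            print(f"{host}: attempt {attempt} committed status={new_status}")
            return
        print(f"{host}: attempt {attempt} lost the race, retrying")


if __name__ == "__main__":
    db = FakePortDB()
    workers = [threading.Thread(target=bind_port_if_needed, args=(db, h))
               for h in ("compute-1", "compute-2")]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # The port is bound, yet the last committed status is usually DOWN.
    print("final port status:", db.snapshot()["status"])
```

The sketch only mimics the shape of the race (optimistic commit plus a retry that recomputes status from stale-looking data); the real code paths are the ones referenced in [1]-[7] below.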
This was discovered by our product test group using a simple setup of 2 compute nodes and a single VM that was being live-migrated between the two nodes. The VM was configured with 3 ports. Over ~1000 live migrations this happened between 5 and 10 times, and each time it caused loss of communication to the VM instance, as the agents were not given the latest L2Population data because the port appeared DOWN in the database. Manual intervention was required to toggle the port's admin_state_up to recover.
This was observed in stable/pike, but looking at the code in master I don't see that it would behave any differently.
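For reference, the manual workaround mentioned above can be scripted. This is a minimal sketch using openstacksdk, assuming the recovery is to toggle the affected port's admin_state_up off and back on; the cloud name and port UUID are placeholders.

```python
# Sketch of the manual recovery step (toggle admin_state_up); placeholders only.
import openstack

conn = openstack.connect(cloud="mycloud")       # placeholder cloud name
port_id = "REPLACE-WITH-AFFECTED-PORT-UUID"     # placeholder port UUID

conn.network.update_port(port_id, is_admin_state_up=False)
conn.network.update_port(port_id, is_admin_state_up=True)

# Status should return to ACTIVE shortly after the agent re-wires the port.
print(conn.network.get_port(port_id).status)
```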
[1] plugins.
[2] plugins.
[3] plugins.
[4] plugins.
[5] plugins.
[6] plugins.
[7] plugins.
tags: | added: api l2-pop needs-attention |
Changed in neutron: | |
assignee: | nobody → Rajat Dhasmana (whoami-rajat) |
Changed in neutron: | |
assignee: | nobody → Kailun Qin (kailun.qin) |
status: | New → In Progress |
tags: | added: neutron-proactive-backport-potential |
tags: | added: neutron-easy-proactive-backport-potential |
@Rajat
Hi Rajat, may I know whether you are still working on this issue? If not, I'd like to take it over and make a fix for it. :) Let me know if you have any questions or concerns. Thanks!