with l2pop sometimes agents fail to create flood flows with multiple workers
Bug #1555600 reported by venkata anil
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
neutron | Expired | Medium | Unassigned | |
Bug Description
When multiple API and RPC workers are enabled for neutron-server, OVS agents sometimes fail to create flood flows to other OVS agents. This is frequently reproducible with multiple API and RPC workers during migrations, and also when evacuating instances from one node to another. Sometimes tunnel ports are not created either.
In these scenarios, the l2pop driver does not notify the agent to create tunnel ports and flood flows, so the agent cannot create flood flows to other agents.
tags: | added: l2-pop |
Changed in neutron: | |
importance: | Undecided → Medium |
tags: | added: kilo-backport-potential liberty-backport-potential |
Currently the l2pop driver in the server performs two tasks:
1) Notify other remote agents about the port info. When remote agents receive this notification, they add a unicast address flow.
2) Identify when the first/last port (on a network) is created on an agent and
a) notify other remote agents to add a flood flow to this agent.
b) notify the current host about all ports (in that network), so that the current host can create tunnels and flows to remote agents.
The agent receives these notifications from the server and creates the flows and ports accordingly.
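The two tasks above can be sketched as a small decision function. This is only a simplified model and not the actual neutron l2pop code: `Notification`, `plan_notifications`, and the `kind` strings are hypothetical names chosen for illustration, and the port count is assumed to come from a DB query made after the port was written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    target: str  # host that receives the message
    kind: str    # "unicast", "flood" or "full_sync" (hypothetical labels)
    about: str   # port, host or network the message describes


def plan_notifications(new_port, host, ports_on_host, other_hosts):
    """Model of the server-side decision for a port created on `host`.

    `ports_on_host` is the number of ports `host` has on the network,
    counted after `new_port` was stored in the DB.
    """
    # Task 1: every remote agent learns the unicast entry for the new port.
    notes = [Notification(h, "unicast", new_port) for h in other_hosts]
    if ports_on_host == 1:
        # Task 2a: remote agents add a flood flow towards this host.
        notes += [Notification(h, "flood", host) for h in other_hosts]
        # Task 2b: this host learns about all ports in the network.
        notes.append(Notification(host, "full_sync", "network"))
    return notes


first = plan_notifications("port1", "compute2", 1, ["compute1"])
second = plan_notifications("port2", "compute2", 2, ["compute1"])
```

In this model the flood and full-sync messages depend entirely on the count being exactly 1 at the moment the driver looks at the DB.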
When multiple API and RPC workers are enabled for neutron-server, the current implementation has a problem.
neutron-server can't perform the second task (i.e. identifying when the first/last port on a network is created on an agent) properly.
For example, consider the scenario below.
Setup: we have a server (with multiple API and RPC workers) and the compute1 and compute2 nodes.
Two ports are created on the compute2 node, in the following sequence:
The server's worker1 creates port1 (the first port) in the DB, and before this worker's l2pop driver code is executed, worker2 creates port2 (the second port) in the DB. In this scenario, when worker1's l2pop driver checks the DB for compute2's ports, it finds 2 ports and therefore skips notifying compute1 to create a FLOOD_FLOW to compute2. Because of this, compute1 will never have a FLOOD_FLOW to compute2. Similarly, the FLOOD_FLOW deletion notification is not sent either. In addition, compute2 won't get compute1's port info, so compute2 can't create flood flows to compute1.
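The interleaving described above can be reproduced deterministically with a toy model. This is a sketch under stated assumptions, not neutron code: the DB is a plain dict, `is_first_port_on_host` and `simulate` are hypothetical helpers, and each worker is reduced to "write the port, then check the count".

```python
def is_first_port_on_host(db, host):
    """Mimic the server-side check: exactly one port on this host."""
    return sum(1 for h in db.values() if h == host) == 1


def simulate(interleaved):
    """Return the ports whose creation triggered a flood notification."""
    db = {}            # port id -> host; stands in for the neutron DB
    flood_sent_for = []
    ports = ["port1", "port2"]
    if interleaved:
        # Both workers commit their port before either runs the check.
        for port in ports:
            db[port] = "compute2"
        for port in ports:
            if is_first_port_on_host(db, "compute2"):
                flood_sent_for.append(port)
    else:
        # Each worker commits and checks before the next one starts.
        for port in ports:
            db[port] = "compute2"
            if is_first_port_on_host(db, "compute2"):
                flood_sent_for.append(port)
    return flood_sent_for


serial_result = simulate(interleaved=False)
raced_result = simulate(interleaved=True)
```

In the serialized run the check for port1 sees a single port and the flood notification goes out. In the interleaved run both checks see two ports, so neither worker treats its port as the first one and no flood notification is sent at all.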
As this task (identifying the first/last port of an agent on a network) can't be done reliably in the server, this logic has to move to the L2 agent. L2 agents should be able to identify when the first/last port (on a network) on another agent is created/deleted, and create or delete flood flows accordingly (and remove tunnel ports as well).
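One way an agent could track this locally is sketched below. It is a hypothetical illustration of the proposal, not an existing neutron class: `RemotePortTracker` and the returned action strings are invented names, and it assumes the agent receives a per-port notification for every remote port, in any order.

```python
from collections import defaultdict


class RemotePortTracker:
    """Tracks remote ports per (network, remote host) on the agent side."""

    def __init__(self):
        self._ports = defaultdict(set)  # (network, host) -> set of port ids

    def port_added(self, network, remote_host, port_id):
        key = (network, remote_host)
        first = key not in self._ports  # membership test does not create the key
        self._ports[key].add(port_id)
        # First known port on that host: a flood flow is now needed.
        return "add_flood_flow" if first else None

    def port_removed(self, network, remote_host, port_id):
        key = (network, remote_host)
        ports = self._ports.get(key)
        if ports is None:
            return None  # nothing known about this host on this network
        ports.discard(port_id)
        if not ports:
            # Last known port is gone: the flood flow can be removed.
            del self._ports[key]
            return "remove_flood_flow"
        return None


tracker = RemotePortTracker()
actions = [
    tracker.port_added("net1", "compute2", "port2"),
    tracker.port_added("net1", "compute2", "port1"),
    tracker.port_removed("net1", "compute2", "port1"),
    tracker.port_removed("net1", "compute2", "port2"),
]
```

Because the decision is based on the agent's own view, it does not matter which server worker delivered which port first: whichever port notification arrives first triggers the flood flow, and the removal of the last known port tears it down.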