with l2pop sometimes agents fail to create flood flows with multiple workers

Bug #1555600 reported by venkata anil
This bug affects 2 people
Affects  Status   Importance  Assigned to  Milestone
neutron  Expired  Medium      Unassigned

Bug Description

When multiple API and RPC workers are enabled for neutron-server, OVS agents sometimes fail to create flood flows to other OVS agents. This is frequently reproducible during migrations and during evacuations of instances from one node to another. Sometimes tunnel ports are not created either.

In these scenarios, the l2pop driver does not notify the agent to create tunnel ports and flood flows, so the agent cannot create flood flows to other agents.

Revision history for this message
venkata anil (anil-venkata) wrote :

Currently the l2pop driver in the server performs two tasks:
1) Notify other remote agents about port info. When remote agents get this notification, they add a unicast address flow.
2) Identify when the first/last port (on a network) is created on an agent and
     a) notify other remote agents to add a flood flow to this agent.
     b) notify the current host about all ports (in that network), so that the current host can create tunnels and flows to remote agents.
The agent receives these notifications from the server and creates the flows and ports.

When multiple API and RPC workers are enabled for neutron-server, the current implementation has a problem.
neutron-server can't reliably perform the second task (i.e. identifying when the first/last port on a network is created on an agent).
For example, consider the following scenario.
Setup: a server (with multiple API and RPC workers) and two compute nodes, compute1 and compute2.
Two ports are created on compute2 in the following sequence:
Server's worker1 creates port1 (the first port) in the DB and, before this worker's l2pop driver code is executed, worker2 creates port2 (the second port) in the DB. When worker1's l2pop driver then checks the DB for compute2's ports, it finds 2 ports and so skips notifying compute1 to create a FLOOD_FLOW to compute2. Because of this, compute1 will never have a FLOOD_FLOW to compute2. Similarly, the FLOOD_FLOW deletion notification is not sent either. In addition, compute2 won't get compute1's port info, hence compute2 can't create flood flows to compute1.
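The interleaving described above can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions: `ports_db`, `commit_port` and `is_first_port_on_host` are hypothetical names invented for illustration, not neutron code, and the "first port" check is reduced to counting rows for a host.

```python
# Hypothetical model of the race; none of these names exist in neutron.

def commit_port(ports_db, host, port_id):
    """Worker step 1: write the new port to the shared DB."""
    ports_db.append((host, port_id))


def is_first_port_on_host(ports_db, host):
    """Worker step 2: the 'first port' check, done by counting DB rows."""
    return sum(1 for h, _ in ports_db if h == host) == 1


def run_serialized():
    """worker1 finishes both steps before worker2 starts."""
    db = []
    commit_port(db, "compute2", "port1")
    w1 = is_first_port_on_host(db, "compute2")
    commit_port(db, "compute2", "port2")
    w2 = is_first_port_on_host(db, "compute2")
    return w1, w2


def run_interleaved():
    """worker2 commits port2 before worker1 runs its check."""
    db = []
    commit_port(db, "compute2", "port1")          # worker1
    commit_port(db, "compute2", "port2")          # worker2
    w1 = is_first_port_on_host(db, "compute2")    # worker1 sees 2 ports
    w2 = is_first_port_on_host(db, "compute2")    # worker2 sees 2 ports
    return w1, w2


print(run_serialized())   # (True, False): one worker sends the flood notification
print(run_interleaved())  # (False, False): nobody sends it
```

In the interleaved run neither worker concludes that it handled the first port, so no flood notification would be sent at all.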

As this task (identifying the first/last port of an agent on a network) can't be done reliably in the server, this logic has to move to the L2 agent. L2 agents should be able to identify when the first/last port (on a network) on another agent is created/deleted and create or delete flood flows accordingly (and remove tunnel ports as well).
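One way to picture the agent-side bookkeeping proposed here is a per-network, per-remote-agent set of known ports. The sketch below is an assumption about how such tracking could look, not the code from the proposed review; `RemoteFloodTracker` and the returned action strings are hypothetical.

```python
from collections import defaultdict


class RemoteFloodTracker:
    """Hypothetical agent-side tracker of remote ports per (network, agent)."""

    def __init__(self):
        # (network_id, remote_agent_ip) -> set of port MACs seen so far
        self._ports = defaultdict(set)

    def port_added(self, network_id, agent_ip, mac):
        """Return 'add_flood_flow' when this is the first port seen for
        the remote agent on the network, otherwise None."""
        ports = self._ports[(network_id, agent_ip)]
        is_first = not ports
        ports.add(mac)
        return "add_flood_flow" if is_first else None

    def port_removed(self, network_id, agent_ip, mac):
        """Return 'remove_flood_flow' when the last known port of the
        remote agent on the network goes away, otherwise None."""
        key = (network_id, agent_ip)
        ports = self._ports.get(key)
        if not ports or mac not in ports:
            return None
        ports.discard(mac)
        if not ports:
            del self._ports[key]
            return "remove_flood_flow"
        return None
```

Because each agent decides from the notifications it has actually received, the result no longer depends on the order in which server workers query the DB.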

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
Revision history for this message
venkata anil (anil-venkata) wrote :

Previously the description in comment #1 was in https://bugs.launchpad.net/neutron/+bug/1535392 .

Bug #1535392 has been updated to focus only on "handling port status changes in l2pop driver", and
this bug (#1555600) will handle "agents fail to create flood flows with multiple workers".

Changed in neutron:
status: New → In Progress
tags: added: l2-pop
Changed in neutron:
importance: Undecided → Medium
tags: added: kilo-backport-potential liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/291674

Revision history for this message
venkata anil (anil-venkata) wrote :

Another change, https://review.openstack.org/#/c/288284/ , with a different approach is also proposed.
In this change the l2pop driver always sends flood entries to agents along with every port notification.
The notification to all hosts will contain:
  1) the port's FDB entry
  2) a flood entry to the port's agent

Notification to the port's agent:
Always notify the port's hosting agent (also when it is not the first port on the agent) of the flood entries of all agents hosting that network (no port IPs and MACs, only flood entries).
So the hosting agent will always get the flood entries of the other agents hosting the network.
For the first port on the agent, it will also get the other ports' FDB entries (MAC and IP), as before.
If it is not the first port, it will only get flood entries to the other agents.

With this change, we can at least avoid missing flood entries and tunnels.
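The notification contents described in this comment could be assembled roughly as follows. This is an illustrative sketch only: `build_notifications`, the `FLOOD` marker value and the dictionary layout are assumptions made for the example and are not taken from the proposed change.

```python
# Illustrative flood marker; the value is an assumption for this sketch.
FLOOD = ("00:00:00:00:00:00", "0.0.0.0")


def build_notifications(network_ports, port_agent, new_port, is_first_port):
    """Build the two notifications for a newly created port.

    network_ports: {agent_ip: [(mac, ip), ...]} for agents hosting the network
    port_agent:    IP of the agent hosting the new port
    new_port:      (mac, ip) of the new port
    Returns (to_other_hosts, to_port_agent), each {agent_ip: [entries]}.
    """
    # Every port notification carries the port's FDB entry plus a flood
    # entry pointing at the port's agent.
    to_other_hosts = {port_agent: [new_port, FLOOD]}

    # The hosting agent always gets flood entries for the other agents;
    # only for its first port does it also get their ports' MAC/IP entries.
    to_port_agent = {}
    for agent_ip, ports in network_ports.items():
        if agent_ip == port_agent:
            continue
        entries = [FLOOD]
        if is_first_port:
            entries.extend(ports)
        to_port_agent[agent_ip] = entries
    return to_other_hosts, to_port_agent
```

Since the flood entry is repeated with every port, a notification lost to the race no longer leaves an agent permanently without a flood flow; the cost is some redundant data in each message.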

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/291674
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Lost steam.

Changed in neutron:
assignee: venkata anil (anil-venkata) → nobody
status: In Progress → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by venkata anil (<email address hidden>) on branch: master
Review: https://review.openstack.org/288284
