Fix l2 population related flows on br-tun being cleaned after RabbitMQ cluster has experienced a network partition

Bug #1883071 reported by gao yu
This bug affects 2 people
Affects | Status    | Importance | Assigned to | Milestone
neutron | Won't Fix | High       | gao yu      |

Bug Description

Pre-conditions: The RabbitMQ cluster has experienced a network partition, and neutron-ovs-agent is then restarted.

Results: Normally, when neutron-ovs-agent restarts, the method add_fdb_entries is called to refresh the l2 pop related flows. However, after the RabbitMQ cluster has experienced a network partition, the agent receives only part of the RPC messages that call add_fdb_entries, so only some of the l2 pop related flows are refreshed. The l2 pop related flows that still carry the old cookie are then cleaned up. These flows are actually still in use, and deleting them affects tenant traffic.
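The failure mode described above can be illustrated with a toy model. This is only a sketch and not Neutron code: the cookie values, the network names, and the helpers refresh and cleanup_stale are hypothetical, and each network is reduced to a single flow tagged with the cookie of the agent run that installed it.

```python
# Toy model of cookie-based cleanup after an agent restart (not Neutron code).
OLD_COOKIE = 0x1111  # hypothetical cookie used by the previous agent run
NEW_COOKIE = 0x2222  # hypothetical cookie used by the restarted agent


def refresh(flows, refreshed_networks, cookie):
    """Re-stamp the flows of networks for which an fdb update arrived."""
    return {
        net: (cookie if net in refreshed_networks else old)
        for net, old in flows.items()
    }


def cleanup_stale(flows, cookie):
    """Keep only the flows that carry the current cookie."""
    return {net: c for net, c in flows.items() if c == cookie}


# Three networks, all still in use, installed by the previous agent run.
flows = {"net-a": OLD_COOKIE, "net-b": OLD_COOKIE, "net-c": OLD_COOKIE}

# Because of the partition, only the update for net-a reaches the agent.
flows = refresh(flows, {"net-a"}, NEW_COOKIE)

# Cleanup then removes the flows of net-b and net-c although they are in use.
remaining = cleanup_stale(flows, NEW_COOKIE)
print(sorted(remaining))  # ['net-a']
```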

Our temporary solution is to change the method cleanup_flows. The l2 pop related flows are mainly in table 20, table 21, and table 22. The lowest-priority flow in these tables resubmits to table 22, so we only need to ensure that the flows in table 22 exist. Before cleaning up flows in table 22, we dump all flows in it, then compare the vlan_num of every flow with LocalVLANMapping to judge whether the network is still in use. If it is not, the flow is cleaned up. If the network is still in use, the related flow in table 22 is not cleaned until the agent gets an RPC to refresh it.
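The selection step of this workaround could look roughly like the following. It is a minimal sketch under stated assumptions, not the patch proposed in the review: the function flows_to_delete is hypothetical, the dump lines are illustrative strings in the style of ovs-ofctl dump-flows output, and local_vlans stands in for the VLAN IDs that would be taken from the agent's LocalVLANMapping entries.

```python
import re

COOKIE_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+)")
VLAN_RE = re.compile(r"dl_vlan=(\d+)")


def flows_to_delete(dump_lines, current_cookie, local_vlans):
    """Return the table 22 flow lines that are safe to clean up.

    A flow is kept if it carries the current cookie (already refreshed)
    or if its VLAN still belongs to a network in use on this host.
    """
    stale = []
    for line in dump_lines:
        cookie_match = COOKIE_RE.search(line)
        if cookie_match is None:
            continue  # not a flow line
        if int(cookie_match.group(1), 16) == current_cookie:
            continue  # refreshed by an fdb update after the restart
        vlan_match = VLAN_RE.search(line)
        if vlan_match is not None and int(vlan_match.group(1)) in local_vlans:
            continue  # old cookie, but the network is still in use: keep
        stale.append(line)
    return stale


# Illustrative dump of table 22; values are made up for the example.
dump = [
    "cookie=0xaaa, table=22, priority=1,dl_vlan=1 actions=strip_vlan,output:2",
    "cookie=0xaaa, table=22, priority=1,dl_vlan=2 actions=strip_vlan,output:3",
    "cookie=0xbbb, table=22, priority=1,dl_vlan=3 actions=strip_vlan,output:4",
]

# VLAN 1 is still mapped locally, VLAN 2 is not, VLAN 3 was refreshed.
to_delete = flows_to_delete(dump, current_cookie=0xBBB, local_vlans={1, 3})
print(len(to_delete))  # 1
```

With these inputs only the flow for VLAN 2 is selected for deletion. The flow for VLAN 1 keeps its old cookie and stays in place until an RPC refreshes it.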

Tags: l2-pop
gao yu (gaoyublack)
Changed in neutron:
assignee: nobody → gao yu (gaoyublack)
Changed in neutron:
importance: Undecided → High
tags: added: l2-pop
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/735525

Changed in neutron:
status: New → In Progress
Slawek Kaplonski (slaweq) wrote :

I'm not sure if we should try to fix that issue in Neutron. The root cause of this problem, IIUC, was a RabbitMQ issue and a restart of neutron-ovs-agent at the same time. In that case, I think that you should first fix RabbitMQ and ensure that it's running fine before restarting the Neutron agents.
Keeping stale flows in the bridges isn't a solution, as this may potentially lead to overlapping OpenFlow rules and leakage of traffic between various networks.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: In Progress → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/735525
