Network connection is interrupted for a while after ovs-agent restart

Bug #1569795 reported by patrick on 2016-04-13
This bug affects 6 people
Affects: neutron
Importance: High
Assigned to: Unassigned

Bug Description

Problem:

After restarting neutron-openvswitch-agent, the tenant network connection is interrupted for a while.

The cause is that cleanup_stale_flows removes the existing flow in the default table of br-tun (because its cookie is stale) before a new flow that resubmits to the right tunneling table has been added.

The network connection recovers once fdb_add has finished in neutron-openvswitch-agent.
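
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Neutron code: ToyBridge, restart_agent and the cookie value 0x1 are hypothetical names and values made up for illustration, and the model assumes cleanup simply drops every flow whose cookie differs from the running agent's cookie.

```python
# Toy model of the restart race on br-tun. Not Neutron code; every name
# here (ToyBridge, restart_agent) is hypothetical and exists only to
# illustrate the ordering problem.


class ToyBridge:
    """Minimal stand-in for br-tun: a list of flows tagged with a cookie."""

    def __init__(self):
        self.flows = []

    def add_flow(self, cookie, table, in_port, actions):
        self.flows.append(
            {"cookie": cookie, "table": table, "in_port": in_port,
             "actions": actions}
        )

    def cleanup_stale_flows(self, current_cookie):
        # Assumption: drop every flow whose cookie is not the agent's own.
        self.flows = [f for f in self.flows if f["cookie"] == current_cookie]

    def lookup(self, table, in_port):
        # Return the actions of the first matching flow, or None (no match).
        for flow in self.flows:
            if flow["table"] == table and flow["in_port"] == in_port:
                return flow["actions"]
        return None


def restart_agent(bridge, new_cookie, tunnel_ports):
    """Return the table-0 lookup per port after cleanup and after fdb_add."""
    # Stage 1: cleanup runs before the fdb_add work has reinstalled anything.
    bridge.cleanup_stale_flows(new_cookie)
    after_cleanup = {p: bridge.lookup(0, p) for p in tunnel_ports}
    # Stage 2: fdb_add finishes later and reinstalls the table-0 flows.
    for port in tunnel_ports:
        bridge.add_flow(new_cookie, 0, port, "resubmit(,4)")
    after_fdb_add = {p: bridge.lookup(0, p) for p in tunnel_ports}
    return after_cleanup, after_fdb_add
```

Between the two stages, a lookup for the tunnel port in table 0 finds nothing, which corresponds to the interruption window in this report.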

Affected Neutron version:
Liberty

Possible Solution:

Remove cleanup_stale_flows(), since the stale flows will eventually be cleaned out by ovs-agent anyway.

Thanks.

Kevin Benton (kevinbenton) wrote :

If I understand the issue, you are saying that l2pop flows are being cleaned up too soon?

tags: added: ovs
tags: added: l2-pop
patrick (kldeng05) wrote :

Hi, @Kevin

The l2pop flows (table=20) work fine. The issue is caused by the default table (table=0).

Since the stale flows in table 0 are cleaned up before the new flows are installed, the VMs can't receive any reply from VMs residing on the other hypervisors reached through the tunnel port.

For example, the flow below is installed in the default table of br-tun.

cookie=0xa6ea370497991ce2, duration=61644.617s, table=0, n_packets=68643, n_bytes=6332921, idle_age=1, priority=1,in_port=34 actions=resubmit(,4)

It will be cleaned up after the ovs-agent restarts. But the new flow won't be installed until the add_fdb_entries RPC call is received by the ovs-agent and the fdb_add_tun procedure has finished. So for a while there are no flows for in_port=34, and any traffic arriving from tunnel port 34 will be dropped!
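
One way to check which tunnel ports are exposed is to compare the cookies of the table-0 flows against the cookie of the running agent. The sketch below is an assumption-laden helper, not part of Neutron or OVS: parse_flow and ports_losing_table0_flow are hypothetical names, the parser only handles the simple line shape quoted above (dump-flows output can vary between OVS versions), and how the agent's current cookie is obtained is left out.

```python
import re


def parse_flow(line):
    """Split one dump-flows style line into a dict of fields plus actions."""
    match_part, _, actions = line.strip().partition(" actions=")
    fields = {"actions": actions}
    for item in re.split(r",\s*", match_part):
        key, sep, value = item.partition("=")
        # Bare keywords (no "=") are stored as True.
        fields[key.strip()] = value if sep else True
    return fields


def ports_losing_table0_flow(lines, agent_cookie):
    """List in_ports whose only table-0 flows carry a non-agent cookie."""
    stale, fresh = set(), set()
    for line in lines:
        flow = parse_flow(line)
        if flow.get("table") != "0" or "in_port" not in flow:
            continue
        port = int(flow["in_port"])
        if int(flow["cookie"], 16) != agent_cookie:
            stale.add(port)
        else:
            fresh.add(port)
    # A port is exposed if removing stale flows leaves it with no flow.
    return sorted(stale - fresh)
```

Fed the flow quoted above and any agent cookie other than 0xa6ea370497991ce2, the helper reports port 34 as a port that would be left without a table-0 flow after cleanup.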

Thank you!

patrick (kldeng05) on 2016-04-14
Changed in neutron:
assignee: nobody → patrick (kldeng05)

Fix proposed to branch: master
Review: https://review.openstack.org/305724

Changed in neutron:
status: New → In Progress
patrick (kldeng05) wrote :

UPDATE:

Solution(RFC):
Replace rpc.cast with rpc.call in _notification_host to make
sure that the flow won't be removed by cleanup_stale_flows before
a fresh flow is installed.
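
The difference the RFC relies on is that a cast returns as soon as the message is queued, while a call returns only after the receiver has handled it. The following is a deterministic toy model of those two semantics only; ToyEndpoint and ToyClient are hypothetical and do not use the real RPC library, and the sketch does not show whether the change actually closes the window in Neutron.

```python
from collections import deque


class ToyEndpoint:
    """Hypothetical receiver: queued messages run only when drained."""

    def __init__(self):
        self.inbox = deque()
        self.installed = []

    def add_fdb_entries(self, port):
        # Stand-in for the work that reinstalls the table-0 flow.
        self.installed.append(port)

    def drain(self):
        # Handle every queued message in arrival order.
        while self.inbox:
            method, arg = self.inbox.popleft()
            getattr(self, method)(arg)


class ToyClient:
    """Hypothetical sender with fire-and-forget and blocking variants."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def cast(self, method, arg):
        # Fire and forget: queue the message and return immediately.
        self.endpoint.inbox.append((method, arg))

    def call(self, method, arg):
        # Blocking: return only after the endpoint has handled the message.
        self.endpoint.inbox.append((method, arg))
        self.endpoint.drain()
```

After cast() returns, the handler may not have run yet, so anything the sender does next can precede the flow installation; after call() returns, the handler has completed.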

Thanks.

Changed in neutron:
importance: Undecided → High
SHI Peiqi (uestc-shi) on 2016-04-16
Changed in neutron:
assignee: patrick (kldeng05) → SHI Peiqi (uestc-shi)
SHI Peiqi (uestc-shi) on 2016-04-18
Changed in neutron:
status: In Progress → Invalid
status: Invalid → In Progress
SHI Peiqi (uestc-shi) on 2016-04-19
Changed in neutron:
assignee: SHI Peiqi (uestc-shi) → nobody
Changed in neutron:
status: In Progress → Confirmed
Tony Tan (tonytan4ever) on 2016-07-06
Changed in neutron:
assignee: nobody → Tony Tan (tonytan4ever)
Changed in neutron:
assignee: Tony Tan (tonytan4ever) → nobody
Changed in neutron:
assignee: nobody → Swetha G (swethamohan318)
Changed in neutron:
assignee: Swetha G (swethamohan318) → nobody

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/305724
Reason: This review is > 3 months without comment and currently blocked by a core reviewer with a -1. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -1 on this review to ensure you address their concerns.
