With bidirectional FIP traffic, agent restart can cause hold flows and packet loss

Bug #1588513 reported by Vedamurthy Joshi
Affects            Status  Importance  Assigned to  Milestone
Juniper Openstack  New     High        Naveen N
R3.0               New     High        Naveen N

Bug Description

R3.0.2.0 Build 4x

VM 40.1.1.8 is in VN VN1, with floating IP 20.1.1.4 allocated from VN2.
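For reference, a floating IP like this is typically associated through the neutron CLI. A minimal sketch, assuming a FIP pool on VN2; <fip-id> and <port-id> are placeholders, and the exact workflow can differ on a Contrail setup:

neutron floatingip-create VN2                    # allocate 20.1.1.4 from VN2
neutron floatingip-associate <fip-id> <port-id>  # bind it to the port of 40.1.1.8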

From 40.1.1.8, run:
hping3 --udp 20.1.1.3 -i u1000 -s 20000 -p 10000 --keep -n -q

From 20.1.1.3, run:
hping3 --udp 20.1.1.4 -i u1000 -s 10000 -p 20000 --keep -n -q

Bidirectional continuous UDP traffic is now set up.

Then restart the agent on the compute node where the NAT flows are present.
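A minimal sketch of that step, assuming the agent runs as the contrail-vrouter-agent service (the service name can vary by release):

service contrail-vrouter-agent restart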

Sometimes hold flows are created and packet loss is observed.

Praveen is aware of this (in fact, the test case came from him).

root@nodek2:/var/log/contrail# flow -l --match "20.1.1.3:10000"
Flow table(size 644349952, entries 5033984)

Entries: Created 1273 Added 45 Processed 1272 Used Overflow entries 0
(Created Flows/CPU: 2 1 1 2 3 2 2 4 0 2 0 0 0 0 0 0 0 5 6 4 1231 4 0 4 0 0 0 0 0 0 0 0)(oflows 0)

Action:F=Forward, D=Drop N=NAT(S=SNAT, D=DNAT, Ps=SPAT, Pd=DPAT, L=Link Local Port)
 Other:K(nh)=Key_Nexthop, S(nh)=RPF_Nexthop
 Flags:E=Evicted, Ec=Evict Candidate, N=New Flow, M=Modified Dm=Delete Marked
TCP(r=reverse):S=SYN, F=FIN, R=RST, C=HalfClose, E=Established, D=Dead

Listing flows matching ([20.1.1.3]:10000)

    Index Source:Port/Destination:Port Proto(V)
-----------------------------------------------------------------------------------
   212740<=>397732 20.1.1.3:10000 17 (3)
                         40.1.1.8:20000
(Gen: 2, K(nh):25, Action:F, Flags:, S(nh):75, Stats:0/0, SPort 64264)

   397732<=>212740 40.1.1.8:20000 17 (3)
                         20.1.1.3:10000
(Gen: 2, K(nh):25, Action:F, Flags:, S(nh):25, Stats:109389/4594338, SPort 60782)

  2248932 20.1.1.3:10000 17 (3)
                         20.1.1.4:20000
(Gen: 109, K(nh):25, Action:D(Unknown), Flags:MDm, S(nh):0, Stats:89/3738, SPort 0)

  2248933 20.1.1.3:10000 17 (3)
                         20.1.1.4:20000
(Gen: 108, K(nh):25, Action:H, Flags:, S(nh):0, Stats:12/504, SPort 0)

root@nodek2:/var/log/contrail#

root@nodek2:/var/log/contrail# dropstats | grep -v " 0"

Flow Action Drop 4767 <<<<< increments
Flow Queue Limit Exceeded 132619 <<<< increments
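To confirm that both counters keep climbing for as long as the loss lasts, a watch loop over the same dropstats output works, e.g.:

watch -n 1 "dropstats | egrep 'Flow Action Drop|Flow Queue Limit Exceeded'"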

Tags: vrouter
tags: added: releasenote
Changed in juniperopenstack:
assignee: Praveen (praveen-karadakal) → Naveen N (naveenn)
Naveen N (naveenn) wrote:

In this scenario, the floating-ip VRF and the native VRF of the interface had the same /32 prefix routes. Upon agent restart, when the VM initiated traffic, the floating-ip VRF did not yet have the destination route while the native VRF did, so a normal non-NAT flow was set up.
When the same traffic was then initiated from the remote VM, the agent tried to set up a NAT flow, but the reverse flow was already present in a different partition, so this new flow from the remote VM towards the native VM will always be a short flow.
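Both entries (the stale non-NAT forward flow and the repeatedly re-created short flow) can be seen by matching the flow table on the floating IP, using the same flow utility as above:

flow -l --match "20.1.1.4"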

The problem happens only if all of the conditions below are true (the route dumps sketched after this list can confirm the first two):
1> No default route in the floating-ip VRF.
2> The same /32 prefix route is present in both the floating-ip VRF and the native VRF.
3> A timing window in which the flow is set up while the native VRF has the route but the floating-ip VRF does not yet.
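Assuming the VRF indices of the floating-ip VRF and the native VRF are known (e.g. from the agent introspect), the first two conditions can be checked by dumping both inet tables with the vrouter rt utility. A sketch, with VRF ids 1 and 2 as placeholders:

rt --dump 1 --family inet | grep -E '^0.0.0.0/0|20.1.1.3/32'   # floating-ip VRF: is the default route missing?
rt --dump 2 --family inet | grep '20.1.1.3/32'                 # native VRF: is the same /32 present?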

tags: removed: releasenote