BGPaaS: BGP peering session can be incorrectly RST sometimes

Bug #1551576 reported by amit surana
This bug affects 1 person
Affects              Status          Importance   Assigned to    Milestone
Juniper Openstack    Status tracked in Trunk
  R3.0               Fix Committed   High         Manish Singh
  Trunk              Fix Committed   High         Manish Singh

Bug Description

This seems to be a new issue after the fix for:

https://bugs.launchpad.net/juniperopenstack/+bug/1533924

The BGP peering session between a BGPaaS VM and the CN can sometimes be prematurely RST. Analyzing the traces, it seems that somewhere mid-flow (usually when a BGP keepalive is sent from the VM), the ACK from the CN gets switched to the host instead of being forwarded to the VM. This causes the host to RST the TCP flow. The pattern keeps repeating, and the peering never becomes stable.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/18064
Submitter: Manish Singh (<email address hidden>)

amit surana (asurana-t)
tags: added: releasenote
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/18073
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
amit surana (asurana-t) wrote :

Release-Note:

In multi-controller clusters, the BGP peering session between the BGPaaS VM and the control node can sometimes fail to come up after one of the active control nodes goes down.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/18073
Committed: http://github.org/Juniper/contrail-controller/commit/25342616d863c06f405512574778d7ed3dd3de5f
Submitter: Zuul
Branch: R3.0

commit 25342616d863c06f405512574778d7ed3dd3de5f
Author: Manish <email address hidden>
Date: Tue Mar 1 17:37:51 2016 +0530

BGP service session gets reset intermittently.

Problem:
All BGP-as-a-service flows are programmed with loose policy. This enables flow
lookup on non-tunneled traffic coming from the fabric.
Say a VM has two BGP sessions, one to CN1 and one to CN2. Both sessions have a
reverse (fabric) flow with the same nat-sport and dport (bgp-port) but a
different destination IP. For loose policy, vrouter records this nat-sport in a
bitmap that it uses to identify fabric traffic for flow processing. When
traffic arrives from the fabric, vrouter checks the dport: if it matches a NAT
port stored in the bitmap, the packet is pushed to flow processing; otherwise
it is dumped to the host interface.
Now if one session is torn down, say CN2 in this case, its reverse flow gets
aged out and vrouter in turn removes the NAT port from the bitmap. However, CN1
still needed this reservation. In its absence, packets coming from CN1 to the
VM get dumped to the host interface (even though the flow is present).

Solution:
Don't age BGP service flows; let them be deleted by a config change of the BGP
service object or by VM interface deletion.

Change-Id: Id7adf0f1f7e2f7a0b3f3e4a68e092107d4edc259
Closes-bug: 1551576
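
For readers following the thread, the failure mode in the commit message above can be reproduced with a small toy model. This is only a sketch: the class and method names are hypothetical and are not taken from vrouter, and the addresses and NAT port are borrowed from the capture later in this report. It assumes a bitmap with one bit per port and no knowledge of how many flows share that port.

```python
# Toy model of the shared NAT port problem. Hypothetical names, not vrouter code.

BGP_PORT = 179


class ToyVrouter:
    def __init__(self):
        self.port_bitmap = set()  # ports whose fabric traffic goes to flow lookup
        self.flows = set()        # reverse-flow keys: (src_ip, src_port, dst_port)

    def add_reverse_flow(self, cn_ip, nat_port):
        self.flows.add((cn_ip, BGP_PORT, nat_port))
        self.port_bitmap.add(nat_port)

    def delete_reverse_flow(self, cn_ip, nat_port):
        self.flows.discard((cn_ip, BGP_PORT, nat_port))
        # The bit is cleared even though another flow may still use this port.
        self.port_bitmap.discard(nat_port)

    def receive_from_fabric(self, cn_ip, dport):
        # The bitmap is checked first, so a live flow is never consulted
        # once its port bit is gone.
        if dport not in self.port_bitmap:
            return "host"
        if (cn_ip, BGP_PORT, dport) in self.flows:
            return "vm"
        return "host"


def demo_shared_port():
    vr = ToyVrouter()
    vr.add_reverse_flow("172.16.180.5", 50011)  # session to CN1
    vr.add_reverse_flow("172.16.180.8", 50011)  # session to CN2, same NAT port
    before = vr.receive_from_fabric("172.16.180.5", 50011)
    vr.delete_reverse_flow("172.16.180.8", 50011)  # CN2 session torn down
    after = vr.receive_from_fabric("172.16.180.5", 50011)
    return before, after
```

In this model, the CN1 flow is still in the flow table after the CN2 flow is deleted, yet CN1 traffic is handed to the host because the port bit is gone.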

Changed in juniperopenstack:
milestone: r3.1.0.0-fcs → r3.0.1.0
Revision history for this message
amit surana (asurana-t) wrote :

The bug is still seen:

20:45:26.492004 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 85: 172.16.180.14.50011 > 172.16.180.5.179: Flags [P.], seq 468:487, ack 127, win 17250, options [nop,nop,TS val 2619269 ecr 238062460], length 19: BGP, length: 19
20:45:26.492136 90:e2:ba:50:ad:f8 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 66: 172.16.180.5.179 > 172.16.180.14.50011: Flags [.], ack 487, win 235, options [nop,nop,TS val 238069612 ecr 2619269], length 0
20:45:26.492192 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 54: 172.16.180.14.50011 > 172.16.180.5.179: Flags [R], seq 3880033224, win 0, length 0
20:45:27.690994 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 85: 172.16.180.14.50011 > 172.16.180.5.179: Flags [P.], seq 468:487, ack 127, win 17250, options [nop,nop,TS val 2619509 ecr 238062460], length 19: BGP, length: 19
20:45:27.691131 90:e2:ba:50:ad:f8 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 60: 172.16.180.5.179 > 172.16.180.14.50011: Flags [R], seq 3765870152, win 0, length 0
20:45:29.891142 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 85: 172.16.180.14.50011 > 172.16.180.5.179: Flags [P.], seq 468:487, ack 127, win 17250, options [nop,nop,TS val 2619949 ecr 238062460], length 19: BGP, length: 19
20:45:29.891261 90:e2:ba:50:ad:f8 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 60: 172.16.180.5.179 > 172.16.180.14.50011: Flags [R], seq 3765870152, win 0, length 0
20:45:34.096187 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 85: 172.16.180.14.50011 > 172.16.180.5.179: Flags [P.], seq 468:487, ack 127, win 17250, options [nop,nop,TS val 2620789 ecr 238062460], length 19: BGP, length: 19
20:45:34.096287 90:e2:ba:50:ad:f8 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 60: 172.16.180.5.179 > 172.16.180.14.50011: Flags [R], seq 3765870152, win 0, length 0
20:45:37.893431 90:e2:ba:50:b9:68 > 90:e2:ba:4c:68:68, ethertype IPv4 (0x0800), length 78: 172.16.180.14.50011 > 172.16.180.8.179: Flags [S], seq 1228311155, win 16384, options [mss 1420,nop,wscale 0,nop,nop,TS val 2621548 ecr 0,sackOK,eol], length 0
20:45:37.893497 90:e2:ba:4c:68:68 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 74: 172.16.180.8.179 > 172.16.180.14.50011: Flags [S.], seq 134096921, ack 1228311156, win 28960, options [mss 1460,sackOK,TS val 237922714 ecr 2621548,nop,wscale 7], length 0
20:45:37.895501 90:e2:ba:50:b9:68 > 90:e2:ba:4c:68:68, ethertype IPv4 (0x0800), length 66: 172.16.180.14.50011 > 172.16.180.8.179: Flags [.], ack 1, win 17376, options [nop,nop,TS val 2621548 ecr 237922714], length 0
20:45:37.900563 90:e2:ba:50:b9:68 > 90:e2:ba:4c:68:68, ethertype IPv4 (0x0800), length 137: 172.16.180.14.50011 > 172.16.180.8.179: Flags [P.], seq 1:72, ack 1, win 17376, options [nop,nop,TS val 2621549 ecr 237922714], length 71: BGP, length: 71
20:45:37.900646 90:e2:ba:4c:68:68 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 66: 172.16.180.8.179 > 172.16.180.14.50011: Flags [.], ack 72, win 227, options [nop,nop,TS val 237922715 ecr 2621549], length 0
20:45:37.900930 90:e2:ba:4c:68:68 > 90:e2:ba:50:b9:68, ethertype IPv...
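
When checking captures like the one above for this issue, it can help to pull out just the RST segments and see which side sent them. The helper below is a rough sketch that assumes the `tcpdump -e` line layout shown in this report; the function name and regular expression are illustrative only, and the sample lines are copied from the capture.

```python
import re

# Matches the link-level tcpdump lines shown above:
# timestamp, MACs, ethertype, frame length, then "src > dst: Flags [..]".
LINE_RE = re.compile(
    r"^(?P<ts>\S+) .*?length \d+: (?P<src>\S+) > (?P<dst>\S+): "
    r"Flags \[(?P<flags>[^\]]*)\]"
)


def find_resets(lines):
    """Return (timestamp, sender, receiver) for every segment with RST set."""
    resets = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and "R" in m.group("flags"):
            resets.append((m.group("ts"), m.group("src"), m.group("dst")))
    return resets


SAMPLE = [
    "20:45:26.492004 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 85: 172.16.180.14.50011 > 172.16.180.5.179: Flags [P.], seq 468:487, ack 127, win 17250, options [nop,nop,TS val 2619269 ecr 238062460], length 19: BGP, length: 19",
    "20:45:26.492136 90:e2:ba:50:ad:f8 > 90:e2:ba:50:b9:68, ethertype IPv4 (0x0800), length 66: 172.16.180.5.179 > 172.16.180.14.50011: Flags [.], ack 487, win 235, options [nop,nop,TS val 238069612 ecr 2619269], length 0",
    "20:45:26.492192 90:e2:ba:50:b9:68 > 90:e2:ba:50:ad:f8, ethertype IPv4 (0x0800), length 54: 172.16.180.14.50011 > 172.16.180.5.179: Flags [R], seq 3880033224, win 0, length 0",
]
```

Running `find_resets(SAMPLE)` picks out the third segment, where 172.16.180.14 answers the control node's ACK with an RST.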

Revision history for this message
Manish Singh (manishs) wrote :

The above fix is a temporary one, and a different fix will follow.
That said, the above fix did prevent aging of BGP service flows, but the check did not cover dead flows.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19264
Submitter: Divakar Dharanalakota (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19264
Committed: http://github.org/Juniper/contrail-vrouter/commit/a3aa608942e1e7ee77ea598594295db2f1d883d2
Submitter: Zuul
Branch: R3.0

commit a3aa608942e1e7ee77ea598594295db2f1d883d2
Author: Divakar <email address hidden>
Date: Wed Apr 13 14:55:15 2016 +0530

Don't delete BGP as a service port once set

Right now BGP as a Service uses the same methodology as link-local
services and sets up the flow with the link-local flag. Once this flag is
set, vrouter adds the destination port to a bitmap and uses it, when a
packet arrives on the fabric, to subject the packet to flow processing.
If the flag is removed from the flow entry, the port is removed from the
bitmap. This mechanism has a problem if multiple flows use the same
destination port with the link-local flag: removing the flag from one
flow removes the port from the bitmap, and packets belonging to the other
flows are then never subjected to flow processing because the bitmap no
longer has the port.

As a temporary fix, a new flag is introduced in the flow entry. When this
flag is set, the port is added to the bitmap and is never removed from
it, even if the flag is removed from the flow entry. This way, multiple
flows can use the same port without any issues.

This fix will be reverted once new messaging between Agent and
Vrouter is in place.

partial-bug: #1551576

Change-Id: I0474a91c3d1275d542f3e4d2ae11bc15f62cdbcf
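
The temporary fix described in this commit can be pictured as a "sticky" bit on the port bitmap. The sketch below is a minimal illustration under that reading; `StickyPortBitmap` and its methods are made-up names, not the vrouter data structure, and port 40000 is an arbitrary example of a port that is not marked sticky.

```python
# Hypothetical model of a port bitmap with a "never remove" marker.

class StickyPortBitmap:
    def __init__(self):
        self._ports = set()
        self._sticky = set()

    def set(self, port, sticky=False):
        self._ports.add(port)
        if sticky:
            # Models the new flow flag: once seen, the port stays for good.
            self._sticky.add(port)

    def clear(self, port):
        """Clear the port unless it was marked sticky. Returns True if cleared."""
        if port in self._sticky:
            return False
        self._ports.discard(port)
        return True

    def is_set(self, port):
        return port in self._ports


def demo_sticky_port():
    bitmap = StickyPortBitmap()
    bitmap.set(50011, sticky=True)  # reverse flow from CN1
    bitmap.set(50011, sticky=True)  # reverse flow from CN2, same NAT port
    bitmap.clear(50011)             # CN2 flow goes away
    bgp_kept = bitmap.is_set(50011)

    bitmap.set(40000)               # ordinary port, no sticky flag
    bitmap.clear(40000)
    other_kept = bitmap.is_set(40000)
    return bgp_kept, other_kept
```

In this model the BGP NAT port survives the deletion of one of its flows, while a port without the flag is still cleared as before. The port also stays set after the last flow is gone, which fits the commit describing this as temporary.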

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/19502
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/19504
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
Manish Singh (manishs) wrote :

Note: The fixes for this bug on mainline and on R3.0 are different.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/19524
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/19525
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/19504
Committed: http://github.org/Juniper/contrail-controller/commit/33ac3654c992d5a014c0d2f733da41bc0cf00285
Submitter: Zuul
Branch: R3.0

commit 33ac3654c992d5a014c0d2f733da41bc0cf00285
Author: Manish <email address hidden>
Date: Thu Apr 21 11:24:11 2016 +0530

Send BGP flag to retain same nat port across flows.

In BGP as a service, the same NAT port is used for different CN peers.
If one CN goes down, the agent sends a delete for its flow, which in turn
resets the port still in use by the second CN. Because of this reset, packets
from the second CN start going to vhost. This causes a session reset for the
second CN, and other issues follow.

Solution:
A new flag tells vrouter to retain the port for BGP flows even if the flow is
deleted.

Change-Id: Ib4796f631e3ded3ed814da386cdf182d1c2c9e6b
Closes-bug: #1551576

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/21012
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/21013
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21013
Committed: http://github.org/Juniper/contrail-vrouter/commit/320cc4d08d71c266c61383eb977350ace6306cc8
Submitter: Zuul
Branch: master

commit 320cc4d08d71c266c61383eb977350ace6306cc8
Author: Divakar <email address hidden>
Date: Wed Apr 13 14:55:15 2016 +0530

Don't delete BGP as a service port once set

Right now BGP as a Service uses the same methodology as link-local
services and sets up the flow with the link-local flag. Once this flag is
set, vrouter adds the destination port to a bitmap and uses it, when a
packet arrives on the fabric, to subject the packet to flow processing.
If the flag is removed from the flow entry, the port is removed from the
bitmap. This mechanism has a problem if multiple flows use the same
destination port with the link-local flag: removing the flag from one
flow removes the port from the bitmap, and packets belonging to the other
flows are then never subjected to flow processing because the bitmap no
longer has the port.

As a temporary fix, a new flag is introduced in the flow entry. When this
flag is set, the port is added to the bitmap and is never removed from
it, even if the flag is removed from the flow entry. This way, multiple
flows can use the same port without any issues.

This fix will be reverted once new messaging between Agent and
Vrouter is in place.

partial-bug: #1551576

Change-Id: I0474a91c3d1275d542f3e4d2ae11bc15f62cdbcf

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/21070
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/21070
Committed: http://github.org/Juniper/contrail-controller/commit/5f1b41f3ad6e7060ab9e831100ae2a20bd6f6bc6
Submitter: Zuul
Branch: master

commit 5f1b41f3ad6e7060ab9e831100ae2a20bd6f6bc6
Author: Manish Singh <email address hidden>
Date: Fri Jun 10 09:16:16 2016 +0530

Cherry-picked commits from R3.0

Deadlock in agent.

Problem:
If a packet trapped for flow setup has the same source IP and destination IP
and is classified as a NAT flow, the rflow key becomes identical to the flow
key.
The agent then deadlocks, because it tries to take a lock on both the flow and
the rflow located using these keys (both are the same).

The mentioned bug exposed this issue.
In that bug, an ICMP TTL-expired message was received, for which the agent
generated a packet to send to the VM port from which the packet with TTL 1
originated. This packet resulted in a flow trap, and the agent processed it
(the agent-generated packet had the same SIP and DIP).
The deadlock results in flows going to hold state.

This fix does not address the agent generating a packet with the same SIP and
DIP, or the flow trap for the agent-generated packet.
It only fixes the deadlock.

Solution:
In packet flow, when the flow and rflow keys are created, compare them; if they
are the same, nullify the rflow and mark the flow as a short flow.

Partial-bug: #1556290
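
The solution above can be sketched roughly as follows. This is an illustration only: `FlowKey`, `reverse_of` and `plan_flow_setup` are hypothetical names, and the reverse key here is a plain source/destination swap, whereas the agent's real NAT key computation is more involved.

```python
from collections import namedtuple

# Hypothetical 5-tuple flow key; not the agent's actual key type.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip proto src_port dst_port")


def reverse_of(key):
    """Simplified reverse key: swap the source and destination halves."""
    return FlowKey(key.dst_ip, key.src_ip, key.proto, key.dst_port, key.src_port)


def plan_flow_setup(key):
    """Decide whether a separate reverse flow may be created and locked."""
    rkey = reverse_of(key)
    if rkey == key:
        # Same key means same entry: taking its lock twice would never return,
        # so drop the rflow and treat the flow as a short flow.
        return {"flow": key, "rflow": None, "short_flow": True}
    return {"flow": key, "rflow": rkey, "short_flow": False}
```

For example, an ICMP key whose SIP and DIP are equal, such as `FlowKey("10.0.0.3", "10.0.0.3", 1, 0, 0)`, is its own reverse and is planned as a short flow with no rflow, while an ordinary TCP key still gets a distinct rflow. The 10.0.0.x addresses are placeholders.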

Conflicts:
src/vnsw/agent/pkt/flow_table.cc

Conflicts:
src/vnsw/agent/pkt/flow_entry.cc
src/vnsw/agent/pkt/flow_entry.h
src/vnsw/agent/pkt/flow_table.cc
src/vnsw/agent/vrouter/ksync/flowtable_ksync.cc

BGP service session gets reset intermittently.

Problem:
All BGP-as-a-service flows are programmed with loose policy. This enables flow
lookup on non-tunneled traffic coming from the fabric.
Say a VM has two BGP sessions, one to CN1 and one to CN2. Both sessions have a
reverse (fabric) flow with the same nat-sport and dport (bgp-port) but a
different destination IP. For loose policy, vrouter records this nat-sport in a
bitmap that it uses to identify fabric traffic for flow processing. When
traffic arrives from the fabric, vrouter checks the dport: if it matches a NAT
port stored in the bitmap, the packet is pushed to flow processing; otherwise
it is dumped to the host interface.
Now if one session is torn down, say CN2 in this case, its reverse flow gets
aged out and vrouter in turn removes the NAT port from the bitmap. However, CN1
still needed this reservation. In its absence, packets coming from CN1 to the
VM get dumped to the host interface (even though the flow is present).

Solution:
Don't age BGP service flows; let them be deleted by a config change of the BGP
service object or by VM interface deletion.

Closes-bug: 1551576

Conflicts:
src/vnsw/agent/vrouter/flow_stats/flow_stats_collector.cc

Send BGP flag to retain same nat port across flows.

In BGP as a service, the same NAT port is used for different CN peers.
If one CN goes down, the agent sends a delete for its flow, which in turn
resets the port still in use by the second CN. Because of this reset, packets
from the second CN start going to vhost. This causes a session reset for the
second CN, and other issues follow.

Solution:
A new flag tells vrouter to retain the port for BGP flows even if the flow is
deleted.

...
