BGPaaS: flow setup issues at scale

Bug #1664301 reported by amit surana
This bug affects 2 people
Affects            Status         Importance  Assigned to   Milestone
Juniper Openstack  Status tracked in Trunk
  R3.0             In Progress    High        Manish Singh
  R3.0.3.x         Fix Committed  High        Manish Singh
  R3.1             Fix Committed  High        Manish Singh
  R3.2             Fix Committed  High        Manish Singh
  Trunk            Fix Committed  High        Manish Singh

Bug Description

Seeing a few flow setup issues when BGPaaS sessions are scaled to 2k. They might be related and have the same root cause, so all of them are tracked under this bug for now:

Symptom 1:

1. Let's assume BGPaaS sessions are established to control-nodes CN1 and CN2 (CN3 is down).
2. If CN1 is restarted, then some flows to CN2 are also reset (RST). Further, it takes some time (anywhere between 2 and 10 minutes) for all 2k sessions to come back up.

Symptom 2:

If the BGP process inside the VM that initiates the BGPaaS sessions (1k such sessions) is restarted multiple times, a few hundred TCP flows get 'stuck' in vRouter. These flows are not evicted even after being idle for longer than the configured idle timeout.
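
The "stuck" condition in Symptom 2 can be stated as a simple check: a flow is a candidate if its idle time exceeds the configured idle timeout and it is still present. The snippet below is only a minimal sketch of that check. `FlowRecord`, `find_stuck_flows`, and the 180-second timeout are hypothetical names and values chosen for illustration; they are not vRouter or agent structures, and the timestamps are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical snapshot of one flow; not a real vRouter structure."""
    index: int
    last_activity: float  # seconds, on the same clock as `now`


def find_stuck_flows(flows, now, idle_timeout):
    """Return flows that have been idle for longer than idle_timeout seconds."""
    return [f for f in flows if now - f.last_activity > idle_timeout]


# Illustrative data only: flow 1 has been idle for 400 s, flow 2 for 50 s.
flows = [
    FlowRecord(index=1, last_activity=100.0),
    FlowRecord(index=2, last_activity=450.0),
]
stuck = find_stuck_flows(flows, now=500.0, idle_timeout=180.0)
print([f.index for f in stuck])
```

With the assumed 180-second timeout, only flow 1 is reported, since it is the only one idle for longer than the timeout.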

Symptom 3:

Let's say there are some BGPaaS flows stuck in vRouter (as seen in Symptom 2). If the guest now attempts to open a new TCP connection, the initial three-way handshake with the control-node completes successfully and some data is exchanged as well, but soon afterwards the flow is removed from the vRouter. The next TCP segment sent by the control-node thus causes the compute node to send a RST towards the control-node, and the flow has to be set up again by the guest.

Symptom 4 (not scale related):

Let's say the guest has established 2 BGP connections: one each to the .1 and .2 IPs. Both sessions are up. Now, if a new SYN is sent from the guest to the .1 IP, the .2 flow is removed and that connection gets RST.

Tags: bgpaas vrouter
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/28955
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/28955
Committed: http://github.org/Juniper/contrail-controller/commit/e7f8716ba29f709021e4f74c34ddc54801037700
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit e7f8716ba29f709021e4f74c34ddc54801037700
Author: Manish <email address hidden>
Date: Tue Feb 21 07:18:04 2017 +0530

Dont use link local flag in bgp-aas flow encode.

Presence of BGP service flag is indication of bgp-aas flows. Link local is not
needed to identify candidate for relaxed policy. Bgp flags also indicates
relaxed policy on fabric.
Presence of this flag used to cause port to be freed on flow delete even though
there will be a second flow using this port. Vrouter uses link local flag to
release port. Removing this flag ensures that vrouter does not release the port
used in bgp-aas.
In case flow is not present and port is still part of relaxed policy then packet
will be sent to host, hence achieving the functionality when port would have
been released.

Change-Id: Id6cfd2d0b0ff3dc0724715bc126ccee882068486
Closes-bug: #1664301
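
The port-release behaviour described in this commit message can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPortTable`, `encode_flow`, `port_survives_delete`, and port 50000 are hypothetical and do not correspond to actual agent or vRouter code. The only behaviour taken from the commit message is that the port is released on flow delete when the link-local flag is set, even though a second flow still uses the port.

```python
class ToyPortTable:
    """Toy stand-in for a table of reserved ports; not vRouter code."""

    def __init__(self):
        self.reserved = set()

    def reserve(self, port):
        self.reserved.add(port)

    def delete_flow(self, flow):
        # As described in the commit message: the port is released on
        # flow delete only when the flow carries the link-local flag.
        if flow["link_local"]:
            self.reserved.discard(flow["port"])


def encode_flow(port, use_link_local_flag):
    # 'bgp_service' marks the flow as bgp-aas; 'link_local' is the flag
    # that the fix stops setting for such flows.
    return {"port": port, "bgp_service": True, "link_local": use_link_local_flag}


def port_survives_delete(use_link_local_flag):
    """Delete one of two flows sharing a port; report if the port is kept."""
    table = ToyPortTable()
    table.reserve(50000)  # hypothetical port number
    first = encode_flow(50000, use_link_local_flag)
    second = encode_flow(50000, use_link_local_flag)  # second flow on same port
    table.delete_flow(first)
    return second["port"] in table.reserved


print(port_survives_delete(True))   # link-local flag set, as before the fix
print(port_survives_delete(False))  # link-local flag not set, as after the fix
```

In this model the first call shows the port disappearing from under the remaining flow, and the second shows it being kept once the flag is no longer set.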

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/29069
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/29070
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29070
Committed: http://github.org/Juniper/contrail-controller/commit/0dcdb10ceeb324f4da191f0e983972d166ceeeaf
Submitter: Zuul (<email address hidden>)
Branch: master

commit 0dcdb10ceeb324f4da191f0e983972d166ceeeaf
Author: Manish <email address hidden>
Date: Tue Feb 21 07:18:04 2017 +0530

Dont use link local flag in bgp-aas flow encode.

Presence of BGP service flag is indication of bgp-aas flows. Link local is not
needed to identify candidate for relaxed policy. Bgp flags also indicates
relaxed policy on fabric.
Presence of this flag used to cause port to be freed on flow delete even though
there will be a second flow using this port. Vrouter uses link local flag to
release port. Removing this flag ensures that vrouter does not release the port
used in bgp-aas.
In case flow is not present and port is still part of relaxed policy then packet
will be sent to host, hence achieving the functionality when port would have
been released.

Change-Id: Id6cfd2d0b0ff3dc0724715bc126ccee882068486
Closes-bug: #1664301
(cherry picked from commit e7f8716ba29f709021e4f74c34ddc54801037700)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/29069
Committed: http://github.org/Juniper/contrail-controller/commit/f48937c6905034ae25dfbf4c3df509072cafacaf
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit f48937c6905034ae25dfbf4c3df509072cafacaf
Author: Manish <email address hidden>
Date: Tue Feb 21 07:18:04 2017 +0530

Dont use link local flag in bgp-aas flow encode.

Presence of BGP service flag is indication of bgp-aas flows. Link local is not
needed to identify candidate for relaxed policy. Bgp flags also indicates
relaxed policy on fabric.
Presence of this flag used to cause port to be freed on flow delete even though
there will be a second flow using this port. Vrouter uses link local flag to
release port. Removing this flag ensures that vrouter does not release the port
used in bgp-aas.
In case flow is not present and port is still part of relaxed policy then packet
will be sent to host, hence achieving the functionality when port would have
been released.

Change-Id: Id6cfd2d0b0ff3dc0724715bc126ccee882068486
Closes-bug: #1664301
(cherry picked from commit e7f8716ba29f709021e4f74c34ddc54801037700)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/29091
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/29092
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.2

Review in progress for https://review.opencontrail.org/29101
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29091
Committed: http://github.org/Juniper/contrail-controller/commit/52273ebc1a8bd46492f20f1535e60d6144df2bb2
Submitter: Zuul (<email address hidden>)
Branch: R3.0

commit 52273ebc1a8bd46492f20f1535e60d6144df2bb2
Author: Manish <email address hidden>
Date: Tue Feb 21 07:18:04 2017 +0530

Dont use link local flag in bgp-aas flow encode.

Presence of BGP service flag is indication of bgp-aas flows. Link local is not
needed to identify candidate for relaxed policy. Bgp flags also indicates
relaxed policy on fabric.
Presence of this flag used to cause port to be freed on flow delete even though
there will be a second flow using this port. Vrouter uses link local flag to
release port. Removing this flag ensures that vrouter does not release the port
used in bgp-aas.
In case flow is not present and port is still part of relaxed policy then packet
will be sent to host, hence achieving the functionality when port would have
been released.

Change-Id: Id6cfd2d0b0ff3dc0724715bc126ccee882068486
Closes-bug: #1664301
(cherry picked from commit e7f8716ba29f709021e4f74c34ddc54801037700)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/29101
Committed: http://github.org/Juniper/contrail-controller/commit/517775d18e4f734910d32dac3a70d9ec1f1d24a6
Submitter: Zuul (<email address hidden>)
Branch: R3.2

commit 517775d18e4f734910d32dac3a70d9ec1f1d24a6
Author: Manish <email address hidden>
Date: Fri Feb 24 16:37:30 2017 +0530

Set action to drop if its a short flow.

To handle the cases where TrafficAction gets overridden after being marked as
drop, use short flow flag as well to identify drop action.

Change-Id: I45e98836c5463a9913f2a9bdb9aeedc2643f6c2c
Closes-bug: #1664301
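
The rule in this commit message can be sketched as a small function. This is an illustrative assumption of how such a check might look, not the agent's implementation: `effective_action` and the string action values are hypothetical. The only point taken from the commit message is that the short-flow flag also identifies a drop, so a later overwrite of the stored action does not undo it.

```python
def effective_action(traffic_action, short_flow):
    """Return 'drop' for short flows regardless of the stored action."""
    if short_flow:
        return "drop"
    return traffic_action


# A short flow whose stored action was later overwritten to "pass"
# is still treated as a drop; a normal flow keeps its stored action.
print(effective_action("pass", short_flow=True))
print(effective_action("pass", short_flow=False))
```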

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/29092
Committed: http://github.org/Juniper/contrail-controller/commit/99fbd41081d09e2a2186a92bd57b7d4c696ea27e
Submitter: Zuul (<email address hidden>)
Branch: R3.0.3.x

commit 99fbd41081d09e2a2186a92bd57b7d4c696ea27e
Author: Manish <email address hidden>
Date: Tue Feb 21 07:18:04 2017 +0530

Dont use link local flag in bgp-aas flow encode.

Presence of BGP service flag is indication of bgp-aas flows. Link local is not
needed to identify candidate for relaxed policy. Bgp flags also indicates
relaxed policy on fabric.
Presence of this flag used to cause port to be freed on flow delete even though
there will be a second flow using this port. Vrouter uses link local flag to
release port. Removing this flag ensures that vrouter does not release the port
used in bgp-aas.
In case flow is not present and port is still part of relaxed policy then packet
will be sent to host, hence achieving the functionality when port would have
been released.

Conflicts:
 src/vnsw/agent/vrouter/ksync/flowtable_ksync.cc

Change-Id: Id6cfd2d0b0ff3dc0724715bc126ccee882068486
Closes-bug: #1664301

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.1

Review in progress for https://review.opencontrail.org/29214
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/29215
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/29969
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/29970
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/29215
Committed: http://github.org/Juniper/contrail-controller/commit/8bede1e85b5ee20c1d29a882fb289010dbb526cf
Submitter: Zuul (<email address hidden>)
Branch: master

commit 8bede1e85b5ee20c1d29a882fb289010dbb526cf
Author: Manish <email address hidden>
Date: Fri Feb 24 16:37:30 2017 +0530

Set action to drop if its a short flow.

To handle the cases where TrafficAction gets overridden after being marked as
drop, use short flow flag as well to identify drop action.

Change-Id: I45e98836c5463a9913f2a9bdb9aeedc2643f6c2c
Closes-bug: #1664301
(cherry picked from commit 517775d18e4f734910d32dac3a70d9ec1f1d24a6)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/29214
Committed: http://github.org/Juniper/contrail-controller/commit/32bd7220e808048c3c03c2a783a3bf96f16c14fe
Submitter: Zuul (<email address hidden>)
Branch: R3.1

commit 32bd7220e808048c3c03c2a783a3bf96f16c14fe
Author: Manish <email address hidden>
Date: Fri Feb 24 16:37:30 2017 +0530

Set action to drop if its a short flow.

To handle the cases where TrafficAction gets overridden after being marked as
drop, use short flow flag as well to identify drop action.

Change-Id: I45e98836c5463a9913f2a9bdb9aeedc2643f6c2c
Closes-bug: #1664301
(cherry picked from commit 517775d18e4f734910d32dac3a70d9ec1f1d24a6)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/31187
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0

Review in progress for https://review.opencontrail.org/31188
Submitter: Manish Singh (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R3.0.3.x

Review in progress for https://review.opencontrail.org/31187
Submitter: Hari Prasad Killi (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/31187
Committed: http://github.com/Juniper/contrail-controller/commit/4f55e04e01062124db704bcb938071f9418d913d
Submitter: Zuul (<email address hidden>)
Branch: R3.0.3.x

commit 4f55e04e01062124db704bcb938071f9418d913d
Author: Manish <email address hidden>
Date: Fri Feb 24 16:37:30 2017 +0530

Set action to drop if its a short flow.

To handle the cases where TrafficAction gets overridden after being marked as
drop, use short flow flag as well to identify drop action.

Closes-bug: #1664301

Conflicts:
 src/vnsw/agent/pkt/flow_entry.cc

Change-Id: I88a8647bad0b29622f214eb534ac2fff660c5dbb
