[2.2 TSN use case] BUM traffic re-route takes 90 sec in some scenarios

Bug #1479198 reported by Nobuhiko Nagataki
This bug affects 1 person
Affects: Juniper Openstack (status tracked in Trunk)

Series   Status          Importance   Assigned to   Milestone
R2.20    Fix Committed   High         Nipa
Trunk    Fix Committed   High         Nipa

Bug Description

In the following multicast tree for BUM forwarding, IP connectivity to vRouter-1 can be lost, for example when a server uplink goes down.

TOR -- TSN -- vRouter-1 -- vRouter-2

In this scenario, the XMPP hold-time is how the loss of vRouter-1 is detected.

After 90 sec (when the XMPP hold-time expires), the multicast tree is recalculated.
Until the recalculation, BUM traffic for VMs connected to vRouter-2 is dropped.

It is desired to reduce this from 90 sec to 10 sec.

information type: Proprietary → Public
tags: added: bms vrouter
Changed in juniperopenstack:
importance: Undecided → High
assignee: nobody → Hari Prasad Killi (haripk)
tags: added: customer
Nipa (nipak)
Changed in juniperopenstack:
assignee: Hari Prasad Killi (haripk) → Nipa (nipak)
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/12733
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
Nipa (nipak) wrote :

Add TCP socket-level timeouts to detect a peer going down due to connectivity issues, which in turn triggers a rebuild of the multicast tree.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/12934
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/12934
Committed: http://github.org/Juniper/contrail-controller/commit/9a5ac5dbf1545345c9ee7e7473cd4948911f41b8
Submitter: Zuul
Branch: R2.20

commit 9a5ac5dbf1545345c9ee7e7473cd4948911f41b8
Author: Nipa Kumar <email address hidden>
Date: Fri Aug 7 14:44:49 2015 -0700

Enable tcp level timeouts for Xmpp connection.

Enable tcp level timeout to detect when the remote end has crashed or
network connectivity is lost so we could trigger rebuild of
multicast tree.

tcp_hold_time can be set via config file or passed as a parameter
by the daemon. By default tcp hold time is set to 30secs.

This will result in:
o idle-timeout = 10 sec (tcp_hold_time/3), the period of no activity after
  which tcp keepalive probes are triggered
o unacked probes count = 3 (default)
o tcp keepalive interval = (tcp_hold_time - idle-timeout)/unacked probes = 20/3 = 6 sec

o This will trigger a tcp read error when the socket buffer is empty.
o We also need to set tcp-user timeout to detect inactivity when
  the socket transmit buffer is full, default = 30 sec.

The current application level hold-timer detects peer DOWN after 90 sec,
which is therefore the time taken to rebuild the multicast tree. With the
settings above, the multicast tree builder will be triggered after 30 sec of
inactivity at the tcp layer.

Change-Id: Idc38f43d3427e6b3f5d68a61bf01be41407cd7a9
Closes-bug:1479198
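
The timer arithmetic in the commit message above can be checked with a short sketch. This is not the contrail-controller code: the function names `keepalive_params` and `apply_keepalive` are hypothetical, integer division is an assumption (chosen because it reproduces the quoted 20/3 = 6 sec), and the socket options shown are the standard Linux ones as exposed by Python's `socket` module.

```python
import socket


def keepalive_params(tcp_hold_time=30, probes=3, user_timeout=30):
    """Derive TCP keepalive settings (seconds) from a hold time.

    Hypothetical helper that mirrors the arithmetic in the commit message.
    Integer division is an assumption made to match "20/3 = 6 sec".
    """
    idle = tcp_hold_time // 3                    # quiet time before the first probe
    interval = (tcp_hold_time - idle) // probes  # gap between unanswered probes
    return {
        "idle": idle,
        "interval": interval,
        "probes": probes,
        "user_timeout_ms": user_timeout * 1000,  # TCP_USER_TIMEOUT takes milliseconds
    }


def apply_keepalive(sock, params):
    """Apply the settings to a TCP socket, skipping options the platform lacks."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, key in (("TCP_KEEPIDLE", "idle"),
                      ("TCP_KEEPINTVL", "interval"),
                      ("TCP_KEEPCNT", "probes"),
                      ("TCP_USER_TIMEOUT", "user_timeout_ms")):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, params[key])


params = keepalive_params(30)
print(params)
```

With the default hold time of 30 sec, the sketch gives a 10 sec idle period followed by 3 probes 6 sec apart, so an unresponsive peer is declared dead after roughly 28 sec of silence, in line with the "30 sec of inactivity" stated in the commit message.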

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13307
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13307
Committed: http://github.org/Juniper/contrail-controller/commit/25ff1932df727c3cf2b909e0a6604d03494674b8
Submitter: Zuul
Branch: master

commit 25ff1932df727c3cf2b909e0a6604d03494674b8
Author: Nipa Kumar <email address hidden>
Date: Fri Aug 7 14:44:49 2015 -0700

Enable tcp level timeouts for Xmpp connection.

Enable tcp level timeout to detect when the remote end has crashed or
network connectivity is lost so we could trigger rebuild of
multicast tree.

tcp_hold_time can be set via config file or passed as a parameter
by the daemon. By default tcp hold time is set to 30secs.

This will result in:
o idle-timeout = 10 sec (tcp_hold_time/3), the period of no activity after
  which tcp keepalive probes are triggered
o unacked probes count = 3 (default)
o tcp keepalive interval = (tcp_hold_time - idle-timeout)/unacked probes = 20/3 = 6 sec

o This will trigger a tcp read error when the socket buffer is empty.
o We also need to set tcp-user timeout to detect inactivity when
  the socket transmit buffer is full, default = 30 sec.

The current application level hold-timer detects peer DOWN after 90 sec,
which is therefore the time taken to rebuild the multicast tree. With the
settings above, the multicast tree builder will be triggered after 30 sec of
inactivity at the tcp layer.

Change-Id: Idc38f43d3427e6b3f5d68a61bf01be41407cd7a9
Closes-bug:1479198

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13452
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13452
Committed: http://github.org/Juniper/contrail-controller/commit/dfff9020cd321d355415528981a4fa5aa34bb930
Submitter: Zuul
Branch: R2.20

commit dfff9020cd321d355415528981a4fa5aa34bb930
Author: Nipa Kumar <email address hidden>
Date: Mon Aug 31 15:36:17 2015 -0700

Uninitialized tcp_holdtime for dns config.

Change-Id: I0343c1ac4a9ec88d190b50701bbb0b6b49645501
Closes-bug:1479198

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/13505
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13505
Committed: http://github.org/Juniper/contrail-controller/commit/cf7948c8047b9dda4695f4e9681c29ef92d09f90
Submitter: Zuul
Branch: R2.20

commit cf7948c8047b9dda4695f4e9681c29ef92d09f90
Author: Nipa Kumar <email address hidden>
Date: Tue Sep 1 23:10:24 2015 -0700

Initialize tcp_hold_time in the constructor of XmppChannelConfig

Change-Id: I05f46af5890dbd439ac382c9c344cc635e7770fe
Closes-bug:1479198

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/13530
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/13530
Committed: http://github.org/Juniper/contrail-controller/commit/b20829338d00c7d5efbc942112a28d676a3c01f5
Submitter: Zuul
Branch: master

commit b20829338d00c7d5efbc942112a28d676a3c01f5
Author: Nipa Kumar <email address hidden>
Date: Tue Sep 1 23:10:24 2015 -0700

Initialize tcp_hold_time in the constructor of XmppChannelConfig

Change-Id: I05f46af5890dbd439ac382c9c344cc635e7770fe
Closes-bug:1479198

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.22-dev

Review in progress for https://review.opencontrail.org/13927
Submitter: Vinay Vithal Mahuli (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/14047
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/14047
Committed: http://github.org/Juniper/contrail-controller/commit/95eaa1e6f4aa36abe8b3cb61b12c4cc0432ded9d
Submitter: Zuul
Branch: R2.20

commit 95eaa1e6f4aa36abe8b3cb61b12c4cc0432ded9d
Author: Nipa Kumar <email address hidden>
Date: Thu Sep 24 16:06:32 2015 -0700

Enable Tcp Keepalive on Xmpp Server. This will not be supported on precise images.

Change-Id: I8e9b04408e2c73833fd7133a33e0e55759de4697
Closes-Bug:1479198

Revision history for this message
Nipa (nipak) wrote :

Release Notes for R2.2

Setting tcp_hold_time to y sec results in peer DOWN detection taking anywhere from y sec to 2*y sec.
Please note that the default is 30 sec, so peer DOWN detection can take anywhere from 30 sec to 60 sec.

Revision history for this message
Nipa (nipak) wrote :

Release Notes for R2.2

Using TCP keepalives to detect peer down is applicable to Linux kernels after precise.

tags: added: releasenote
Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/14157
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/14157
Committed: http://github.org/Juniper/contrail-controller/commit/42f03d40659ff852427b2928a77c7d51c00ea224
Submitter: Zuul
Branch: master

commit 42f03d40659ff852427b2928a77c7d51c00ea224
Author: Nipa Kumar <email address hidden>
Date: Thu Oct 1 15:36:10 2015 -0700

Enable Tcp keepalive to detect peer DOWN on XmppServer side.

With tcp_hold_time configured as x, the keepalive is set to x/2 for
x > 18 sec. The minimum value of x defaults to 9 sec.
Experiments show that detection takes twice as long when there is
pending data (here, an XmppKeepalive) in the tcp socket.

Disable support on the XmppClient end, as the vrouter can take longer to
detect that the Xmpp Server is down and hence will retain routes for at
least 90 sec.

Change-Id: Ie20150df6c76b355233a56779a5e4811a2fa6f37
Closes-Bug:1479198
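
The rule in the commit message above is terse, so here is a sketch of one possible reading of it: the keepalive is half the configured value when that value is above 18 sec, and 9 sec otherwise. The function names are hypothetical and the handling of values at or below 18 sec is an assumption; the actual contrail-controller code may differ.

```python
def effective_keepalive(tcp_hold_time):
    """Keepalive in seconds under one reading of the commit message.

    Assumption: x/2 for x > 18 sec, otherwise the 9 sec floor.
    """
    if tcp_hold_time > 18:
        return tcp_hold_time // 2
    return 9


def expected_detection(tcp_hold_time):
    """Return (no pending data, pending data) detection times in seconds.

    The doubling reflects the experimental observation quoted in the
    commit message for the case where data is pending in the socket.
    """
    keepalive = effective_keepalive(tcp_hold_time)
    return keepalive, 2 * keepalive


for x in (9, 18, 30):
    print(x, effective_keepalive(x), expected_detection(x))
```

Under this reading, a configured tcp_hold_time of 30 sec gives a 15 sec keepalive, so detection would take about 15 sec on an idle socket and about 30 sec when an XmppKeepalive is pending.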

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R2.20

Review in progress for https://review.opencontrail.org/14241
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Review in progress for https://review.opencontrail.org/14242
Submitter: Nipa Kumar (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/14242
Committed: http://github.org/Juniper/contrail-controller/commit/002a9213714a66b1087d0a084724d2dd7933758e
Submitter: Zuul
Branch: R2.20

commit 002a9213714a66b1087d0a084724d2dd7933758e
Author: Nipa Kumar <email address hidden>
Date: Mon Oct 5 15:51:30 2015 -0700

Enable Tcp keepalive to detect peer DOWN on XmppServer side.

With tcp_hold_time configured as x, the keepalive is set to x/2 for
x > 18 sec. The minimum value of x defaults to 9 sec.
Experiments show that detection takes twice as long when there is
pending data (here, an XmppKeepalive) in the tcp socket.

Disable support on the XmppClient end, as the vrouter can take longer to
detect that the Xmpp Server is down and hence will retain routes for at
least 90 sec.

Change-Id: Ida9581504d2aabea80ef4c78efd9e28186913676
Closes-Bug:1479198
