Configuring md5 on control brings BGP down

Bug #1691071 reported by Shashikiran H
This bug affects 1 person
Affects                                       Status          Importance   Assigned to                      Milestone
Juniper Openstack (status tracked in Trunk)
  R4.0                                        Fix Released    Critical     Ignatious Johnson Christopher
  Trunk                                       Fix Committed   Critical     Ignatious Johnson Christopher

Bug Description

Version: 4.0.0.0-5 mitaka

Topo:
host1 = 'root@10.204.216.95'
host2 = 'root@10.204.216.96'
host3 = 'root@10.204.216.97'
host4 = 'root@10.204.216.98'
host5 = 'root@10.204.216.99'
host6 = 'root@10.204.216.103'

router_asn = 64510
env.roledefs = {
    'all': [host1, host2, host3, host4, host5, host6],
    'contrail-controller': [host6, host2, host1],
    'contrail-analytics': [host6, host2, host1],
    'contrail-analyticsdb': [host6, host2, host1],
    'openstack': [host6, host2, host1],
    'contrail-compute': [host3, host4],
    'contrail-lb': [host5],
    'build': [host_build]
}

env.hostnames = {
    'all': ['nodem6', 'nodem7', 'nodem8', 'nodem9', 'nodem10', 'nodem14']
}

Configuring global or per-peer MD5 values between BGP control peers brings BGP down.
The MD5 values are set correctly on the BGP router:
    "bgp-router": {
        "bgp_router_parameters": {
            "address": "10.10.10.14",
            "address_families": {
                "family": [
                    "route-target",
                    "inet-vpn",
                    "e-vpn",
                    "erm-vpn",
                    "inet6-vpn"
                ]
            },
            "admin_down": false,
            "auth_data": {
                "key_items": [
                    {
                        "key": "qwe",
                        "key_id": 0
                    }
                ],
                "key_type": "md5"
            },

The flap messages appear as expected. I have attached the control logs and Sandesh messages. I am seeing this only on a multi-node OpenStack HA setup, not on a non-HA setup.
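
One way to double-check that the global MD5 settings agree across control nodes is to compare the auth_data blocks of two bgp-router objects. The following is only a sketch: it assumes each object has been loaded into a Python dict with the same layout as the excerpt above, and the helper names auth_key_set and same_auth are hypothetical, not part of any Contrail API.

```python
def auth_key_set(bgp_router):
    """Return (key_type, set of (key_id, key)) for one bgp-router dict.

    Assumes the layout shown in the bug description:
    bgp-router -> bgp_router_parameters -> auth_data.
    """
    params = bgp_router["bgp-router"]["bgp_router_parameters"]
    auth = params.get("auth_data") or {}
    keys = frozenset(
        (item["key_id"], item["key"]) for item in auth.get("key_items", [])
    )
    return auth.get("key_type"), keys


def same_auth(router_a, router_b):
    """True if both routers carry the same key type and key items."""
    return auth_key_set(router_a) == auth_key_set(router_b)
```

If same_auth returns True for every pair of control nodes, a key mismatch is unlikely to be the cause and the problem probably lies elsewhere.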

Revision history for this message
Shashikiran H (skiranh) wrote :
Shashikiran H (skiranh)
description: updated
description: updated
tags: added: sanity
Jeba Paulaiyan (jebap)
tags: added: blocker
Revision history for this message
Ananth Suryanarayana (anantha-l) wrote :

The issue is caused by tcp_tw_recycle being set, which is not recommended in many scenarios. It affects TCP timestamp handling, which in turn causes the kernel to drop SYN packets:

    13432 SYNs to LISTEN sockets dropped

Disabling tcp_tw_recycle solved the issue. This needs to be added to provisioning (or to reimage...).

The following command resolved the issue, confirming the points above:
sysctl -w net.ipv4.tcp_tw_recycle=0; service contrail-control restart

https://serverfault.com/questions/583488/no-response-to-some-syn-packets-when-timestamps-are-enabled
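
To check whether a node is still dropping SYNs, the counter quoted above can be extracted from two snapshots taken a few seconds apart. This is a minimal sketch that assumes the counter line comes from `netstat -s` output in the exact wording shown in this comment. The function names are hypothetical.

```python
import re

# Matches the counter line as quoted in this comment, e.g.
# "    13432 SYNs to LISTEN sockets dropped".
_SYN_DROP_RE = re.compile(
    r"^\s*(\d+)\s+SYNs to LISTEN sockets dropped\s*$", re.MULTILINE
)


def syn_drops(stats_text):
    """Return the dropped-SYN counter from the statistics text, or None."""
    match = _SYN_DROP_RE.search(stats_text)
    return int(match.group(1)) if match else None


def syn_drops_increased(before, after):
    """True if the counter grew between two snapshots of the statistics."""
    first, second = syn_drops(before), syn_drops(after)
    if first is None or second is None:
        return False  # counter missing in a snapshot: nothing to compare
    return second > first
```

A counter that keeps growing while BGP sessions fail to come up would be consistent with the diagnosis above. A flat counter after setting tcp_tw_recycle to 0 suggests the drops have stopped.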

Revision history for this message
Ananth Suryanarayana (anantha-l) wrote :

From "man tcp"

       tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4)
              Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since
              this causes problems when working with NAT (Network Address Translation).

This option should not be enabled.
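
To see how a given node is currently configured, the value can be read from procfs. This is a sketch, not a provisioning check. The proc_root parameter and the function name are assumptions added for illustration and testing. Kernels from Linux 4.12 onward no longer provide this option, so a missing file is reported as None rather than as an error.

```python
import os


def read_tw_recycle(proc_root="/proc/sys"):
    """Return the current net.ipv4.tcp_tw_recycle value, or None if absent.

    proc_root is a hypothetical parameter so the function can be pointed
    at a test directory instead of the live /proc/sys tree.
    """
    path = os.path.join(proc_root, "net", "ipv4", "tcp_tw_recycle")
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except FileNotFoundError:
        # The kernel does not expose this option.
        return None
```

A return value of 1 on a control node would indicate the setting described in this bug is active.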

Revision history for this message
Ananth Suryanarayana (anantha-l) wrote :

Can you please remove this option setting from fab, puppet, ansible, etc.? Turning it on is not recommended anyway. That should fix this issue.

diff --git a/fabfile/tasks/ha.py b/fabfile/tasks/ha.py
index 4c74b3c..315ba5b 100644
--- a/fabfile/tasks/ha.py
+++ b/fabfile/tasks/ha.py
@@ -90,8 +90,6 @@ def tune_tcp_node(*args):
                 sudo('echo "net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30" >> /
             if sudo("grep '^net.ipv4.tcp_syncookies' /etc/sysctl.conf").failed:
                 sudo('echo "net.ipv4.tcp_syncookies = 1" >> /etc/sysctl.conf')
-            if sudo("grep '^net.ipv4.tcp_tw_recycle' /etc/sysctl.conf").failed:
-                sudo('echo "net.ipv4.tcp_tw_recycle = 1" >> /etc/sysctl.conf')
             if sudo("grep '^net.ipv4.tcp_tw_reuse' /etc/sysctl.conf").failed:
                 sudo('echo "net.ipv4.tcp_tw_reuse = 1" >> /etc/sysctl.conf')
             if sudo("grep '^net.ipv4.tcp_fin_timeout' /etc/sysctl.conf").failed:
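
The diff above only stops the entry from being appended on future runs. Nodes provisioned earlier would presumably still have "net.ipv4.tcp_tw_recycle = 1" in /etc/sysctl.conf. A possible cleanup, sketched here as a pure text transformation (the function name is hypothetical and this is not part of the fabfile), forces any existing entry to 0 and leaves every other line untouched:

```python
def disable_tw_recycle(sysctl_conf_text):
    """Return sysctl.conf text with any tcp_tw_recycle entry forced to 0.

    Commented-out entries and all other settings are left as they are.
    """
    out = []
    for line in sysctl_conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == "net.ipv4.tcp_tw_recycle":
            out.append("net.ipv4.tcp_tw_recycle = 0")
        else:
            out.append(line)
    # Preserve the trailing newline if the input had one.
    trailer = "\n" if sysctl_conf_text.endswith("\n") else ""
    return "\n".join(out) + trailer
```

The rewritten text would still need to be written back and applied (for example with sysctl -p, or the sysctl -w command from the earlier comment) before it takes effect on a running node.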

Nischal Sheth (nsheth)
tags: removed: contrail-control
tags: added: provisioning
Revision history for this message
Abhay Joshi (abhayj) wrote :

This needs to be taken care of in the internal Ansible by the controller container team.

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] master

Review in progress for https://review.opencontrail.org/31795
Submitter: Ignatious Johnson Christopher (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : [Review update] R4.0

Review in progress for https://review.opencontrail.org/31796
Submitter: Ignatious Johnson Christopher (<email address hidden>)

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote : A change has been merged

Reviewed: https://review.opencontrail.org/31795
Committed: http://github.com/Juniper/contrail-puppet/commit/6fb97e784acfc928227cee8500ece0ad17f1c367
Submitter: Zuul (<email address hidden>)
Branch: master

commit 6fb97e784acfc928227cee8500ece0ad17f1c367
Author: Ignatious Johnson Christopher <email address hidden>
Date: Thu May 18 11:21:52 2017 -0700

Disabling fast recycling of time wait sockets

which is not recommended to be enabled.

Change-Id: I5c354ec2174137f44b9bb4469d092b225bbcbf0b
Closes-Bug: 1691071

Revision history for this message
OpenContrail Admin (ci-admin-f) wrote :

Reviewed: https://review.opencontrail.org/31796
Committed: http://github.com/Juniper/contrail-puppet/commit/4d20cd5d2740705750db32dba365bb5c4a96d4bb
Submitter: Zuul (<email address hidden>)
Branch: R4.0

commit 4d20cd5d2740705750db32dba365bb5c4a96d4bb
Author: Ignatious Johnson Christopher <email address hidden>
Date: Thu May 18 11:21:52 2017 -0700

Disabling fast recycling of time wait sockets

which is not recommended to be enabled.

Change-Id: I5c354ec2174137f44b9bb4469d092b225bbcbf0b
Closes-Bug: 1691071

Revision history for this message
Shashikiran H (skiranh) wrote :

Verified on 4.0.0.0-14
