StarlingX

Bug #1836232
Comment #2



OpenStack Infra (hudson-openstack) wrote on 2019-07-15: Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/670822
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=9a4b6b6a5d903482624f2f4b86041511d3dfa7e4
Submitter: Zuul
Branch: master

commit 9a4b6b6a5d903482624f2f4b86041511d3dfa7e4
Author: Bart Wensley <email address hidden>
Date: Mon Jul 15 07:03:46 2019 -0500

Set TCP keepalive timeouts for cluster network

    The TCP keepalive timeouts in pods running on the cluster
    network are currently set to the following:
    net.ipv4.tcp_keepalive_intvl = 75
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_time = 7200
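To check which values a pod actually sees, you can read the sysctls from inside the pod. Below is a minimal Python sketch. It assumes `/proc/sys` is readable from the container. The function name `read_keepalive_sysctls` and its `base` parameter are hypothetical illustrations. On recent Linux kernels these sysctls are kept per network namespace, which is why a pod can report different values than the host.

```python
from pathlib import Path

# The three keepalive sysctls named in the commit message.
KEEPALIVE_SYSCTLS = (
    "tcp_keepalive_time",
    "tcp_keepalive_intvl",
    "tcp_keepalive_probes",
)


def read_keepalive_sysctls(base: Path = Path("/proc/sys/net/ipv4")) -> dict:
    """Return the keepalive settings visible to the calling network namespace.

    Hypothetical helper: each sysctl is a small text file under `base`
    holding a single integer, e.g. /proc/sys/net/ipv4/tcp_keepalive_time.
    """
    return {name: int((base / name).read_text().strip()) for name in KEEPALIVE_SYSCTLS}
```

Calling `read_keepalive_sysctls()` inside a pod on the cluster network before this change would be expected to return the default values listed above.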

    This means that an idle TCP connection whose peer has gone away
    can take more than two hours (7200 + 9 * 75 = 7875 seconds) to be
    detected and removed. That can cause long delays in reacting to
    unexpected events such as the uncontrolled reboot of a host.
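The "more than two hours" figure comes from how the three settings combine. Under a rough model, an idle connection is first probed after `tcp_keepalive_time` seconds. It is then reset once `tcp_keepalive_probes` probes, sent `tcp_keepalive_intvl` seconds apart, all go unanswered. This model ignores timer granularity and connections with unacknowledged data in flight, which are governed by retransmission timeouts instead. The helper name `keepalive_detection_seconds` is hypothetical.

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate worst-case seconds before an idle connection to a dead peer is reset.

    The connection sits idle for `time_s` seconds before the first probe;
    probes then go out every `intvl_s` seconds, and the connection is
    dropped after `probes` of them go unanswered.
    """
    return time_s + probes * intvl_s


# Old cluster-network values from the commit message.
old = keepalive_detection_seconds(time_s=7200, intvl_s=75, probes=9)
hours, rem = divmod(old, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{old} s = {hours} h {minutes} min {seconds} s")  # 7875 s = 2 h 11 min 15 s
```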

    This commit changes the TCP keepalive timeouts for the cluster
    network to match the timeouts for the host OS:
    net.ipv4.tcp_keepalive_intvl = 1
    net.ipv4.tcp_keepalive_probes = 5
    net.ipv4.tcp_keepalive_time = 5
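These sysctls only supply defaults. Keepalive stays off for a given socket until the application enables `SO_KEEPALIVE`. An application that cannot rely on namespace-wide settings could request the same timing per socket. The Linux-only Python sketch below is illustrative and is not how this commit applies the change. `enable_keepalive` is a hypothetical helper. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the per-socket counterparts of the three sysctls.

```python
import socket

# Values from the commit message (the host OS settings now used on the cluster network).
KEEPALIVE_TIME_S = 5    # net.ipv4.tcp_keepalive_time
KEEPALIVE_INTVL_S = 1   # net.ipv4.tcp_keepalive_intvl
KEEPALIVE_PROBES = 5    # net.ipv4.tcp_keepalive_probes


def enable_keepalive(sock: socket.socket) -> None:
    """Enable keepalive on one socket, overriding the namespace-wide defaults.

    Linux-only: the TCP_KEEP* constants are not defined on every platform.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, KEEPALIVE_TIME_S)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, KEEPALIVE_INTVL_S)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, KEEPALIVE_PROBES)


# Approximate worst case for an idle connection to a dead peer: 5 + 5 * 1 = 10 seconds.
DETECTION_S = KEEPALIVE_TIME_S + KEEPALIVE_PROBES * KEEPALIVE_INTVL_S
```

With these values, the detection time drops from over two hours to roughly 10 seconds.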

    Change-Id: I23e2c9a733727e4059ac272e052dca0e6ec4f2e1
    Closes-bug: 1836232
    Signed-off-by: Bart Wensley <email address hidden>