The TCP keepalive timeouts in pods running on the cluster
network are currently set to the following:
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
This means that a dropped TCP connection can take more than
two hours to be detected and removed, which can cause long delays
in reacting to unexpected events such as the uncontrolled reboot
of a host.
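The two-hour figure comes from how Linux keepalive works on an idle connection. The kernel waits tcp_keepalive_time seconds before the first probe, sends further probes every tcp_keepalive_intvl seconds, and aborts the connection after tcp_keepalive_probes unanswered probes. The sketch below is a minimal illustration of that arithmetic. The function name is hypothetical, and it ignores connections with unacknowledged data in flight, which are governed by retransmission timeouts instead.

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Approximate seconds before Linux aborts an idle connection to a dead peer.

    The kernel waits time_s seconds of idleness before the first probe, then
    probes every intvl_s seconds; after `probes` unanswered probes the
    connection is aborted.
    """
    return time_s + probes * intvl_s


# Defaults currently in effect on the cluster network.
old = keepalive_detection_seconds(time_s=7200, intvl_s=75, probes=9)
hours, rem = divmod(old, 3600)
minutes, seconds = divmod(rem, 60)
print(f"{old} s ({hours} h {minutes} min {seconds} s)")  # → 7875 s (2 h 11 min 15 s)
```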
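This commit changes the TCP keepalive timeouts for the cluster
network to match the values used by the host OS, so that an idle
connection to a dead peer is dropped after about 10 seconds
(5 + 5 * 1) rather than more than two hours. The new values are: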
net.ipv4.tcp_keepalive_intvl = 1
net.ipv4.tcp_keepalive_probes = 5
net.ipv4.tcp_keepalive_time = 5
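One way to confirm the new values from inside a pod is to read the corresponding files under /proc/sys/net/ipv4. This is a minimal sketch, assuming Python 3 on a Linux kernel where these sysctls are exposed per network namespace. The `EXPECTED` table and both helper names are hypothetical and not part of this change.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Hypothetical table of the values this change is meant to apply.
EXPECTED: Dict[str, int] = {
    "tcp_keepalive_time": 5,
    "tcp_keepalive_intvl": 1,
    "tcp_keepalive_probes": 5,
}


def read_keepalive_sysctls(root: str = "/proc/sys/net/ipv4") -> Dict[str, int]:
    """Read each sysctl named in EXPECTED from `root` and parse it as an int."""
    base = Path(root)
    return {name: int((base / name).read_text().strip()) for name in EXPECTED}


def mismatches(actual: Dict[str, int]) -> Dict[str, Tuple[Optional[int], int]]:
    """Map each sysctl that differs from EXPECTED to (actual, expected)."""
    return {
        name: (actual.get(name), want)
        for name, want in EXPECTED.items()
        if actual.get(name) != want
    }
```

Inside a pod on the cluster network, `mismatches(read_keepalive_sysctls())` should return an empty dict once the new values are in effect; with the old defaults it would report all three settings.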
Reviewed: https://review.opendev.org/670822
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=9a4b6b6a5d903482624f2f4b86041511d3dfa7e4
Submitter: Zuul
Branch: master
commit 9a4b6b6a5d903482624f2f4b86041511d3dfa7e4
Author: Bart Wensley <email address hidden>
Date: Mon Jul 15 07:03:46 2019 -0500
Set TCP keepalive timeouts for cluster network
Change-Id: I23e2c9a733727e4059ac272e052dca0e6ec4f2e1
Closes-bug: 1836232
Signed-off-by: Bart Wensley <email address hidden>