nf_conntrack table fills on swift nodes
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack-Ansible | Invalid | Undecided | Unassigned | |
| Juno | Fix Released | Medium | Christopher H. Laco | |
| Trunk | Invalid | Undecided | Unassigned | |
Bug Description
Due to the NAT rules in iptables required by LXC, it appears that the conntrack table fills up and then randomly drops packets. This may manifest if you are using swift-recon to monitor your cluster: you will intermittently see timeouts, like so:
[2015-07-28 21:33:45] Checking swift.conf md5sum
-> http://
4/5 hosts matched, 1 error[s] while checking hosts.
Upon investigating, I would see dmesg filled with the following:
[20765760.747582] nf_conntrack: table full, dropping packet
[20765761.251622] nf_conntrack: table full, dropping packet
[20765762.067443] nf_conntrack: table full, dropping packet
[20765762.067595] nf_conntrack: table full, dropping packet
[20765762.068828] nf_conntrack: table full, dropping packet
[20765762.070060] nf_conntrack: table full, dropping packet
[20765762.070393] nf_conntrack: table full, dropping packet
[20765762.070632] nf_conntrack: table full, dropping packet
[20765762.070847] nf_conntrack: table full, dropping packet
I have seen this issue in a couple of different environments. In each case I raised the nf_conntrack_max value to a sufficiently large value (around 300,000 for these relatively small environments) and committed it to /etc/sysctl.conf so it persists across server restarts.
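The workaround described above can be sketched as follows; the 300,000 figure is the example value from this report, and the right ceiling depends on your environment's connection volume:

```shell
# Inspect current conntrack usage versus the configured ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Raise the limit immediately (takes effect at once, lost on reboot)
sysctl -w net.netfilter.nf_conntrack_max=300000

# Persist the setting across restarts
echo 'net.netfilter.nf_conntrack_max = 300000' >> /etc/sysctl.conf
sysctl -p
```

Note that raising the table size increases kernel memory use, since each tracked connection consumes an entry in the conntrack hash table.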
Maybe we should raise this value, or parameterize it so that it is easier to manipulate across larger environments?
See also https://bugs.launchpad.net/openstack-ansible/+bug/1441363