Packets getting lost during SNAT with too many connections using the same source and destination on Network Node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Undecided | Brian Haley |
Bug Description
We probably have a problem with SNAT on the network nodes when too many connections use the same source and destination.
We reproduced the bug with DNS requests, but we assume that it affects other packets as well.
When we send many DNS requests, some packets occasionally do not pass through the NAT and simply "get lost".
In addition, the conntrack statistics show that the "insert_failed" counter keeps increasing:
ip netns exec snat-848819dc-
cpu=0 searched=1166140 found=5587918 new=6659 invalid=5 ignore=0 delete=27726 delete_list=27712 insert=6645 insert_failed=14 drop=0 early_drop=0 error=0 search_restart=0
cpu=2 searched=12015 found=64626 new=2467 invalid=0 ignore=0 delete=15205 delete_list=15204 insert=2466 insert_failed=1 drop=0 early_drop=0 error=0 search_restart=0
cpu=3 searched=1348502 found=6097345 new=4093 invalid=0 ignore=0 delete=23200 delete_list=23173 insert=4066 insert_failed=27 drop=0 early_drop=0 error=0 search_restart=0
cpu=4 searched=1068516 found=5398514 new=3299 invalid=0 ignore=0 delete=14144 delete_list=14126 insert=3281 insert_failed=18 drop=0 early_drop=0 error=0 search_restart=0
cpu=5 searched=2280948 found=9908854 new=6770 invalid=0 ignore=0 delete=17224 delete_list=17185 insert=6731 insert_failed=39 drop=0 early_drop=0 error=0 search_restart=0
cpu=6 searched=1123341 found=5264368 new=9749 invalid=0 ignore=0 delete=17272 delete_list=17247 insert=9724 insert_failed=25 drop=0 early_drop=0 error=0 search_restart=0
cpu=7 searched=1553934 found=7234262 new=8734 invalid=0 ignore=0 delete=15658 delete_list=15634 insert=8710 insert_failed=24 drop=0 early_drop=0 error=0 search_restart=0
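To check how the failures compare with the number of new connections, the per-CPU counters above can be totalled. The following is a minimal Python sketch, assuming each line consists of whitespace-separated key=value pairs exactly as shown above; `sum_conntrack_stats` and `STATS` are hypothetical names used only for illustration.

```python
from collections import Counter

# Per-CPU statistics copied from this report (cpu=1 is absent in the original output).
STATS = """\
cpu=0 searched=1166140 found=5587918 new=6659 invalid=5 ignore=0 delete=27726 delete_list=27712 insert=6645 insert_failed=14 drop=0 early_drop=0 error=0 search_restart=0
cpu=2 searched=12015 found=64626 new=2467 invalid=0 ignore=0 delete=15205 delete_list=15204 insert=2466 insert_failed=1 drop=0 early_drop=0 error=0 search_restart=0
cpu=3 searched=1348502 found=6097345 new=4093 invalid=0 ignore=0 delete=23200 delete_list=23173 insert=4066 insert_failed=27 drop=0 early_drop=0 error=0 search_restart=0
cpu=4 searched=1068516 found=5398514 new=3299 invalid=0 ignore=0 delete=14144 delete_list=14126 insert=3281 insert_failed=18 drop=0 early_drop=0 error=0 search_restart=0
cpu=5 searched=2280948 found=9908854 new=6770 invalid=0 ignore=0 delete=17224 delete_list=17185 insert=6731 insert_failed=39 drop=0 early_drop=0 error=0 search_restart=0
cpu=6 searched=1123341 found=5264368 new=9749 invalid=0 ignore=0 delete=17272 delete_list=17247 insert=9724 insert_failed=25 drop=0 early_drop=0 error=0 search_restart=0
cpu=7 searched=1553934 found=7234262 new=8734 invalid=0 ignore=0 delete=15658 delete_list=15634 insert=8710 insert_failed=24 drop=0 early_drop=0 error=0 search_restart=0
"""


def sum_conntrack_stats(text: str) -> Counter:
    """Sum every numeric counter across per-CPU 'cpu=N key=value ...' lines.

    Hypothetical helper: assumes whitespace-separated key=value pairs,
    as in the statistics shown in this report.
    """
    totals: Counter = Counter()
    for line in text.splitlines():
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "cpu" not in fields:
            continue  # skip anything that is not a per-CPU stats line
        for key, value in fields.items():
            if key != "cpu" and value.isdigit():
                totals[key] += int(value)
    return totals


totals = sum_conntrack_stats(STATS)
print("new:", totals["new"])                      # new: 41771
print("insert_failed:", totals["insert_failed"])  # insert_failed: 148
print(f"failure ratio: {totals['insert_failed'] / totals['new']:.2%}")
```

With the numbers from this report, 148 of 41771 new entries (about 0.35%) failed to insert. On every CPU shown, new equals insert plus insert_failed, which is consistent with each failed insert being a new connection whose packet was lost.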
This might be a generic problem with conntrack in Linux.
We suspect that we are hitting the following kernel limitation (or bug):
https:/
There seems to be a workaround that alleviates this behavior: setting the --random-fully flag on the iptables SNAT rule. Unfortunately, this flag is only available since iptables 1.6.2.
Also, neutron does not currently support this for its SNAT rules; it only uses --to-source.
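The workaround can be sketched as a version check that adds --random-fully only when iptables is at least 1.6.2 and otherwise keeps the plain --to-source rule. This is a minimal Python sketch, not neutron's implementation: `parse_iptables_version`, `snat_rule_args`, the POSTROUTING rule layout and the addresses (10.0.0.0/24, 203.0.113.10) are all hypothetical, and the version string is assumed to look like the output of `iptables --version` (for example "iptables v1.8.4 (legacy)").

```python
import re

# --random-fully is only available since iptables 1.6.2 (per this report).
RANDOM_FULLY_MIN_VERSION = (1, 6, 2)


def parse_iptables_version(version_output: str) -> tuple:
    """Extract (major, minor, patch) from text like 'iptables v1.8.4 (legacy)'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"unrecognised iptables version string: {version_output!r}")
    return tuple(int(part) for part in match.groups())


def snat_rule_args(source_cidr: str, to_source: str, iptables_version: tuple) -> list:
    """Build arguments for a hypothetical POSTROUTING SNAT rule.

    Appends --random-fully only when the installed iptables understands it;
    otherwise falls back to a plain --to-source rule.
    """
    args = ["-t", "nat", "-A", "POSTROUTING",
            "-s", source_cidr, "-j", "SNAT", "--to-source", to_source]
    if iptables_version >= RANDOM_FULLY_MIN_VERSION:
        args.append("--random-fully")
    return args


old_rule = snat_rule_args("10.0.0.0/24", "203.0.113.10",
                          parse_iptables_version("iptables v1.6.1"))
new_rule = snat_rule_args("10.0.0.0/24", "203.0.113.10",
                          parse_iptables_version("iptables v1.8.4 (legacy)"))
print(" ".join(["iptables"] + new_rule))
# iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j SNAT --to-source 203.0.113.10 --random-fully
```

On an iptables 1.6.1 host the rule keeps only --to-source; on 1.8.4 it gains --random-fully. Checking the version up front avoids passing a flag that older iptables binaries would reject as an unknown option.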
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Slawek Kaplonski (slaweq)
Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Swaminathan Vasudevan (swaminathan-vasudevan)
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
If your iptables version does not support --random-fully, the following iptables patch might be required to fix this issue:
https://git.netfilter.org/iptables/commit/?id=8b0da2130b8af3890ef20afb2305f11224bb39ec