neutron-l3-agent virtual router SNAT translation doesn't work for traffic happening during iptables rules setup (race condition)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
neutron | Expired | Medium | Unassigned | |
Bug Description
I found a race condition that happens in the following situation:
1) A network node running neutron-l3-agent and carrying live traffic is rebooted.
2) While it starts again, a VM is sending traffic (ping is a simple case) to the external network.
3) As the agent starts, it creates the virtual router qrouter-<ID> namespace, brings up the interfaces (ext+int),
and sets up the iptables rules.
4) If traffic hits the rules before the SNAT rule is set, the Linux
connection tracker records the flow without a NAT mapping, and packets of that flow never
traverse the SNAT/DNAT rules again (even once the rules are set). As a result, the internal IP is forwarded "as is", untranslated, into the external network.
5) If you restart the ping in the VM (ping seq restarts at 0), it starts working.
6) If you start a different ping while the first one is running, the new ping works, while the old one
stays in that "limbo state" where it's untranslated.
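One way to spot the "limbo" flows from steps 4 to 6 is to look at the connection tracking table inside the router namespace: a flow that missed the SNAT rule has a reply tuple whose destination is still the VM's internal address. The following is a minimal sketch, assuming the usual `conntrack -L` text layout (original tuple first, reply tuple second). The function name `untranslated_flows`, the sample addresses, and the sample output are all made up for illustration and are not taken from this bug.

```python
import ipaddress
import re

# Matches key=value tokens such as "src=10.0.0.5" in a conntrack -L line.
_TOKEN = re.compile(r"(\w+)=(\S+)")

# Hypothetical sample, shaped like `conntrack -L` output (not from a real node).
SAMPLE = """\
icmp     1 29 src=10.0.0.5 dst=8.8.8.8 type=8 code=0 id=4321 [UNREPLIED] src=8.8.8.8 dst=10.0.0.5 type=0 code=0 id=4321 mark=0 use=1
icmp     1 29 src=10.0.0.5 dst=1.1.1.1 type=8 code=0 id=4400 src=1.1.1.1 dst=203.0.113.10 type=0 code=0 id=4400 mark=0 use=1
"""


def untranslated_flows(conntrack_output, internal_cidr):
    """Return (src, dst) for flows from internal_cidr that carry no SNAT mapping."""
    net = ipaddress.ip_network(internal_cidr)
    stuck = []
    for line in conntrack_output.splitlines():
        pairs = _TOKEN.findall(line)
        srcs = [value for key, value in pairs if key == "src"]
        dsts = [value for key, value in pairs if key == "dst"]
        if len(srcs) < 2 or len(dsts) < 2:
            continue  # not a full original + reply tuple pair
        orig_src, orig_dst, reply_dst = srcs[0], dsts[0], dsts[1]
        # With SNAT applied, the reply is addressed to the router's external IP.
        # If it is still addressed to the internal source, the flow was never translated.
        if ipaddress.ip_address(orig_src) in net and reply_dst == orig_src:
            stuck.append((orig_src, orig_dst))
    return stuck


print(untranslated_flows(SAMPLE, "10.0.0.0/24"))
```

In the sample, the first entry is a stuck flow and the second is a correctly translated one, so only the ping to 8.8.8.8 is reported.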
Additional information:
This is the normal condition, where a race condition didn't happen: http://
This is the abnormal condition, where the race condition happened: http://
This is the abnormal condition, where we started a new ping to a different host: http://
tags: added: l3-ipam-dhcp; removed: condition ha iptables race
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
I believe we could mitigate this race condition in different ways:
1) Invert the order during qrouter setup:
a) first, set the iptables rules
b) then, bring up the interfaces
This way, the iptables rules only start processing packets once they are all in place.
2) Set a DROP rule for the traffic first, then set the actual rules, then remove this DROP barrier
(not sure if it really mitigates the situation).
3) Use conntrack to clear the kernel connection tracking tables after the rules are set up
(this could reset any NAT'd connection established between the rules being set and the conntrack clear).
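Option 3 could be narrowed to the affected sources rather than flushing the whole table, which would limit the collateral resets mentioned above. Below is a rough sketch, not the agent's actual code: it only builds the command lines, using `ip netns exec` and `conntrack -D -s <address>`. The helper name `conntrack_flush_commands` and the router ID in the usage are hypothetical.

```python
def conntrack_flush_commands(router_id, source_ips):
    """Build one `conntrack -D` command per source IP, to run in the router namespace."""
    namespace = f"qrouter-{router_id}"
    return [
        # -D deletes matching entries; -s filters on the original source address.
        ["ip", "netns", "exec", namespace, "conntrack", "-D", "-s", ip]
        for ip in sorted(set(source_ips))  # de-duplicate, keep a stable order
    ]


# Hypothetical usage: "ROUTER-ID" stands in for the real router UUID.
for cmd in conntrack_flush_commands("ROUTER-ID", ["10.0.0.5", "10.0.0.5", "10.0.0.7"]):
    print(" ".join(cmd))
    # To actually run it (requires root on the network node):
    # subprocess.run(cmd, check=False)
```

The commands would be run only after the SNAT rules are in place, so the next packet of each deleted flow creates a fresh conntrack entry that does go through the nat table. `check=False` is suggested in the comment because `conntrack -D` may exit non-zero when no entry matches.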