Neutron DVR (SNAT) steals FIP traffic

Bug #1620824 reported by David-wahlstrom
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Brian Haley

Bug Description

Setup:

We have 40+ compute nodes, all running neutron-l3-agent in DVR mode, and one node running neutron-l3-agent in DVR_SNAT mode. L2 population is handled by VXFLD (https://github.com/CumulusNetworks/vxfld).

Steps to reproduce:

After following the setup above, we noticed that traffic to/from a floating IP was randomly going out the SNAT namespace (and thus getting connection resets). Further investigation showed this correlated with traffic load: the more traffic, the more likely the return path would go out the SNAT namespace instead of back out the FIP namespace. After some searching, we found that conntrack was marking in-transit connections as "new" connections (essentially losing their state), so the SNAT namespace would see them as new traffic and set up a new return path.
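A minimal sketch of how to check whether loose conntrack pickup (the behavior just described) is enabled inside the SNAT namespace, assuming neutron's snat-<router-id> namespace naming; the router UUID below is a placeholder:

    # Read nf_conntrack_tcp_loose inside a DVR SNAT namespace. With the
    # kernel default of 1, a stray mid-stream packet reaching the SNAT
    # node is adopted as a "new" connection and gets a return path.
    import subprocess

    ROUTER_ID = "d3adb33f-0000-0000-0000-000000000000"  # placeholder UUID

    def tcp_loose(namespace):
        """Return the value of nf_conntrack_tcp_loose in a namespace."""
        out = subprocess.check_output(
            ["ip", "netns", "exec", namespace,
             "sysctl", "-n", "net.netfilter.nf_conntrack_tcp_loose"],
            text=True,
        )
        return int(out.strip())

    ns = "snat-%s" % ROUTER_ID
    print("%s: nf_conntrack_tcp_loose=%d" % (ns, tcp_loose(ns)))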

Changed in neutron:
assignee: nobody → David-wahlstrom (david-wahlstrom)
status: New → In Progress
Revision history for this message
Jeremy Hanmer (fzylogic) wrote :

To add more to this, what we believe is happening is that under heavy load we see single packets occasionally flood all ports of a bridge (as would also happen under normal circumstances should an L3 adjacency age out). When that single packet floods, it hits the vxlan interface and is eventually forwarded on to the SNAT server where it is happily forwarded along to the client endpoint. When the client receives this packet (which is sourced from the backup SNAT IP address, rather than the floating IP which the client has been talking to all along), it sends a TCP RST packet, effectively terminating the in-progress TCP flow. Neutron uses connection tracking to drop INVALID packets, but because of the default conntrack behavior of automatically creating connection tracking entries for anything that looks like an active connection, those rules are nearly always bypassed.
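A sketch of how one might observe that bypass (assuming conntrack-tools is installed in the namespace; namespace name and address are placeholders): with loose pickup enabled, the flooded packet shows up in the SNAT namespace's conntrack table as a tracked flow instead of being classed INVALID and dropped:

    # List conntrack entries in the SNAT namespace that involve a given
    # VM address. Any entry for a flow that should be riding the FIP
    # path is evidence conntrack adopted a flooded mid-stream packet.
    import subprocess

    NAMESPACE = "snat-d3adb33f-0000-0000-0000-000000000000"  # placeholder
    VM_IP = "10.0.0.5"                                       # placeholder

    proc = subprocess.run(
        ["ip", "netns", "exec", NAMESPACE, "conntrack", "-L"],
        capture_output=True, text=True,
    )
    for line in proc.stdout.splitlines():
        if VM_IP in line:
            print(line)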

Revision history for this message
John Schwarz (jschwarz) wrote :

I wonder if this reproduces using the reference implementation of l2pop?

tags: added: l3-dvr-backlog
Revision history for this message
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Thanks for your detailed report.

Revision history for this message
Brian Haley (brian-haley) wrote :

I have looked at the patch, https://review.openstack.org/#/c/366297/ , but I think the problem is that changing to tcp_loose=0 will cause network node failover to stop working. I know this setting would be an issue with floating IP traffic, but I don't know if it will present as much of a problem with SNAT traffic.

For example, if there was a connection from the VM using the SNAT IP and the router was moved to another l3-agent, the connection would drop with this change. It should continue to work with the existing setting.

Revision history for this message
Jeremy Hanmer (fzylogic) wrote :

We're not using HA for the SNAT stuff yet, but my understanding from looking at the code/configs was that people should be deploying with keepalived/conntrackd. If conntrackd is being used, tcp_loose=0 won't break anything as the conntrack state will be kept in sync anyway. If that assumption is wrong, I suppose making this behavior tunable via a new config option would be the best approach?

Revision history for this message
Brian Haley (brian-haley) wrote :

I'm not even talking about HA traffic at this point, just regular traffic - you should be able to move a tenant router between l3-agents without dropping an open connection. I don't believe there is any functional test to verify that, so changing to tcp_loose=0 could cause a regression.

Also, has this been verified without vxfld, just using the neutron l2pop driver?

Revision history for this message
David-wahlstrom (david-wahlstrom) wrote :

We have not tested this with the l2pop driver, only using VXFLD. We have had great success in our deployment using VXFLD with the linked patch. As mentioned above, maybe I should make this something that can be toggled with a config option?

Revision history for this message
David-wahlstrom (david-wahlstrom) wrote :

To reproduce easily:
Ping a VM IP from a remote router namespace (the more frequent the pings, the more complete the outage -- ping flood == doom). FIP traffic will be routed through SNAT.

A more real-world, but much more time-consuming, test is to simply generate a ton of inter-VM (and inter-hypervisor) traffic while monitoring for traffic migration; a scripted version of the quick reproduction follows.
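A sketch of the quick reproduction as a script, with placeholder namespace UUID and VM address (run as root on a node hosting a different router):

    # Flood-ping a VM from a remote router namespace. Under the bug,
    # enough of this stray traffic flips the VM's floating-IP return
    # path from the fip-* namespace to the SNAT node.
    import subprocess

    REMOTE_QROUTER_NS = "qrouter-c0ffee00-0000-0000-0000-000000000000"  # placeholder
    VM_IP = "10.0.0.5"  # placeholder: VM address reachable from that namespace

    # -f floods ("the more frequent the pings, the more complete the
    # outage"); -c bounds the run so the script terminates.
    subprocess.run(
        ["ip", "netns", "exec", REMOTE_QROUTER_NS,
         "ping", "-f", "-c", "10000", VM_IP],
        check=True,
    )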

Changed in neutron:
assignee: David-wahlstrom (david-wahlstrom) → Brian Haley (brian-haley)
Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Brian, I think connections for SNAT are going to get interrupted anyway since we lose the port mapping translation that's held in conntrack.

Revision history for this message
Brian Haley (brian-haley) wrote :

Kevin - yeah, I think you're right if you're talking about default SNAT; I was always testing with floating IPs, which can get picked up if moved between routers.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/366297
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=299d08ed3f3f170a129fb2096df73fd5af7e647d
Submitter: Jenkins
Branch: master

commit 299d08ed3f3f170a129fb2096df73fd5af7e647d
Author: David Wahlstrom <email address hidden>
Date: Tue Sep 6 12:11:41 2016 -0700

    DVR: properly track SNAT traffic

    When running DVR, it's possible for traffic to get confused and sent
    through SNAT thanks to the way conntrack tracks "new" connections. This
    patch sets "nf_connctrack_tcp_loose" inside the SNAT namespace to more
    intelligently handle SNAT traffic (and ignore what should be FIP
    traffic) - basically, don't track a connection where we didn't
    see the initial SYN.

    https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt

    Change-Id: Ia5b8bd3794d22808ee1718d429f0bbdbe61e94ec
    Closes-Bug: 1620824
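For operators on releases without this fix, the manual equivalent is a single sysctl inside each SNAT namespace; a sketch (the namespace name is a placeholder following neutron's snat-<router-id> convention, and the setting does not persist if the namespace is recreated):

    # Disable loose TCP pickup in a DVR SNAT namespace so conntrack only
    # tracks connections whose initial SYN it actually saw -- the same
    # sysctl the patch above sets.
    import subprocess

    NAMESPACE = "snat-d3adb33f-0000-0000-0000-000000000000"  # placeholder

    subprocess.run(
        ["ip", "netns", "exec", NAMESPACE,
         "sysctl", "-w", "net.netfilter.nf_conntrack_tcp_loose=0"],
        check=True,
    )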

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b2

This issue was fixed in the openstack/neutron 11.0.0.0b2 development milestone.

tags: added: neutron-proactive-backport-potential
tags: added: neutron-easy-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/474297

tags: removed: neutron-easy-proactive-backport-potential neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/474297
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=185116fd9df9b8d15b8de97c0ae89cb7be55639a
Submitter: Jenkins
Branch: stable/ocata

commit 185116fd9df9b8d15b8de97c0ae89cb7be55639a
Author: David Wahlstrom <email address hidden>
Date: Tue Sep 6 12:11:41 2016 -0700

    DVR: properly track SNAT traffic

    When running DVR, it's possible for traffic to get confused and sent
    through SNAT thanks to the way conntrack tracks "new" connections. This
    patch sets "nf_connctrack_tcp_loose" inside the SNAT namespace to more
    intelligently handle SNAT traffic (and ignore what should be FIP
    traffic) - basically, don't track a connection where we didn't
    see the initial SYN.

    https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt

    Change-Id: Ia5b8bd3794d22808ee1718d429f0bbdbe61e94ec
    Closes-Bug: 1620824
    (cherry picked from commit 299d08ed3f3f170a129fb2096df73fd5af7e647d)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.3

This issue was fixed in the openstack/neutron 10.0.3 release.
