occasional connection reset on SNATed after tcp retries

Bug #1804327 reported by Dirk Mueller
This bug affects 3 people
Affects: neutron
Status: In Progress
Importance: Medium
Assigned to: Brian Haley

Bug Description

When neutron ports are connected to DVR routers and have no floating IP, their traffic goes through SNAT on the network node.

In some cases, when a NATed TCP connection starts retransmitting, the remote end retransmits a packet that falls outside what the Linux kernel's connection tracking considers the valid TCP window. When this happens, the flow receives a RST, which terminates the connection on the sender side while leaving the receiver side (the VM attached to the neutron port) hanging.
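To make the failure mode concrete, here is a deliberately simplified model of a conntrack-style window check. It is a toy sketch, not the kernel's actual algorithm: the function names, the parameters, and the reduced two-sided bounds check are all assumptions made for illustration. It shows how a late retransmission can fall below the tracked window and be classified as invalid, and how a "be liberal" switch changes that outcome for everything except RST segments.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def seq_before_eq(a, b):
    """True if sequence number a is at or before b, modulo 2**32."""
    return ((b - a) % SEQ_MOD) < 2 ** 31


def in_window(seq, length, sender_end, sender_max_end, receiver_max_win):
    """Toy check: the segment must not start past the highest sequence the
    sender is allowed to send, and must not end before the oldest data the
    receiver could still be waiting for."""
    end = (seq + length) % SEQ_MOD
    lower = (sender_end - receiver_max_win) % SEQ_MOD
    return seq_before_eq(seq, sender_max_end) and seq_before_eq(lower, end)


def classify(seq, length, sender_end, sender_max_end, receiver_max_win,
             is_rst=False, be_liberal=False):
    """Return "ACCEPT" or "INVALID" for a segment (hypothetical labels)."""
    if in_window(seq, length, sender_end, sender_max_end, receiver_max_win):
        return "ACCEPT"
    # In liberal mode only out-of-window RST segments are rejected.
    if be_liberal and not is_rst:
        return "ACCEPT"
    return "INVALID"


if __name__ == "__main__":
    # A very late retransmission (seq 100) when the tracker has already
    # seen data up to 1000 and the receiver window was at most 500 bytes.
    print(classify(100, 100, 1000, 5000, 500))
    print(classify(100, 100, 1000, 5000, 500, be_liberal=True))
```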

A similar issue is described elsewhere, e.g. https://github.com/docker/libnetwork/issues/1090. The workaround documented there, setting ip_conntrack_tcp_be_liberal, seems to help: it stops conntrack from dismissing packets outside the observed TCP window, which lets the TCP retransmit logic eventually recover the connection.
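As a rough sketch of how that workaround could be applied by hand, the helper below reads and writes the setting through procfs. It assumes a recent kernel, where the knob is exposed as net.netfilter.nf_conntrack_tcp_be_liberal (ip_conntrack_tcp_be_liberal is the older name). The function names and the default path constant are hypothetical and not part of neutron.

```python
from pathlib import Path

# Assumed procfs location of the sysctl on current kernels.
DEFAULT_SYSCTL = Path("/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal")


def read_be_liberal(path=DEFAULT_SYSCTL):
    """Return True if the be_liberal setting is currently non-zero."""
    return int(Path(path).read_text().strip()) != 0


def set_be_liberal(enabled, path=DEFAULT_SYSCTL):
    """Write 1 or 0 to the sysctl file; needs root on a real system."""
    Path(path).write_text("1\n" if enabled else "0\n")


if __name__ == "__main__":
    if not read_be_liberal():
        set_be_liberal(True)
    print("be_liberal enabled:", read_be_liberal())
```

On a network node the SNAT rules live inside a router namespace, so the setting would presumably have to be applied inside that namespace (for example by running the script under `ip netns exec`) rather than in the host's root namespace. Treat that as an assumption to verify against the deployment.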

Changed in neutron:
assignee: nobody → Dirk Mueller (dmllr)
status: New → In Progress
tags: added: l3-dvr-backlog
YAMAMOTO Takashi (yamamoto) wrote :

this isn't specific to DVR, right?

Changed in neutron:
importance: Undecided → Medium
Brian Haley (brian-haley) wrote :

https://review.openstack.org/#/c/618208/ was proposed, and it doesn't seem specific to DVR.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

The environment in which this was seen uses DVR routers.

Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Dirk Mueller (dmllr) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski on branch: master
Review: https://review.openstack.org/618208
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: New → Confirmed
Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)