nf_conntrack should be unloaded on swift object servers

Bug #1441363 reported by Bjoern
This bug affects 2 people
Affects              Status         Importance   Assigned to    Milestone
OpenStack-Ansible    Fix Released   Medium       Andy McCrae
  Juno               Fix Released   Medium       Andy McCrae
  Kilo               Fix Released   Medium       Andy McCrae
  Trunk              Fix Released   Medium       Andy McCrae

Bug Description

I noticed a lot of TCP sessions on ports 6000/6001 in the TIME_WAIT state, causing the nf_conntrack limit to be hit.
Ideally we would turn off connection tracking on the object servers altogether, since we currently have no iptables rules running, and the problem is exacerbated by adding new disks/devices to the object ring.
We should also set net.ipv4.tcp_tw_reuse to 1 to minimize the number of those sessions, which are about to be closed anyway.
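The symptom in the description can be checked on a storage host by counting TIME_WAIT sockets on ports 6000/6001 and comparing the conntrack entry count with its limit. This is a minimal sketch, assuming a Linux host that exposes /proc/net/tcp and, when nf_conntrack is loaded, /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max; the helper names are hypothetical. In /proc/net/tcp the ports are hexadecimal and the state column uses 06 for TIME_WAIT.

```python
from pathlib import Path

TIME_WAIT = 0x06  # kernel TCP state code as shown in /proc/net/tcp


def count_time_wait(lines, ports=(6000, 6001)):
    """Count TIME_WAIT sockets whose local or remote port is in `ports`.

    `lines` are the rows of /proc/net/tcp (or /proc/net/tcp6), header included.
    """
    count = 0
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # Address fields look like "0100007F:1770"; the port is the hex after ':'.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        remote_port = int(fields[2].rsplit(":", 1)[1], 16)
        state = int(fields[3], 16)
        if state == TIME_WAIT and (local_port in ports or remote_port in ports):
            count += 1
    return count


def conntrack_usage(proc_root="/proc/sys/net/netfilter"):
    """Return (count, max) for nf_conntrack, or None if its sysctls are absent."""
    root = Path(proc_root)
    try:
        count = int((root / "nf_conntrack_count").read_text())
        limit = int((root / "nf_conntrack_max").read_text())
    except FileNotFoundError:
        return None
    return count, limit


if __name__ == "__main__":
    tcp = Path("/proc/net/tcp").read_text().splitlines()
    print("TIME_WAIT on 6000/6001:", count_time_wait(tcp))
    print("conntrack (count, max):", conntrack_usage())
```

Only IPv4 sockets are read in the `__main__` block; /proc/net/tcp6 has the same layout and can be passed to `count_time_wait` as well.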

Bjoern (bjoern-t) wrote :

This also implies that we either turn off lxc-net (which adds iptables rules for non-existent containers) or remove LXC altogether.

Evan Callicoat (diopter) wrote :

I strongly disagree with the first part of this recommendation. Disabling conntrack in modern Linux is essentially an old hack devised for old Linux to improve performance on extremely busy servers which are dedicated to a single, dumb task from a networking perspective. It's not necessary and definitely not the Right™ way to solve the problem.

I also dislike the idea of removing conntrack/lxc* as it reduces the deployment and expansion flexibility of an object server, which is definitely not aligned with the goals of this project architecture.

You have a few simple options here:

A) If you have lots of TW connections lingering and you're running into nf_conntrack_max or running out of available sockets, set the tcp_tw_reuse sysctl (as suggested in the second part of this recommendation) to allow the kernel to reclaim sockets in TW when it's safe to do so per the protocol. I should note that this is *only* safe when conntrack *is* enabled, because a reused TW port pair can still receive late traffic, and conntrack is the mechanism that recognizes that traffic as INVALID and drops it appropriately. Having conntrack off while also reusing TW sockets can eventually cause actual data crossover from old, closing connections into new sockets.

B) Raise the conntrack limit (nf_conntrack_max).

C) Set the tcp_tw_recycle sysctl, if you *really* need aggressive connection closing. This can confuse or utterly break the remote ends of connections, so it's not recommended unless these boxes are so incredibly busy that they're buckling under the rate of new connections per second, in which case this still really isn't the right solution; the right solution is to load-balance and scale the architecture appropriately.

My recommendation is A, plus B if and only if there are enough valid, active connections that you still hit the conntrack limit even with TW sockets being reused. I strongly advise against C in almost any circumstance, and certainly in this one.
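Options A and B come down to writing two sysctls, with B applied only if the table stays close to full. The following is a minimal sketch under stated assumptions: a Linux host with sysctls exposed under /proc/sys, and hypothetical helper names; the 80% headroom threshold is an arbitrary illustration, not a value from this thread.

```python
from pathlib import Path

PROC_SYS = "/proc/sys"


def sysctl_path(name, proc_root=PROC_SYS):
    """Map a dotted sysctl name to its file under /proc/sys,
    e.g. net.ipv4.tcp_tw_reuse -> /proc/sys/net/ipv4/tcp_tw_reuse."""
    return Path(proc_root, *name.split("."))


def read_sysctl(name, proc_root=PROC_SYS):
    return int(sysctl_path(name, proc_root).read_text().strip())


def write_sysctl(name, value, proc_root=PROC_SYS):
    # Runtime-only change, like `sysctl -w`; it does not survive a reboot.
    sysctl_path(name, proc_root).write_text(f"{value}\n")


def near_conntrack_limit(proc_root=PROC_SYS, threshold=0.8):
    """True if conntrack entries are at or above `threshold` of the limit.
    The 0.8 default is a hypothetical headroom value."""
    count = read_sysctl("net.netfilter.nf_conntrack_count", proc_root)
    limit = read_sysctl("net.netfilter.nf_conntrack_max", proc_root)
    return count >= threshold * limit


def apply_option_a(proc_root=PROC_SYS):
    # Option A: let the kernel reuse TIME_WAIT sockets when the protocol allows it.
    write_sysctl("net.ipv4.tcp_tw_reuse", 1, proc_root)


def apply_option_b_if_needed(new_max, proc_root=PROC_SYS):
    # Option B: raise the limit only if the table is still close to full.
    # Intended to be called after option A has had time to take effect.
    if near_conntrack_limit(proc_root):
        write_sysctl("net.netfilter.nf_conntrack_max", new_max, proc_root)
        return True
    return False
```

Running this against the real /proc/sys requires root. To make the values persistent, the same settings would normally go into /etc/sysctl.conf or a file under /etc/sysctl.d/.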

Bjoern (bjoern-t) wrote :

Thanks for commenting. In terms of TW connections, I'm pretty confident that we don't have to fear out-of-order packets on a local network. I'm sure that we can temporarily fix this issue by raising the conntrack limit, but since the limit is already being hit in an idle environment, I fear we are going to hit it even more once the environment is in use. Speaking of environments, we have RPC10 swift environments running without conntrack enabled, so I find it really disturbing that we suddenly have one environment running with connection tracking. It seems to have been enabled accidentally by installing LXC; at least, that was the difference compared to the other environments, since LXC includes lxc-net, which needs iptables.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (juno)

Fix proposed to branch: juno
Review: https://review.openstack.org/177163

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (kilo)

Fix proposed to branch: kilo
Review: https://review.openstack.org/177172

Andy McCrae (andrew-mccrae) wrote :

I've added PRs following Evan's suggestions - reviews welcome!

Darren Birkett (darren-birkett) wrote :

Fix proposed to branch: master
review: https://review.openstack.org/#/c/177160

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (juno)

Reviewed: https://review.openstack.org/177163
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=c9f9a0623fb22fd17e573f4c8e8b139a98aa260c
Submitter: Jenkins
Branch: juno

commit c9f9a0623fb22fd17e573f4c8e8b139a98aa260c
Author: Andy McCrae <email address hidden>
Date: Fri Apr 24 11:45:07 2015 +0100

    Set tcp_tw_reuse for swift storage hosts

    For swift storage hosts we are seeing a lot of connections in TIME WAIT
    status, violating nf_conntrack. Setting tcp_tw_reuse should help
    alleviate this.

    Additionally, in order for tcp_tw_reuse to be set safely we need to
    ensure nf_conntrack is loaded.

    Change-Id: I4392c4022a9a5a884d07eb6fbf27093f0b16f914
    Closes-Bug: #1441363
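The ordering described in the commit message (make sure nf_conntrack is loaded, then enable tcp_tw_reuse) can be sketched in Python. This is a minimal illustration, not the Ansible change that was actually merged; the helper names are hypothetical, and it assumes a Linux host with /proc/modules, the `modprobe` binary, and root privileges. It only checks that nf_conntrack is present, which is what the commit message asks for.

```python
import subprocess
from pathlib import Path


def module_loaded(name, modules_text):
    """True if `name` is listed in text formatted like /proc/modules."""
    return any(line.split()[0] == name
               for line in modules_text.splitlines() if line.strip())


def conntrack_available(proc_modules="/proc/modules",
                        proc_netfilter="/proc/sys/net/netfilter"):
    # nf_conntrack may be a loadable module or built into the kernel; if it is
    # built in, it is absent from /proc/modules but its sysctls still exist.
    text = Path(proc_modules).read_text()
    return (module_loaded("nf_conntrack", text)
            or Path(proc_netfilter, "nf_conntrack_max").exists())


def enable_tw_reuse_safely(proc_sys="/proc/sys"):
    """Hypothetical helper mirroring the commit's ordering; requires root."""
    if not conntrack_available():
        # Load conntrack first so it is present before TW reuse is switched on.
        subprocess.run(["modprobe", "nf_conntrack"], check=True)
    Path(proc_sys, "net/ipv4/tcp_tw_reuse").write_text("1\n")
```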

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/177160
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=b206e434a350941ed329ba4e84940afc722f3557
Submitter: Jenkins
Branch: master

commit b206e434a350941ed329ba4e84940afc722f3557
Author: Andy McCrae <email address hidden>
Date: Fri Apr 24 11:34:02 2015 +0100

    Set tcp_tw_reuse for swift storage hosts

    For swift storage hosts we are seeing a lot of connections in TIME WAIT
    status, violating nf_conntrack. Setting tcp_tw_reuse should help
    alleviate this.

    Additionally, in order for tcp_tw_reuse to be set safely we need to
    ensure nf_conntrack is loaded.

    Change-Id: I4392c4022a9a5a884d07eb6fbf27093f0b16f914
    Closes-Bug: #1441363

Changed in openstack-ansible:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/177172
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=66cbe687e6fd8a4b9ded9945f9a08395e7932a43
Submitter: Jenkins
Branch: kilo

commit 66cbe687e6fd8a4b9ded9945f9a08395e7932a43
Author: Andy McCrae <email address hidden>
Date: Fri Apr 24 11:34:02 2015 +0100

    Set tcp_tw_reuse for swift storage hosts

    For swift storage hosts we are seeing a lot of connections in TIME WAIT
    status, violating nf_conntrack. Setting tcp_tw_reuse should help
    alleviate this.

    Additionally, in order for tcp_tw_reuse to be set safely we need to
    ensure nf_conntrack is loaded.

    Change-Id: I4392c4022a9a5a884d07eb6fbf27093f0b16f914
    Closes-Bug: #1441363
    (cherry picked from commit b4c09dbd6e4d7d60c4f99469a82d093900ab8aa2)

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.11

This issue was fixed in the openstack/openstack-ansible 11.2.11 release.

Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible 11.2.12

This issue was fixed in the openstack/openstack-ansible 11.2.12 release.

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
