TCP_DEFER_ACCEPT causes random HTTP connection failures in load-balanced web-server farms

Bug #134274 reported by TJ
Affects: Apache2 Web Server; apache2 (Ubuntu)
Status: Fix Released

Bug Description

Binary package hint: apache2

This applies to Apache 2.1.5 and later.

In a web-server farm fronted by hardware load-balancers (in this case Juniper Redline, aka DX) that are configured to use TCP multiplexing (holding open and re-using HTTP connections to the web servers), there is the potential for random, unexplained and untraceable connection failures.

From the end user's web browser, all they see is a blank page. It happens so randomly that any reports from end users would be dismissed as network glitches.

I've spent the last two weeks working with a large IT e-commerce retailer. Their system administrator initially came to me believing that something in the Linux kernel network stack was faulty. He had already done extensive diagnostic work with the Juniper support engineers, and neither had been able to pinpoint the cause of the failure.

What they knew was that the persistent connections between the DXs and the web servers would occasionally, and seemingly randomly, be RESET by the server. Some web servers in some clusters were affected; others weren't.

When I examined the tcpdump capture taken on a web server it quickly became evident that Linux was ignoring ACKs from the DX during the initial handshake, was retrying the SYN ACK the default 5 times, and then closing the half-open connection.

After a lot of work with custom-written tools that captured packets at the PF_PACKET level (libpcap) and checked they were seen by the netfilter/iptables layer, we decided to hack a custom kernel. I added printk() statements to net/ipv4/tcp_minisocks.c::tcp_check_req() so that each cause of a dropped packet was logged, and moved the netfilter NF_HOOK() used for the 'mangle table INPUT chain' from its usual location in net/ipv4/ip_input.c::ip_local_deliver() to net/ipv4/tcp_input.c::tcp_rcv_state_process(), after "tcp_set_state(sk, TCP_ESTABLISHED);", so that we could detect every handshake that failed.

As a result we discovered that handshakes were having the ACK from the client (in this case the Juniper DX) discarded because the listening socket had the TCP_DEFER_ACCEPT option set (the BSD equivalent is SO_ACCEPTFILTER).

The server's SYN_RECEIVED timer would time out, and the server would resend the SYN ACK. The DX would reply with a duplicate ACK, which would again be discarded.

This would repeat 5 times (the default number of SYN ACK retries), with the wait doubling each time: 3, 6, 12, 24, 48 and 96 seconds for the original SYN ACK and its five retransmissions, about 190 seconds (189 s) in total. If a request arrived from the DX *after* this, the DX received an RST from the server, because the server had already closed the half-open connection when the handshake failed. The end-user client then sees a 'white page' (empty response).

If a request arrived from the DX *before* the retries and time-out expired it would cause the connection to be ESTABLISHED and the request would be handled.
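The timing above can be checked with a small model. It assumes, as in the capture described here, a 3-second initial SYN ACK timeout that doubles on each of 5 retransmissions, and that every bare ACK from the DX is discarded, so only the arrival time of the first request matters. `synack_deadlines` and `outcome` are hypothetical names for this sketch, not kernel functions.

```python
def synack_deadlines(initial_timeout=3, retries=5):
    """Seconds after the SYN at which each SYN ACK timer expires.

    Covers the original SYN ACK plus `retries` retransmissions,
    each wait being double the previous one.
    """
    deadlines = []
    elapsed, wait = 0, initial_timeout
    for _ in range(retries + 1):
        elapsed += wait
        deadlines.append(elapsed)
        wait *= 2
    return deadlines


def outcome(first_request_at, initial_timeout=3, retries=5):
    """What the DX gets if its first request arrives `first_request_at` s after the SYN.

    Bare ACKs are ignored under TCP_DEFER_ACCEPT, so only the request's timing matters.
    """
    give_up = synack_deadlines(initial_timeout, retries)[-1]
    if first_request_at < give_up:
        return "ESTABLISHED"  # data-bearing ACK completes the deferred handshake
    return "RST"              # server has already dropped the half-open connection


print(synack_deadlines())  # [3, 9, 21, 45, 93, 189]
```

In this model `outcome(60)` returns `"ESTABLISHED"` while `outcome(200)` returns `"RST"`: a pooled connection left idle for more than about 189 seconds after its SYN is reset the first time the DX tries to use it.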

The reason for the failures is that the Juniper DX maintains a group (by default 6) of persistent connections to each target host in a cluster of servers, and it creates these connections *before* it has HTTP requests for the target server. If the server uses deferred accept (TCP_DEFER_ACCEPT) on its listening sockets, a connection is not promoted to ESTABLISHED until data is received.

It turns out that Apache made TCP_DEFER_ACCEPT the *default* for its listening sockets in version 2.1.5. No explicit

 AcceptFilter http data

rule is needed in the Apache configuration files to enable it. In fact, it takes

 AcceptFilter http none

to disable TCP_DEFER_ACCEPT on its sockets.
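On Linux, `AcceptFilter http data` corresponds to the TCP_DEFER_ACCEPT socket option being set on Apache's listening socket, while `none` leaves it unset. The sketch below uses Python rather than Apache's own C code to switch the same option on and off on a listening socket. `make_listener` and `defer_accept_seconds` are hypothetical helper names, and the 30-second value is an arbitrary illustration, not necessarily what any Apache release uses. The kernel rounds the value internally, so reading it back may not return exactly what was set.

```python
import socket

# Linux-only option; None on platforms (e.g. BSD) that use SO_ACCEPTFILTER instead.
TCP_DEFER_ACCEPT = getattr(socket, "TCP_DEFER_ACCEPT", None)


def make_listener(defer_seconds, host="127.0.0.1", port=0):
    """Open a listening TCP socket (hypothetical helper).

    defer_seconds > 0 roughly corresponds to 'AcceptFilter http data';
    defer_seconds == 0 leaves the option off, like 'AcceptFilter http none'.
    """
    if TCP_DEFER_ACCEPT is None:
        raise OSError("TCP_DEFER_ACCEPT is not available on this platform")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        if defer_seconds > 0:
            # accept() will not hand over a new connection until the client sends data.
            sock.setsockopt(socket.IPPROTO_TCP, TCP_DEFER_ACCEPT, defer_seconds)
        sock.listen(128)
    except OSError:
        sock.close()
        raise
    return sock


def defer_accept_seconds(sock):
    """Read the option back; the kernel rounds it, so it may differ from the value set."""
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_DEFER_ACCEPT)
```

With the work-around in place, the server's listening socket is in the `defer_seconds=0` state: a bare ACK from the load-balancer completes the handshake as in RFC 793.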

Because the Juniper DX OS, up to at least version 5.2.6, doesn't correctly implement the HTTP protocol when using persistent connections, its interaction with Apache 2.1.5+ causes this issue only when *traffic is light*. It won't happen under medium or heavy load, because then every pooled connection receives a request before the server gives up on the handshake.

The root cause of the failure, exacerbated by the change in Apache 2.1.5+ to using TCP_DEFER_ACCEPT, is that the Juniper DX OS opens a connection to the HTTP server *but doesn't send a request*.

Unlike protocols such as telnet, HTTP expects the connection to be accompanied by a request, so the client's first TCP segment after the handshake carries data. RFC 2616 (HTTP/1.1) section 1.4 states:

"...a connection may be used for one or more request/response exchanges..."

The Juniper DX, however, creates a connection and in low-traffic situations doesn't send "one or more request[s]...", causing the Linux kernel network stack to time out the socket.

The work-around is to disable TCP_DEFER_ACCEPT when deploying Apache 2.1.5+ behind load-balancing systems such as the Juniper Redline / DX by adding to the Apache configuration:

 AcceptFilter http none

TJ (tj) wrote :

Fixed via the work-around described above.

Changed in apache2:
assignee: nobody → intuitive-nipple
status: New → Confirmed
Mathias Gug (mathiaz)
Changed in apache2:
importance: Undecided → Low
TJ (tj) wrote :

Pending feedback from the Linux netdev mailing list on whether this can be considered a bug in the kernel's implementation of TCP_DEFER_ACCEPT, on this basis:

A standard RFC 793 TCP handshake has three stages:

 client  SYN >                server  LISTENING
 client      < SYN ACK        server  SYN_RECEIVED
 client  ACK >                server  ESTABLISHED

 client  PSH ACK + data >     server

TCP_DEFER_ACCEPT is designed to increase performance by waking the listening application only when data arrives, so the server treats the client's first data-bearing segment as the completion of the handshake:

 client  SYN >                server  LISTENING
 client      < SYN ACK        server  SYN_RECEIVED

 client  PSH ACK + data >     server  ESTABLISHED

At present, with TCP_DEFER_ACCEPT enabled, the kernel treats the RFC handshake as invalid, dropping the client's bare ACK without replying.

There is a case for arguing the kernel is operating in an enhanced handshaking mode when TCP_DEFER_ACCEPT is enabled, not an alternative mode, and therefore should accept *both* client responses (ACK, then PSH ACK + data, or PSH ACK + data). I've been unable to find a specification or RFC for implementing TCP_DEFER_ACCEPT aka BSD's SO_ACCEPTFILTER.
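The difference between the current behaviour and the proposal above can be expressed as a toy state machine. This is an illustration only, not kernel code: segments are reduced to "ACK" (bare ACK) and "DATA" (PSH ACK + data), and `Mode.TOLERANT` is one reading of the proposal (an assumption), in which a bare ACK completes the handshake but the application blocked in accept() is still only handed the connection once data arrives.

```python
from enum import Enum


class Mode(Enum):
    STANDARD = 1  # plain RFC 793 three-way handshake
    DEFERRED = 2  # TCP_DEFER_ACCEPT as observed in this report
    TOLERANT = 3  # hypothetical: accept both ACK and ACK + data


def server_view(segments, mode):
    """Feed the client's segments (sent after SYN / SYN ACK) to a toy server.

    Returns (tcp_state, accept_returns): the server-side handshake state and
    whether the application blocked in accept() would get the connection.
    """
    state, accept_returns = "SYN_RECEIVED", False
    for seg in segments:
        if seg == "ACK":
            if mode is Mode.DEFERRED and state == "SYN_RECEIVED":
                continue  # bare ACK discarded without reply; stays half-open
            state = "ESTABLISHED"
            if mode is Mode.STANDARD:
                accept_returns = True  # no deferral: wake the application now
        elif seg == "DATA":
            state = "ESTABLISHED"   # data-bearing ACK completes the handshake
            accept_returns = True   # every mode hands it over once data arrives
        else:
            raise ValueError(f"unknown segment {seg!r}")
    return state, accept_returns
```

The DX's pattern of repeated bare ACKs is `["ACK", "ACK"]`: under `Mode.DEFERRED` it leaves the server in `SYN_RECEIVED`, whereas under `Mode.TOLERANT` the connection is established but the application is still not woken, which keeps the intended benefit of deferred accept.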

It seems wrong to penalise a client that is trying to complete the handshake according to the RFC specification, especially as the client has no way of knowing in advance whether the server is using deferred accept.

I'll update once the kernel net-devs have given their views.

Mathias Gug (mathiaz) wrote :

Any update on this bug, TJ?

Changed in apache2:
status: Confirmed → Triaged
TJ (tj) wrote :

I asked the question on the 'netdev' mailing list and got some inconclusive reactions.

The general consensus was that:

a) TCP_DEFER_ACCEPT isn't specified in any RFC and breaks regular handshake negotiation of RFC 793.
b) The load-balancer shouldn't open an HTTP connection without using it immediately, but no one could (or did) provide a definitive rule on that.

The upshot is that any Apache 2.1.5+ installation behind pipelining load-balancers could suffer the same fate. This should be added to the documentation for version 2.1.5+, placed up-front in the server's installation notes, and brought to the attention of server support teams.

In one sense, shipping Apache with TCP_DEFER_ACCEPT enabled breaches the RFCs, and as such it could be argued to be a 'bad thing'. On the other hand, it appears to affect only a small number of installations (judging by how little information there is about it).

There are a few other related issues around TCP_KEEP_ALIVE and broken time-outs on the 'netdev' and 'kernel' mailing lists, and at Apache:

Apache "TCP_DEFER_ACCEPT timeout set way too low"
kernel "TCP_DEFER_ACCEPT issues"
netdev "TCP_DEFER_ACCEPT brokenness?"

TJ (tj)
Changed in apache2:
assignee: intuitivenipple → nobody
Pierre GAUTHIER (pierre-gauthier) wrote :

When TCP_DEFER_ACCEPT is enabled on the listening socket:

- HTTP-Keep-Alive requests get a boost;

- non-HTTP-Keep-Alive requests are delayed after a while.

The problem occurs only at high concurrency.

This has been tested with various HTTP servers, and the behavior is consistent on Linux Ubuntu 8.1+.

The higher the value of TCP_DEFER_ACCEPT, the sooner HTTP servers get into trouble when they receive ab connections made without -k (that is, without HTTP keep-alive):

 ab -t 100 -c900 http://a.b.c.d/100.html

where 100.html is a 100-byte file.

Does anyone have a clue why this is happening?

(HTTP keep-alives being a persistent TCP connection, it is strange to see TCP_DEFER_ACCEPT giving them a boost.)
(Conversely, non-keep-alive requests being new TCP connections, it is strange to see TCP_DEFER_ACCEPT delaying them.)

Thanks for any pointer.

Changed in apache2 (Ubuntu):
status: Triaged → Invalid
TJ (tj) wrote :

Please do not change the status of a confirmed/triaged bug that describes a known issue that is as yet unresolved.

Changed in apache2 (Ubuntu):
status: Invalid → Triaged
Clint Byrum (clint-fewbar) wrote :

This is a pretty old bug, and doesn't seem to have been linked to any upstream issue, even though it is likely just that. Has anyone verified that the behavior persists on a more recent release of Ubuntu?

TJ (tj) wrote :

Unfortunately upstream cannot agree on what precisely the standards require of TCP_DEFER_ACCEPT, so this issue slips through the cracks and doesn't affect enough installations to create any real drive.

As can be seen from my investigations, it is an obscure bug that is very hard to catch and probably goes unseen in most circumstances.

My preferred solution would be a prominent addition to the Debian/Ubuntu READMEs for apache with a reference to this bug with advice to use

AcceptFilter http none

when operating in a load-balancing configuration.

harm (harm-verhagen-w) wrote :

This exact problem causes havoc if you have many slow clients on slow networks (GPRS).

We actually had this problem, and to confirm comment #8: it's *really* "hard to catch". We had >80% unsuccessful connection attempts.

I cannot believe Apache ships with TCP_DEFER_ACCEPT set to 1 second, or even enabled at all. I don't see the possible gain.

ref: Varnish had it enabled for a while, but they disabled it too [1]

Can Ubuntu change the default config to include:

AcceptFilter http none
AcceptFilter https none
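Whether or not the default changes, an existing installation can be checked for this work-around. The sketch below is a hypothetical helper, not part of Apache or Ubuntu: it scans configuration text for `AcceptFilter` directives, treats a missing directive as still deferred (the default since 2.1.5, per this report), skips lines starting with `#`, and does not follow `Include` directives, so on Debian/Ubuntu the files pulled in by /etc/apache2/apache2.conf must be checked as well.

```python
import re

# Matches e.g. "AcceptFilter http none"; Apache directive names are case-insensitive.
_ACCEPT_FILTER = re.compile(r"AcceptFilter\s+(\S+)\s+(\S+)\s*$", re.IGNORECASE)


def accept_filters(config_text):
    """Return {protocol: filter} for every AcceptFilter line (hypothetical helper).

    Lines starting with '#' are comments in Apache configs and are skipped.
    In this helper a later directive for the same protocol overwrites an earlier one.
    """
    filters = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        m = _ACCEPT_FILTER.match(line)
        if m:
            filters[m.group(1).lower()] = m.group(2).lower()
    return filters


def deferred_accept_disabled(config_text, protocols=("http", "https")):
    """True only if each protocol has an explicit 'AcceptFilter <proto> none'.

    A missing directive counts as not disabled, since deferred accept is the default.
    """
    filters = accept_filters(config_text)
    return all(filters.get(p) == "none" for p in protocols)
```

For instance, a config containing only `AcceptFilter http none` fails the check, because HTTPS listeners would still defer accept.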


harm (harm-verhagen-w) wrote :
arlife (arlife)
Changed in apache2 (Ubuntu):
status: Triaged → New
assignee: nobody → arlife (arlife)
Changed in apache2:
assignee: nobody → farhan saleh robleh (farhn)
Changed in apache2 (Ubuntu):
assignee: arlife (arlife) → farhan saleh robleh (farhn)
status: New → Confirmed
Andreas Hasenack (ahasenack) wrote :

Upstream marked this as fixed in 2.2.28, which is from the Ubuntu Precise timeframe. Trusty already ships 2.4.7, so I'm marking this Fix Released.

Changed in apache2 (Ubuntu):
status: Confirmed → Fix Released
Colin Watson (cjwatson)
Changed in apache2:
assignee: farhan saleh robleh (farhn) → nobody
Changed in apache2 (Ubuntu):
assignee: farhan saleh robleh (farhn) → nobody