Nagle causes PUT object performance dip at 64K+ sizes when proxy/object on same node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | Fix Released | Undecided | Donagh McCabe | 2.2.2
Bug Description
On a test system, Mark Seger noticed a dip in PUT object performance starting at 64K and continuing up to 200K+. With help from Rick Jones, TCP traces showed that we seemed to be hitting a pause caused by the interaction of Nagle's algorithm with other TCP behaviour.
Here is an example of the performance at various sizes. The Ops/Sec result is the key item to look at in this output:
(test_
Rank Test Clts Proc OSize Start End MB/Sec Ops Ops/Sec Errs Latency Median LatRange %CPU Comp
0 put 1 1 50k 17:11:38 17:11:38 2.56 1 52.35 0 0.019 0.019 0.02-00.02 0.55 def
0 put 1 1 100k 17:11:38 17:11:38 1.72 1 17.59 0 0.057 0.057 0.06-00.06 0.59 def
0 put 1 1 200k 17:11:38 17:11:39 3.34 1 17.11 0 0.058 0.058 0.06-00.06 1.17 def
0 put 1 1 300k 17:11:39 17:11:39 5.21 1 17.77 0 0.056 0.056 0.06-00.06 0.39 def
0 put 1 1 1m 17:11:39 17:11:39 37.66 1 37.66 0 0.026 0.026 0.03-00.03 0.54 def
(I've used -n1 here, but the results have been verified over 1000s of PUTs)
As you can see, at 50k you get 52 ops/sec and at 1m you get 37 ops/sec, but for the sizes in between the rate drops to about 17 ops/sec.
The Nagle effect is related to MTU size. On this test system, we realised that the proxy-server and one of the object-servers were on the same node. Since the MTU of the lo device is 64K, requests sent to the local object-server hit the Nagle effect. We then stopped the local object-server, so the proxy-server was forced to send all requests to a handoff. Since the handoff was on a different node, the requests went out over the network instead of looping back (so the applicable MTU was 1500 bytes). Re-running the tests, the dip in performance goes away (i.e., using remote handoffs is better than going over loopback to a local object-server).
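For reference, a quick way to confirm the loopback vs. NIC MTU difference on the test node is to read the values from sysfs (a minimal sketch; the interface name "eth0" is an assumption, substitute your own device):

    # Minimal sketch (Linux): compare the loopback MTU with a NIC MTU.
    def mtu(dev):
        with open("/sys/class/net/{}/mtu".format(dev)) as f:
            return int(f.read())

    print("lo   MTU:", mtu("lo"))    # typically 65536, the 64K mentioned above
    print("eth0 MTU:", mtu("eth0"))  # typically 1500 for a standard Ethernet NIC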
We have a fix under test (setting TCP_NODELAY). With the fix in place, the performance looks like:
(test_
Rank Test Clts Proc OSize Start End MB/Sec Ops Ops/Sec Errs Latency Median LatRange %CPU Comp
0 put 1 1 50k 17:11:48 17:11:48 2.78 1 56.96 0 0.017 0.017 0.02-00.02 1.04 def
0 put 1 1 100k 17:11:48 17:11:48 5.13 1 52.58 0 0.019 0.019 0.02-00.02 1.09 def
0 put 1 1 200k 17:11:49 17:11:49 10.45 1 53.52 0 0.019 0.019 0.02-00.02 0.00 def
0 put 1 1 300k 17:11:49 17:11:49 15.03 1 51.32 0 0.019 0.019 0.02-00.02 1.56 def
0 put 1 1 1m 17:11:49 17:11:49 42.20 1 42.20 0 0.024 0.024 0.02-00.02 0.52 def
Our proposed fix will affect all outgoing connections. We will post the fix for review once we've verified it has no negative impact on other paths (container updates, replication requests, etc.).
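For illustration, disabling Nagle on an outgoing connection comes down to a single socket option (a minimal sketch, not the actual patch; the address below is a placeholder, not a real object-server):

    import socket

    # Sketch: disable Nagle's algorithm so small writes are sent immediately
    # instead of waiting for an ACK of previously sent data.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.connect(("127.0.0.1", 6000))  # placeholder object-server address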
Changed in swift:
assignee: nobody → Donagh McCabe (donagh-mccabe)

Changed in swift:
milestone: none → 2.2.2
status: Fix Committed → Fix Released
So swift.common.wsgi already sets socket.TCP_NODELAY - does the "fix" do something similar in swift.common.bufferedhttp or something?
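If the fix does end up in swift.common.bufferedhttp, it would presumably amount to something like the following (a hedged sketch using the standard library, not the actual Swift patch; the class name is hypothetical):

    import socket
    from http.client import HTTPConnection

    class NoDelayHTTPConnection(HTTPConnection):
        """Sketch: an HTTP connection that disables Nagle's algorithm on its
        underlying socket, as an outgoing-connection fix would need to do."""

        def connect(self):
            HTTPConnection.connect(self)
            # Apply the option after the socket exists, before any request
            # body is written.
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

Whether the option belongs in bufferedhttp, wsgi, or both is exactly the question for the review of the posted fix.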