Nagle causes PUT object performance dip at 64K+ sizes when proxy/object on same node

Bug #1408622 reported by Donagh McCabe
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Donagh McCabe
Milestone: 2.2.2

Bug Description

On a test system, Mark Seger noticed a dip in PUT object performance starting at 64K and continuing up to 200K+. With help from Rick Jones, TCP traces showed that we appeared to be hitting a pause caused by the interaction of Nagle's algorithm and other TCP behaviour (notably delayed ACKs).

Here is an example of the performance at various sizes. The Ops/Sec result is the key item to look at in this output:

    (test_venv)cetest@ha-kvm1:~$ python /usr/bin/getput -cc -oo -n1 -s50k,100k,200k,300k,1m -tp --proxies 10.23.70.22
    Rank Test Clts Proc OSize Start End MB/Sec Ops Ops/Sec Errs Latency Median LatRange %CPU Comp
    0 put 1 1 50k 17:11:38 17:11:38 2.56 1 52.35 0 0.019 0.019 0.02-00.02 0.55 def
    0 put 1 1 100k 17:11:38 17:11:38 1.72 1 17.59 0 0.057 0.057 0.06-00.06 0.59 def
    0 put 1 1 200k 17:11:38 17:11:39 3.34 1 17.11 0 0.058 0.058 0.06-00.06 1.17 def
    0 put 1 1 300k 17:11:39 17:11:39 5.21 1 17.77 0 0.056 0.056 0.06-00.06 0.39 def
    0 put 1 1 1m 17:11:39 17:11:39 37.66 1 37.66 0 0.026 0.026 0.03-00.03 0.54 def

(I've used -n1 here, but the results have been verified over 1000s of PUTs)

As you can see, at 50k you get 52 ops/sec and at 1m you get 37 ops/sec, but for the sizes in between the rate drops to ~17 ops/sec.

Nagle's effect is related to the MTU size. On this test system, we realised that the proxy-server and one of the object-servers were on the same node. Since the MTU of the lo device is 64K, we see the Nagle effect on requests sent to the local object-server. We stopped the local object-server, so the proxy-server was forced to send all requests to a handoff. Since the handoff was on a different node, the request went out on the network instead of looping back (so the applicable MTU was 1500 bytes). Re-running the tests, the dip in performance goes away (i.e., using remote handoffs is better than going over loopback to a local object-server).
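
As a quick way to compare the MTUs involved, you can read them from sysfs on Linux. This is just an illustrative sketch (the interface names are placeholders for your system):

    def mtu(ifname):
        # Read the interface MTU from sysfs (Linux-specific)
        with open('/sys/class/net/%s/mtu' % ifname) as f:
            return int(f.read())

    print(mtu('lo'))    # typically 65536 (64K) -- the loopback case above
    print(mtu('eth0'))  # typically 1500 -- the remote-handoff case above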

We have a fix under test (setting TCP_NODELAY). With the fix in place, the performance looks like:

    (test_venv)cetest@ha-kvm1:~$ python /usr/bin/getput -cc -oo -n1 -s50k,100k,200k,300k,1m -tp --proxies 10.23.70.22
    Rank Test Clts Proc OSize Start End MB/Sec Ops Ops/Sec Errs Latency Median LatRange %CPU Comp
    0 put 1 1 50k 17:11:48 17:11:48 2.78 1 56.96 0 0.017 0.017 0.02-00.02 1.04 def
    0 put 1 1 100k 17:11:48 17:11:48 5.13 1 52.58 0 0.019 0.019 0.02-00.02 1.09 def
    0 put 1 1 200k 17:11:49 17:11:49 10.45 1 53.52 0 0.019 0.019 0.02-00.02 0.00 def
    0 put 1 1 300k 17:11:49 17:11:49 15.03 1 51.32 0 0.019 0.019 0.02-00.02 1.56 def
    0 put 1 1 1m 17:11:49 17:11:49 42.20 1 42.20 0 0.024 0.024 0.02-00.02 0.52 def

Our proposed fix will affect all outgoing connections. We will post the fix for review once we've verified it has no negative impact on other paths (container updates, replication requests, etc.).
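
At the socket level the fix amounts to a single option on each outgoing connection. A minimal sketch (the address is a placeholder, not Swift's actual code):

    import socket

    # Disable Nagle's algorithm on an outgoing connection so small
    # trailing segments are sent immediately instead of waiting for ACKs.
    sock = socket.create_connection(('127.0.0.1', 6000))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)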

Changed in swift:
assignee: nobody → Donagh McCabe (donagh-mccabe)
Revision history for this message
clayg (clay-gerrard) wrote :

so swift.common.wsgi already sets socket.TCP_NODELAY - does the "fix" do something similar in swift.common.bufferedhttp or something?

Revision history for this message
Donagh McCabe (donagh-mccabe) wrote :

> does the "fix" do something similar in swift.common.bufferedhttp or something?
Yep. Just doing due diligence to check there is no performance regression due to setting it on all outgoing connections.
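
For illustration, a change along those lines could look like the following sketch (assuming a connect() override in the buffered connection class; the actual patch is in the review linked below):

    import socket
    from http.client import HTTPConnection  # httplib in Python 2

    class BufferedHTTPConnection(HTTPConnection):
        """Sketch only -- not the actual Swift patch."""

        def connect(self):
            # Establish the connection as usual, then turn off Nagle for
            # everything subsequently written on this socket.
            ret = super().connect()
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            return ret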

Revision history for this message
Donagh McCabe (donagh-mccabe) wrote :

I can't find any evidence in testing that setting TCP_NODELAY on all outgoing connections causes any problems. httplib already does the best it can to avoid a write, write, read pattern by buffering the request line and headers (and the body, if supplied) into one write. The only time it does a write, write... is when the body is written in an iterator loop. Setting TCP_NODELAY fixes the delay you see with writes in the iterator (i.e., on a PUT). For most other requests (DELETEs, container updates, etc.) httplib is usually already doing only one write, so setting TCP_NODELAY is a no-op.
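
To illustrate the two write patterns described above, here is a sketch using http.client (Python 3's httplib); the address and path are placeholders:

    import http.client

    # One-write path: request line, headers, and a string body are
    # buffered and flushed together, so Nagle has nothing to hold back.
    conn = http.client.HTTPConnection('127.0.0.1', 6000)
    conn.request('DELETE', '/sda/0/a/c/o')
    conn.getresponse().read()
    conn.close()

    # Write, write... path: a chunked PUT body is sent one chunk per
    # send(); a small final segment can sit in the kernel under Nagle
    # until the previously sent data is ACKed.
    conn = http.client.HTTPConnection('127.0.0.1', 6000)
    conn.putrequest('PUT', '/sda/0/a/c/o')
    conn.putheader('Transfer-Encoding', 'chunked')
    conn.endheaders()
    for chunk in (b'x' * 65536, b'y' * 36864):  # e.g. a 100k object
        conn.send(b'%x\r\n' % len(chunk) + chunk + b'\r\n')
    conn.send(b'0\r\n\r\n')
    conn.getresponse().read()
    conn.close()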

Will submit a fix for review.

The Nagle effect is seen when write sizes and the MTU match (or mismatch, depending on your point of view). However, timing can also play a part. To see whether you observe this effect, you need a system where the proxy-server and object-server are on the same node. You are affected by this problem if you see PUTs of larger objects being faster than PUTs of smaller objects. In my original posting, 1m did 37 puts/sec but 100k did 17 puts/sec. If you have such a system, you will be able to confirm that setting TCP_NODELAY fixes the issue and gives a smooth pattern as the object size increases.
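
If you don't have getput handy, a rough timing sketch can show the same pattern (host, port, path, and token are placeholders for your cluster):

    import time
    import http.client

    def timed_put(host, port, path, token, size):
        # Time a single PUT of `size` bytes; repeat for stable numbers.
        conn = http.client.HTTPConnection(host, port)
        start = time.time()
        conn.request('PUT', path, body=b'x' * size,
                     headers={'X-Auth-Token': token})
        conn.getresponse().read()
        conn.close()
        return time.time() - start

    for size in (50 * 1024, 100 * 1024, 200 * 1024, 1024 * 1024):
        print(size, timed_put('10.23.70.22', 8080, '/v1/AUTH_test/c/o',
                              'AUTH_tk_placeholder', size))

If the mid-range sizes are consistently slower than 1m, you are seeing the problem; with TCP_NODELAY set, latency should rise smoothly with object size.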

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/147220

Changed in swift:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/147220
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=b434be452ead0625728afedfe01bac1c30629d30
Submitter: Jenkins
Branch: master

commit b434be452ead0625728afedfe01bac1c30629d30
Author: Donagh McCabe <email address hidden>
Date: Thu Jan 8 14:52:32 2015 +0000

    Use TCP_NODELAY on outgoing connections

    On a loopback device (e.g., when proxy-server and object-server are on
    the same node), PUTs in the range 64-200K may experience a delay due to
    the effect of Nagle interacting with the loopback MTU of 64K.

    This effect has been directly seen by Mark Seger and Rick Jones on a
    proxy-server to object-server PUT. However, you could expect to see a
    similar effect on replication via ssync if the object being replicated
    is on a different drive on the same node.

    A prior change [1] related to Nagle set TCP_NODELAY on responses. This
    change sets it on all outgoing connections.

    [1] I11f86df1f56fba1c6ab6084dc1f580c395f072dc

    Change-Id: Ife8885a42b289a5eb4ac7e4698f8889858bc8b7e
    Closes-bug: 1408622

Changed in swift:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/148983

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/148983
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=ef59cde83f176df9e36064f70142c9d8e81318fe
Submitter: Jenkins
Branch: feature/ec

commit 6f21504ccc9046e3f0b4db88f78297a00030dd3d
Author: Kota Tsuyuzaki <email address hidden>
Date: Tue Jan 13 05:34:37 2015 -0800

    Fix missing content length of Response

    This patch fixes swob.Response to set missing content
    length correctly.

    When a child class of swob.Response is initialized with both "body"
    and "headers" arguments, where headers includes a content length,
    swob.Response might lose the actual content length generated from
    the body because "headers" will overwrite the content-length
    property after the body assignment.

    This causes a difference between the headers' content length and
    the actual body length. It would mainly affect 3rd-party
    middleware(s) that make an original response as follows:

    req = swob.Request.blank('/')
    req.method = 'HEAD'
    resp = req.get_response(app)
    return HTTPOk(body='Ok', headers=resp.headers)

    This patch changes the order of header updating and fixes
    __init__() to set the correct content length.

    Change-Id: Icd8b7cbfe6bbe2c7965175969af299a5eb7a74ef

commit b434be452ead0625728afedfe01bac1c30629d30
Author: Donagh McCabe <email address hidden>
Date: Thu Jan 8 14:52:32 2015 +0000

    Use TCP_NODELAY on outgoing connections

    On a loopback device (e.g., when proxy-server and object-server are on
    the same node), PUTs in the range 64-200K may experience a delay due to
    the effect of Nagle interacting with the loopback MTU of 64K.

    This effect has been directly seen by Mark Seger and Rick Jones on a
    proxy-server to object-server PUT. However, you could expect to see a
    similar effect on replication via ssync if the object being replicated
    is on a different drive on the same node.

    A prior change [1] related to Nagle set TCP_NODELAY on responses. This
    change sets it on all outgoing connections.

    [1] I11f86df1f56fba1c6ab6084dc1f580c395f072dc

    Change-Id: Ife8885a42b289a5eb4ac7e4698f8889858bc8b7e
    Closes-bug: 1408622

commit b5586427e503ee22c0b20b109cad83e166ed3fd8
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 17:14:46 2015 -0700

    Drop redundant index output

    The output of format_device() now includes index as the first "dX"
    element, for example d1r1z2-127.0.0.1:6200R127.0.0.1:6200/db_"".

    Change-Id: Ib5f8e3a692fddbe8b1f4994787b2883130e9536f

commit c65bc49e099928801b80dce399d6098f7e10e137
Author: Pete Zaitcev <email address hidden>
Date: Sat Jan 10 08:20:25 2015 -0700

    Mark the --region as mandatory

    We used to permit omitting the region in the old parameter syntax, although
    we now throw a warning if it's missing. In the new parameter syntax,
    --region is mandatory. It's enforced by build_dev_from_opts in
    swift/common/ring/utils.py.

    On the other hand, --replication-ip, --replication-port, and --meta
    are not obligatory.

    Change-Id: Ia70228f2c99595501271765286431f68e82e800b
...

Thierry Carrez (ttx)
Changed in swift:
milestone: none → 2.2.2
status: Fix Committed → Fix Released