[OSSA 2016-004] Download DLO objects leak connections when client kill connection (CVE-2016-0737)

Bug #1466549 reported by Romain LE DISEZ on 2015-06-18
264
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Undecided
Unassigned
Nominated for Juno by Tristan Cacqueray
Nominated for Kilo by Tristan Cacqueray
OpenStack Security Advisory
Undecided
Unassigned

Bug Description

When a client is downloading a DLO object, if it interrupts its connection, the proxy does not close the connection to the object server

From my test, this bug impacts versions 1.10, 2.2.2 and master.

I flag this bug security vulnerability because it makes it easy to do a DoS on a proxy-server. It's actually what's happening to us because one of our user do VoD. It uses DLO a lot and close connections when his customers stop watching video.

How to reproduce on a fresh SAIO, with processes just started:
1. Upload an object in DLO
    $ dd if=/dev/zero of=2x1G bs=1M count=2048
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post test
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload test -S 1073741824 2x1G

2. In an other terminal, watch connections between proxy-server and object-server:
   $ watch -n 1 "netstat -tapn | grep -E ':60[1-4]0' | grep ESTA"
(For now, you should see no connections)

3. Start to download the object, but stop it before the end (eg: CTRL+C)
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing download test 2x1G --no-download
(Now, you should see one connection)

4. Repeat step 3 and observe the connections stacking

CVE References

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
Romain LE DISEZ (rledisez) wrote :

Based on https://www.python.org/dev/peps/pep-0333/#specification-details
"If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error. (This is to support resource release by the application. This protocol is intended to complement PEP 325 's generator support, and other common iterables with close() methods."

here is a fix proposal : https://review.openstack.org/#/c/193192/

It passes all tests, but I'm not very familiar with wsgi. So i don't know all the implications of adding a close() method.

Samuel Merritt (torgomatic) wrote :

Eventually the object server will time out the connection, right? I mean, immediately after killing the client -> proxy connection, you'll see what appears to be a leaked connection from proxy -> obj, but if you wait a few minutes, the object server will time out and close the connection from its end.

Romain LE DISEZ (rledisez) wrote :

It might need more testing, but i didn't observe any connection timeout-ing.

On production, we see that leaked connections continue increasing for many hours, until the proxy cannot treat connections anymore. On my SAIO, after more than 20 minutes, connections were still not closed.

Samuel Merritt (torgomatic) wrote :

This bug report should probably be marked as public since there's a patch in Gerrit referencing it.

For reference, when filing a security bug, attach patches to the LP bug, not in Gerrit. This is probably the only case where attaching patches to LP bugs is better than filing in Gerrit.

Opening this bug now, does it affect stable releases ?

information type: Private Security → Public Security
Changed in swift:
status: New → Fix Committed
Jeremy Stanley (fungi) wrote :

For the record, the patch which was eventually merged to address this (I think) seems to be https://review.openstack.org/193303 . Does that mean that the issue described here is confirmed to be reproducible under most deployments?

Change abandoned by paul luse (<email address hidden>) on branch: master
Review: https://review.openstack.org/201269
Reason: going to redo this w/o the dependencies

Change abandoned by Michael Barton (<email address hidden>) on branch: feature/hummingbird
Review: https://review.openstack.org/202227
Reason: Apparently I did this wrong.

Download full text (72.8 KiB)

Reviewed: https://review.openstack.org/202230
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f7cb1777e1b514b3345b9e516ed8f89ad1a4ae87
Submitter: Jenkins
Branch: feature/hummingbird

commit 51f806d3e3d3a1fcbc80d2f7d7ddbe5cc4c024c9
Author: John Dickinson <email address hidden>
Date: Tue Jul 14 20:49:08 2015 -0700

    remove Python 2.6 from the classifier

    Change-Id: I67233e9c7b69826242546bd6bd98c24b81070579

commit 278adf5c20101a191979ce1e4d6277e5f209149e
Author: Hisashi Osanai <email address hidden>
Date: Tue Jul 14 15:33:45 2015 +0900

    Make logic of unit tests responsive to the method names

    The two methods, test_authorize_succeeds_for_tenant_name_in_roles and
    test_authorize_succeeds_for_tenant_id_in_roles, have names that don't
    match what they are testing. tenant_name and tenant_id need to be
    switched.

    Change-Id: I7cb0a7d2b2111127fd5d6b55f2da6a3eadf2235d

commit 1cc3eff958fdd4fb07c2b74c52df7829d3125466
Author: Victor Stinner <email address hidden>
Date: Fri Jul 10 13:04:44 2015 +0200

    Fixes for mock 1.1

    The new release of mock 1.1 is more strict. It helped to find bugs in
    tests.

    Closes-Bug: #1473369
    Change-Id: Id179513c6010d827cbcbdda7692a920e29213bcb

commit ff192cfe5705324497a389aa2f22227d75dc0f8e
Author: janonymous <email address hidden>
Date: Wed Jul 8 18:38:22 2015 +0530

    Replace reduce and unichr , these are no longer available in py3

    * Replace reduce() with six.moves.reduce()
    * Replace unichr with six.unichr

    Change-Id: I2038e47e0a6522dd992fd2a4aeff981cf7750fe0

commit 4beceab4f4be99f14025815cf7ed4510ea77f460
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 9 06:14:56 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I9ff1dde06be45fc7d6c441a1e1c07221f839a9a1

commit 56ee39a7e13417203c5e1816d7a3184a07f85826
Author: Matthew Oliver <email address hidden>
Date: Thu Jul 9 15:19:32 2015 +1000

    Ring builder code clean up follow up patch

    This is a simple change that cleans up a NIT from Sam's 'stop moving
    partitions unnecessarily when overload is on' patch.

    Change-Id: I9d9f1cc23e2bb625d8e158f4d3f64e10973176a1

commit 6cafd0a4c0bb8f311fc59df580b42e801214effd
Author: Oshrit Feder <email address hidden>
Date: Wed Jul 8 15:18:22 2015 +0300

    Fix Container Sync example

    Container-sync realm uses cluster_ as a prefix to specify clusters'
    names. At use, the prefix should not be included. Fixing the examples
    and sample conf to make it clearer that only the name of the cluster
    should be passed.

    Change-Id: I2e521d86faffb59e1b45d3f039987ee023c5e939

commit 125238612f58481316db68d7087252bb7729f447
Author: Janie Richling <email address hidden>
Date: Sat Jul 4 17:08:32 2015 -0500

    Add CORS unit tests to base

    In earlier versions of swift when a request was made with an
    existing origin, but without any CORS settings in the container,
    it was possible to get an u...

tags: added: in-feature-hummingbird
Download full text (80.5 KiB)

Reviewed: https://review.openstack.org/205579
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8ab46b64365b8eab80680f2562f81e8adb3032a3
Submitter: Jenkins
Branch: feature/crypto

commit 89f705e8aab144092d40a13fc4ef19ffef5f3eba
Author: OpenStack Proposal Bot <email address hidden>
Date: Thu Jul 23 06:11:27 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I94cf347564cb33977f33b1e64259bcb39a8cf809

commit a65e9db8752793ec37b594dc9eca5066171825db
Author: Christian Schwede <email address hidden>
Date: Wed Jul 22 10:43:17 2015 +0000

    Removing commented out code in test/unit/account/test_backend.py

    Noticed this while reviewing another change. Looks like the test itself already
    ensures correct functionality of the reclaim() method in AccountBroker without
    the commented code, thus removing this stale code.

    Change-Id: I6a26a7591adef9fd794ca68a4e9c493d1127f93c

commit 99d052772a9585e0befdfd292fd03aefde77180a
Author: Kota Tsuyuzaki <email address hidden>
Date: Mon Jul 13 01:12:43 2015 -0700

    Fix 499 client disconnected on COPY EC object

    Currently, a COPY request for an EC object might go to fail as 499 Client
    disconnected because of the difference between destination request content
    length and actual transferred bytes.

    That is because the conditional response status and content length for
    an EC object range GET is handled at calling the response instance on
    proxy server. Therefore the calling response instance (resp()) will change
    the conditional status from 200 (HTTP_OK) to 206 (PartialContent) and will
    change the content length for the range GET.

    In EC case, sometimes Swift needs whole stored contents to decode a segment.
    It will make 200 HTTP OK response from object-server and proxy-server
    will unfortunately set whole content length to the destination content
    length and it makes the bug 1467677.

    This patch introduces a new method "fix_conditional_response" for
    swift.common.swob.Response that calling _response_iter() and cached the
    iter in the Response instance. By calling it, Swift can set correct condtional
    response any time after setting whole content_length to the response
    instance like EC case.

    Change-Id: If85826243f955d2f03c6ad395215c73daab509b1
    Closes-Bug: #1467677

commit 62ed4f81ef80440550633eaaaa962a4f9383c2d3
Author: Timur Alperovich <email address hidden>
Date: Tue Jul 14 16:56:44 2015 -0700

    Add two functional tests for delimiter.

    The first test verifies that a delimiter will trim entries beyond the
    first matching instance of delimiter (after the given matching prefix,
    if any) and squash duplicates. So, when setting the delimiter
    to "-", given blobs "test", "test-foo" and "test-bar-baz", we expect
    only "test" (no matching delim) and "test-" (trim all characters after
    the first "-", and squash duplicates).

    The second test verifies that when a prefix is provid...

tags: added: in-feature-crypto
clayg (clay-gerrard) wrote :

yeah, https://review.openstack.org/#/c/193303/ was the fix.

But it's really about the same as https://bugs.launchpad.net/swift/+bug/667956 (i.e. slow resource exhaustion depending on your clients access patterns) and that was open for ages. I'm glad we fixed both of them...

So yeah, people that need to support a bunch of VOD clients that disconnect reading *LO's should upgrade to the latest Swift or they will probably encounter this bug.

Changed in ossa:
status: Incomplete → Confirmed

Thanks clayg, so this issue can be abused, the fix should be backported and an OSSA issued.

Here is the proposed impact description:

Title: Swift proxy-server DoS through Large Object
Reporter: Romain LE DISEZ (OVH)
Products: Swift
Affects: versions through 2.3.0

Description:
Romain LE DISEZ from OVH reported a vulnerability in Swift Large Object. By repeatedly requesting and interrupting connections to a Large Object (Dynamic or Static) URL, a remote attacker may exhausts Swift proxy-server resources, potentially resulting in a denial of service. All Swift setup are affected.

Thierry Carrez (ttx) on 2015-09-01
Changed in swift:
milestone: none → 2.4.0
status: Fix Committed → Fix Released

It seems the bug is not really fixed but it is "moved". I know have sockets leak when a client stops a DLO download. Each interrupted DLO download leak one TCP socket. It applies on Linux, don't know if others OS are impacted.

How to reproduce, on a fresh SAIO on master, with processes just started:
1. Upload an object in DLO
    $ dd if=/dev/zero of=2x1G bs=1M count=2048
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing post test
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing upload test -S 1073741824 2x1G

2. In an other terminal, watch socket stats:
   $ while [ 1 ]; do grep ^TCP /proc/net/sockstat; sleep 1; done
(For now, you should see that the number of allocated sockets is stable, for me: alloc 45)

3. Start to download the object, but stop it before the end (eg: CTRL+C)
    $ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:tester -K testing download test 2x1G --no-download
(Now, you should see one more allocated socket)

4. Repeat step 3 and observe the socket stacking

So, no more connections leak, but socket leak. Instead of few hours, it takes few days to exhaust server resources.

Change abandoned by John Dickinson (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/217750
Reason: Abanoning based on the lack of activity since the last negative review. If you want to continue working on this, please reopen this patch.

Romain LE DISEZ (rledisez) wrote :

I don't think it is the same. Bug #1361360 describe the slowloris attack, where TCP connections are kept open for a long time if the client does not close the connection.

The behavior i'm observing is that when client close the connection, the TCP connection is closed, but the socket is not released. Also, it does not apply on standard downloads, but only on DLO downloads.

To be sure, i tried to set keepalive=False (and socket_timeout=10): it didn't fix the problem.

Grant Murphy (gmurphy) wrote :

+1 impact description in comment #18

Reviewed: https://review.openstack.org/217750
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=036c2f348d24c01c7a4deba3e44889c45270b46d
Submitter: Jenkins
Branch: stable/kilo

commit 036c2f348d24c01c7a4deba3e44889c45270b46d
Author: Samuel Merritt <email address hidden>
Date: Thu Jun 18 12:58:03 2015 -0700

    Get better at closing WSGI iterables.

    PEP 333 (WSGI) says: "If the iterable returned by the application has
    a close() method, the server or gateway must call that method upon
    completion of the current request[.]"

    There's a bunch of places where we weren't doing that; some of them
    matter more than others. Calling .close() can prevent a connection
    leak in some cases. In others, it just provides a certain pedantic
    smugness. Either way, we should do what WSGI requires.

    Noteworthy goofs include:

      * If a client is downloading a large object and disconnects halfway
        through, a proxy -> obj connection may be leaked. In this case,
        the WSGI iterable is a SegmentedIterable, which lacked a close()
        method. Thus, when the WSGI server noticed the client disconnect,
        it had no way of telling the SegmentedIterable about it, and so
        the underlying iterable for the segment's data didn't get
        closed.

        Here, it seems likely (though unproven) that the object server
        would time out and kill the connection, or that a
        ChunkWriteTimeout would fire down in the proxy server, so the
        leaked connection would eventually go away. However, a flurry of
        client disconnects could leave a big pile of useless connections.

      * If a conditional request receives a 304 or 412, the underlying
        app_iter is not closed. This mostly affects conditional requests
        for large objects.

    The leaked connections were noticed by this patch's co-author, who
    made the changes to SegmentedIterable. Those changes helped, but did
    not completely fix, the issue. The rest of the patch is an attempt to
    plug the rest of the holes.

    Co-Authored-By: Romain LE DISEZ <email address hidden>

    Closes-Bug: #1466549

    Change-Id: I168e147aae7c1728e7e3fdabb7fba6f2d747d937
    (cherry picked from commit 12d8a53fffea6e4bed8ba3d502ce625f5c6710b9
    with fixed import conflicts)

tags: added: in-stable-kilo
summary: Download DLO objects leak connections when client kill connection
+ (CVE-2016-0737)
summary: - Download DLO objects leak connections when client kill connection
- (CVE-2016-0737)
+ [OSSA 2016-004] Download DLO objects leak connections when client kill
+ connection (CVE-2016-0737)
Changed in ossa:
status: Confirmed → Fix Released

Reviewed: https://review.openstack.org/270241
Committed: https://git.openstack.org/cgit/openstack/ossa/commit/?id=606a18e718aed329a9d42d298e3119f0f1974e3d
Submitter: Jenkins
Branch: master

commit 606a18e718aed329a9d42d298e3119f0f1974e3d
Author: Tristan Cacqueray <email address hidden>
Date: Wed Jan 20 10:19:30 2016 -0500

    Adds OSSA-2016-004 (CVE-2016-0737, CVE-2016-0738)

    Related-Bug: #1466549
    Related-Bug: #1493303
    Change-Id: Id7b40ab5101ccbd889c4ffc6bd9629bb5f2b8d7f

To post a comment you must log in.
This report contains Public Security information  Edit
Everyone can see this security related information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.