[OSSA 2016-004] Swift proxy memory leak on unfinished read (CVE-2016-0738)

Bug #1493303 reported by Örjan Persson on 2015-09-08
288
This bug affects 4 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Critical
Unassigned
OpenStack Security Advisory
Undecided
Unassigned
Ubuntu Cloud Archive
High
Unassigned
Icehouse
Undecided
Unassigned
Kilo
Undecided
Unassigned
Liberty
Undecided
Unassigned
Mitaka
Undecided
Unassigned
swift (Ubuntu)
High
Unassigned
Trusty
High
Unassigned
Wily
Undecided
Unassigned
Xenial
Undecided
Unassigned
Yakkety
High
Unassigned

Bug Description

It looks like the Swift proxy will leak memory if the connection is closed and the full response is not read. This opens for a potential DoS attacks.

Reproduce:

$ swift -A http://localhost:8888/auth/v1.0 -U .. -K .. upload --use-slo --segment-size 1048576 <container> <big-file>
$ curl -H'X-Auth-Token: AUTH_...' "http://localhost:8888/v1/AUTH_../<container>/<big-file>" -m 0.001 > /dev/null

Repeat the curl command a couple of times and you will have more information in netstat and sockstat. The important part is the -m which sets the max time curl spends at downloading. After that point, it'll close the connection.

$ sudo netstat -ant -p | grep :6000
$ cat /proc/net/sockstat

tcp 0 0 127.0.0.1:6000 0.0.0.0:* LISTEN 1358/python
tcp 0 43221 127.0.0.1:6000 127.0.0.1:48350 FIN_WAIT1 -
tcp 0 43221 127.0.0.1:6000 127.0.0.1:48882 FIN_WAIT1 -
tcp 939820 0 127.0.0.1:48350 127.0.0.1:6000 ESTABLISHED 17897/python
tcp 939820 0 127.0.0.1:48882 127.0.0.1:6000 ESTABLISHED 17890/python
tcp 983041 0 127.0.0.1:48191 127.0.0.1:6000 CLOSE_WAIT 17897/python
tcp 983041 0 127.0.0.1:48948 127.0.0.1:6000 CLOSE_WAIT 17892/python

Restarting the proxy frees up the lingering memory.

This problem did not exist in 2.2.0.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: swift 2.2.2-0ubuntu1~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 3.16.0-48.64~14.04.1-generic 3.16.7-ckt15
Uname: Linux 3.16.0-48-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Tue Sep 8 09:55:05 2015
InstallationDate: Installed on 2015-06-22 (77 days ago)
InstallationMedia: Ubuntu-Server 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
PackageArchitecture: all
SourcePackage: swift
UpgradeStatus: No upgrade log present (probably fresh install)

CVE References

Örjan Persson (o-42mm) wrote :
James Page (james-page) on 2015-09-17
Changed in cloud-archive:
importance: Undecided → Critical
importance: Critical → High
Changed in swift (Ubuntu):
importance: Undecided → High
Jeremy Stanley (fungi) wrote :

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Changed in ossa:
status: New → Incomplete
description: updated
Jeremy Stanley (fungi) wrote :

From the description it sounds like this bug impacts the stable/kilo and master (liberty) branches, but not the stable/juno branch.

@swift-coresec, is this a duplicate of bug 1466549 ?

@swift-coresec, can you triage this bug report please ?

Samuel Merritt (torgomatic) wrote :

On commit fdc8828e8527b6fa4997072fb4795e4060c68cc0 (tip of master as of the time of this comment), I am unable to reproduce the behavior. I experimented with a variety of curl timeouts (the -m argument), including the proposed 0.001, but still could not reproduce this.

On a freshly-restarted VM, I did see a small jump in the number of non-TIME_WAIT connections after the first request, but that jump did not occur on the second and subsequent requests, so it appears to be something warming up.

I also checked stable/liberty (47eb6a37) and was unable to reproduce the behavior there either.

I *can* reproduce this on stable/kilo (da428c4f).

I suspect that commit 12d8a53 fixed this, though I have no proof of that.

Christian Schwede (cschwede) wrote :

I couldn't reproduce this on/after commit 12d8a53, but I could reproduce it on the commits before that.

I had to use a larger timeout to reproduce this (0.01 seconds). I used the following command:

 for i in `seq 1 1000`; do curl -H 'X-Auth-Token: AUTH_...' http://127.0.0.1:8080/v1/AUTH_test/big/bigfile -m 0.01 > /dev/null ; done

From my point of view this affects all releases before commit 12d8a53, and I would propose to backport 12d8a53 to stable/juno and stable/kilo.

Christian Schwede (cschwede) wrote :

Just to add more light on this: most of the connections on pre-12d8a53 are still not closed after 50 minutes. That definetely does not happen on a branch on or after commit 12d8a53.

Download full text (6.2 KiB)

Sam and Romain was the author of patch 12d8a53, the patch makes sure
connections are closed. Here is the commit message of the patch:

commit 12d8a53fffea6e4bed8ba3d502ce625f5c6710b9
Author: Samuel Merritt <email address hidden>
Date: Thu Jun 18 12:58:03 2015 -0700

    Get better at closing WSGI iterables.

    PEP 333 (WSGI) says: "If the iterable returned by the application has
    a close() method, the server or gateway must call that method upon
    completion of the current request[.]"

    There's a bunch of places where we weren't doing that; some of them
    matter more than others. Calling .close() can prevent a connection
    leak in some cases. In others, it just provides a certain pedantic
    smugness. Either way, we should do what WSGI requires.

    Noteworthy goofs include:

      * If a client is downloading a large object and disconnects halfway
        through, a proxy -> obj connection may be leaked. In this case,
        the WSGI iterable is a SegmentedIterable, which lacked a close()
        method. Thus, when the WSGI server noticed the client disconnect,
        it had no way of telling the SegmentedIterable about it, and so
        the underlying iterable for the segment's data didn't get
        closed.

        Here, it seems likely (though unproven) that the object server
        would time out and kill the connection, or that a
        ChunkWriteTimeout would fire down in the proxy server, so the
        leaked connection would eventually go away. However, a flurry of
        client disconnects could leave a big pile of useless connections.

      * If a conditional request receives a 304 or 412, the underlying
        app_iter is not closed. This mostly affects conditional requests
        for large objects.

    The leaked connections were noticed by this patch's co-author, who
    made the changes to SegmentedIterable. Those changes helped, but did
    not completely fix, the issue. The rest of the patch is an attempt to
    plug the rest of the holes.

    Co-Authored-By: Romain LE DISEZ <email address hidden>

    Change-Id: I168e147aae7c1728e7e3fdabb7fba6f2d747d937
    Closes-Bug: #1466549

The first noteworthy goof looks like what we're seeing in this bug. This
patch is already in Liberty, but it was applied well after 2.2.0 (Juno) and
Kilo. So it only needs to be backported.

My 2 cents,
Matt

On Wed, Oct 14, 2015 at 7:59 PM, Christian Schwede <
<email address hidden>> wrote:

> Just to add more light on this: most of the connections on pre-12d8a53
> are still not closed after 50 minutes. That definetely does not happen
> on a branch on or after commit 12d8a53.
>
> --
> You received this bug notification because you are a member of Swift
> Core security contacts, which is subscribed to the bug report.
> https://bugs.launchpad.net/bugs/1493303
>
> Title:
> Swift proxy memory leak on unfinished read
>
> Status in ubuntu-cloud-archive:
> New
> Status in OpenStack Security Advisory:
> Incomplete
> Status in OpenStack Object Storage (swift):
> New
> Status in swift package in Ubuntu:
> New
>
> Bug description:
> This issue is being treated as a potential security risk under
> embargo. Pleas...

Read more...

This bug can still be reproduced in an easy and consistent manner. I've attached a script reproducing the leaks. It simply spins up a swift SAIO box in vagrant, and then runs the curl command mentioned in this issue. Every time the script is run, more memory is leaked. If the proxy-server is restared, the memory will be freed.

A small note: while the script was provisioning the vagrant box for the first time, apt failed. This was easily fixed:
$ cd vagrant-swift-all-in-one/
$ vagrant ssh -c "sudo apt-get -f install -y"
$ vagrant provision

Then just rerun the script:
$ cd ..
$ ./leakreproducer.sh

I was able to reproduce this on saio instance of 8Go of ram, deployed with openstack-swift-2.3.0-2.fc23.noarch (which doesn't have 12d8a53). However it didn't cause a noticeable effect since the process was killed and automatically restarted without causing disruption to other services. Also the leak was quickly limited by the maximum number of open file and it took about 42k connections until the process reserved all memory.

Anyway this seems like a duplicate of bug 1466549 which is already public, can we mark this bug as duplicate and open it ?

Andreas Andersen (yoshiyaka) wrote :

The problem is not that the machine runs out of memory, the problem is that the kernel runs out of buffer space for the TCP stack. This renders the machine unable to do any form of network communication, including SSH. It will however not get the proxy process killed, as there's still plenty of memory left on the machine (just not for TCP). This effectively renders the machine completely unresponsive until it's rebooted, or the proxy process is restarted.

dmesg gets filled up with messages like these:
[ 276.495010] TCP: out of memory -- consider tuning tcp_mem
[ 276.852703] TCP: out of memory -- consider tuning tcp_mem
[ 277.996776] TCP: out of memory -- consider tuning tcp_mem
[ 278.111035] TCP: out of memory -- consider tuning tcp_mem
[ 279.840715] TCP: out of memory -- consider tuning tcp_mem
[ 280.421439] TCP: out of memory -- consider tuning tcp_mem
[ 281.435703] TCP: out of memory -- consider tuning tcp_mem
[ 283.579847] TCP: out of memory -- consider tuning tcp_mem
[ 283.616647] TCP: out of memory -- consider tuning tcp_mem
[ 283.652761] TCP: out of memory -- consider tuning tcp_mem

These symptons are easily reproduced with the script I posted above, just run it a few times, and the machine will become completely unresponsive. The following line should do the trick:
$ vagrant ssh -c 'for i in {1..100}; do /vagrant/leakconnections.sh; done'

That behavior might be related to vagrant networking setup, using a nova instance with fedora kernel didn't broke ssh connection (active and new one). I run 5 parallel "while true; curl ...; done" using a public SLO object (no need for token).

Hence the question asked on bug 1466549 comment #7:
Does that mean that the issue described here is confirmed to be reproducible under most deployments?

Andreas Andersen (yoshiyaka) wrote :

No, I don't think is related to vagrant, we get the same results in our production environment (that's why we filed the original ticket, our nodes we upgraded to 2.4.0 sunk after the upgrade). The production environment is running on bare metal, so it shouldn't be related to any form of virtualization.

We also tried 2.5.0 in production, and still get the same issues. The last good version for us was 2.2.x, after this we skipped straight to 2.4.0 (I don't think ubuntu packaged 2.3.0) and started having problems.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in swift (Ubuntu):
status: New → Confirmed

Hum, if this reproduce in 2.4.0 and 2.5.0 then it may be a different issue than bug 1466549 for which the fix (12d8a53) is included in those releases...

It seems like the leakreproducer.sh is stressing the proxy-server using localhost, would this be a different scenario compared to external connection ?

Andreas Andersen (yoshiyaka) wrote :

No, it's not different compared to an external connection. If leakproducer.sh is run outside of the virtual machine (just change localhost to 192.168.8.80 in the script), the problem still persists.

Another thing which migth be interesting to note is that after the memory has leaked, it isn't freed even if the object-server is stopped. The proxy-server needs to be restarted in order for the memory to be freed.

John Dickinson (notmyname) wrote :

Andreas, what version of curl are you using?

I'm not able to see any issues when running leakconnections.sh from my mac (curl 7.43.0), but I did find something when running directly on my precise SAIO (curl 7.22).

Andreas Andersen (yoshiyaka) wrote :

I'm running curl 7.43.0 on ubuntu 15.10. I don't think the issue is tied to the curl version though, this happens for us in our production environment when running ffmpeg to transcode videos. When ffmpeg is running it seeks a lot in the stream by opening a new socket with some offset, doing a short read, and then closing the socket. The read and close without consuming the whole object seems to be the key to the problem here.

When running curl from your mac, what output do you get? You should see a lot of lines like these:
curl: (28) Operation timed out after 102 milliseconds with 769995 out of 524288512 bytes received
These are due to curl closing the socket without consuming the whole object, triggering the bug.

@notmyname, does your precise SAIO is running with 12d8a53 ?

Andreas Andersen (yoshiyaka) wrote :

Any news on this? Can we consider this bug confirmed as it can be consistently reproduced?

It seems that I can reproduce this with current master (exactly after 12d8a53 ) in 2 environment, one is ubuntu 14.04.02 (3.13.0-48-generic kernel) on AWS and the other is 14.04.01 (3.13.0-36-generic) on my laptop virtual machine.

I didn't look at the reason in detail yet but it looks like that we have an issue for that.

My procedure is:

1. upload a 50MB slo object via swift command (i.e. swift upload ---use-slo)
2. Download the slo object via curl, and then stop the download stream via -m (timeout) option
3. cat /proc/net/sockstat shows as follows:

sockets: used 314
TCP: inuse 20 orphan 0 tw 0 alloc 123 mem 3919 (<- I found the "mem" increased by the curl requests)
UDP: inuse 3 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

4. It looks also increase proxy-server memory usage (confilemed by ps and top command)

It looks that larger number of timeout increased the used memory rapidly. i.e. "-m 0.1" made proxy-server to spend more memory than "-m 0.01" (both of them killed the connections before the task finished)

However, I couldn't find CLOSE_WAIT/FIN_WAIT in netstat. (it might be related that I turned on tcp_recycle in sysctl)

Right now I'm not sure but it might be depends on kernel version because I tested in closer version with reported environment so I'll try to make sure another environment if we can reproduce this anytime.

Changed in swift:
status: New → Confirmed
Samuel Merritt (torgomatic) wrote :

This is weird. I was able to reproduce the problem; the proxy server accumulates a bunch of open filehandles for dead sockets.

I can see with strace that the sockets in question are used for client <--> proxy communication. We're not leaking connections to the storage backends. Also, it looks like someone is trying to clean them up, but is just bad at it. Check this out:

    # we have a GET request from the client; the socket is fd 220
    accept(4, {sa_family=AF_INET, sin_port=htons(38280), sin_addr=inet_addr("127.0.0.1")}, [16]) = 220
    fcntl(220, F_GETFL) = 0x2 (flags O_RDWR)
    fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    fcntl(220, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
    fcntl(220, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    sendto(3, "<139>proxy-server: STDERR: (2834"..., 65, 0, NULL, 0) = 65
    accept(4, 0x7fffa0e778c0, [16]) = -1 EAGAIN (Resource temporarily unavailable)
    recvfrom(220, "GET /v1/AUTH_test/test/largefile"..., 65536, 0, NULL, NULL) = 160

    # ... removed stuff talking to memcached + storage backends

    # then we try to send something and learn that it's shut down, so SIGPIPE
    sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
    --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---
    # stupidly, we try to send it again ang get another SIGPIPE
    sendto(220, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 0, NULL, 0) = -1 EPIPE (Broken pipe)
    --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=28347, si_uid=1000} ---

    # attempted cleanup...
    shutdown(220, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected)

    # and now we completely forget about fd 220 and it's never mentioned again. there's the leak.
    poll([{fd=4, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 13) = 1 ([{fd=4, revents=POLLIN}])
    # ...thousands more lines not mentioning fd 220

What's weird here is that eventlet is doing this at least somewhat intentionally. Here's an annotated eventlet.wsgi.HttpProtocol.finish():

    def finish(self):
        try:
            # this tries to flush any buffers; this is probably what
            # causes the second sendto()/SIGPIPE pair, but I have not
            # verified that.
            BaseHTTPServer.BaseHTTPRequestHandler.finish(self)
        except socket.error as e:
            # Broken pipe, connection reset by peer
            if support.get_errno(e) not in BROKEN_SOCK:
                raise
        # This is responsible for the shutdown call. It executes every time;
        # the try/except above doesn't let any exceptions out that would
        # exit this method early.
        greenio.shutdown_safe(self.connection)

        # Here's the fun part: this method call gets executed, but it
        # doesn't make any syscalls. Something is causing this to be a no-op,
        # but I don't know what it is.
        self.connection.close()

Looks like it might be a bug in eventlet, maybe? I'm not convinced this bug is fixable from within Swift.

Samuel Merritt (torgomatic) wrote :

Also, there are two separate issues here. The first is fixed by commit 12d8a53 and concerns a leak of connections from proxy to storage server. This is the one being discussed for backporting. The second is a file descriptor leak for client-to-proxy connections. It is reproducible on master (418f9b21) and may be in Swift, eventlet, somewhere else, or the interaction between any two or more of those.

Samuel Merritt (torgomatic) wrote :

never mind... it's only reproducible with SLOs and DLOs. Not even regular Swift objects. Something in our SegmentedIterable class is probably the culprit.

feh.

Samuel Merritt (torgomatic) wrote :

git bisect claims it's commit 5ca49ca92485b6ba868544f12fa524d9d7b666c6 that introduced the bug. conveniently, this commit changes a bunch of stuff in SegmentedIterable, validating the guess in comment #25.

Samuel Merritt (torgomatic) wrote :

Experimentally, this fixes the bug. Having a heck of a time writing a unit test, though. If anyone wants to jump in there, please feel free.

Awesome Sam!

SegmentItrable never closes its app_iter handle and it causes the handler leak. It makes me sense and I confirmed the leakpatch-1.diff fixed the issue in my environment.

Plus, I tried to improve the fix to be intuitive a bit more (attached). In my patch, SegmentIterable.close will attempt to close self.app_iter at first and it will emit GeneratorExit and then, it will close also backend request iter at current_resp.

Anyways, nice work and thanks! :-)

Andreas Andersen (yoshiyaka) wrote :

@tsuyuzaki-kota we've deployed your patch to some of our servers now. Everything seems to be fixed when running some simple tests that used to trigger the bug! I'll let it run for a day or two and see if everything is stable.

Thanks for the good work everybody! :D

Changed in ossa:
status: Incomplete → Confirmed
Andreas Andersen (yoshiyaka) wrote :

Ok, so everything is still fine on our servers. It seems like the patch fixed the bug completely :)

Samuel Merritt (torgomatic) wrote :

Here is a slight modification of Kota's patch, plus a unit test.

The modification was only to change "except GeneratorExit:" to "finally:" so that GeneratorExit propagates to any generators enclosing SegmentedIterable. I don't know of any right now, but future middlewares could easily introduce them.

Now that we have a unit test, I believe this patch to be ready for review.

It seems like both leaks (proxy to server and client to proxy) have been reported on former bug 1466549. I've subscribed the original reporter.

Örjan, I can't find your affiliation, if you want to credit your current employer, please let me know.

Here is the impact description to cover both bugs (which will likely get two different CVE):

Title: Swift proxy-server DoS through Large Object
Reporter: Romain LE DISEZ (OVH), Örjan Persson
Products: Swift
Affects: client-proxy: < 2.4.0 (Liberty)
      proxy-server: =< 2.5.0 (Liberty included)

Description:
Romain LE DISEZ from OVH and Örjan Persson independently reported two vulnerabilities in Swift Large Object. By repeatedly requesting and interrupting connections to a Large Object (Dynamic or Static) URL, a remote attacker may exhausts Swift proxy-server resources, potentially resulting in a denial of service. Note that there are two distinct bugs that can exhaust proxy resources, one for client connection (client to proxy), one for servers connection (proxy to server). All Swift setup are affected.

Örjan Persson (o-42mm) wrote :

Great! Happy to hear that it has been resolved. Kiliaro is my employer.

Thanks Örjan, next description will mention Kiliaro. I just need to figure out what swift release are going to be fixed...
The tip of stable/kilo (463fbe39fa1142d2bb9f7c738c6d88d9f77857f1) doesn't seems to be in any tags
There is no stable/liberty branch.

According to http://docs.openstack.org/releases/, it looks like the fix should land in 2.3.1 (for kilo) and 2.5.1 (for liberty).

@swift-coresec, is this following affect line correct ?

Affects: client to proxy: >=2.2.1, <= 2.3.0
                proxy to server: >=2.2.1, <= 2.3.0, >= 2.4.0, <= 2.5.0

Changed in swift:
importance: Undecided → Critical
John Dickinson (notmyname) wrote :

Sam, I think the patch is ok, but the test seems to only be testing the implementation of FakeSwift, not the actual change in SegmentedIterable. When only applying the changes from this patch in the test/ directory, unittests still pass.

John Dickinson (notmyname) wrote :

Sam what's the status of tests?

For everyone, it would be good to review the patch in #31 to see how the patch itself looks

Samuel Merritt (torgomatic) wrote :

Here's a patch with a commit message explaining what's going on. Since it's a fix for a memory leak, the test is fairly crappy, but it's something.

Samuel Merritt (torgomatic) wrote :

Let's try this again, only without the leftover print statements.

@swift-coresec, please review proposed patch in comment #38. Thanks!

John Dickinson (notmyname) wrote :

+2 from me

great work, Sam and Kota

Matthew Oliver (matt-0) wrote :

+2 from me too, code looks good and test is working.

The patch seems pretty trivial, would it be possible to get backports for stable/kilo and liberty release too ?

Samuel Merritt (torgomatic) wrote :

kilo

Samuel Merritt (torgomatic) wrote :

liberty

Thanks Samuel.

So iirc, the last three patches only fixe the proxy to server connection leak. For client to proxy leak (bug 1466549) the fix only got merged for stable/liberty. The missing stable/kilo backport is proposed here: https://review.openstack.org/#/c/217750/

@swift-coresec, any chance to get https://review.openstack.org/#/c/217750/ merged before sending the advance notification ?

Here is the final impact description for both bug 1493303 and bug 1466549,
If it's accurate, I'd like to send the advance notification asap with a disclosure date set to:
2016-01-20, 1500UTC

Title: Swift proxy-server DoS through Large Object
Reporter: Romain LE DISEZ (OVH), Örjan Persson (Kiliaro)
Products: Swift
Affects: client to proxy: >=2.2.1 <= 2.3.0
         proxy to server: >=2.2.1 <= 2.3.0, >= 2.4.0 <= 2.5.0

Description:
Romain LE DISEZ from OVH and Örjan Persson from Kiliaro independently
reported two vulnerabilities in Swift Large Object. By repeatedly
requesting and interrupting connections to a Large Object (Dynamic or
Static) URL, a remote attacker may exhausts Swift proxy-server
resources, potentially resulting in a denial of service. Note that there
are two distinct bugs that can exhaust proxy resources, one for client
connection (client to proxy), one for servers connection (proxy to
server). All Swift setup are affected.

Changed in ossa:
status: Confirmed → In Progress
summary: - Swift proxy memory leak on unfinished read
+ Swift proxy memory leak on unfinished read (CVE-2016-0738)

+2 for patch at #38 applied to master. Verified that this fixes socket leak with early-terminated dlo and slo downloads (on SAIO on Ubuntu 14.04.2)

There's no test for DLO. Not a blocker and possibly redundant since ultimately it is the close call to SegmentedIterable that is being tested.

+2 for the liberty backport at #44

I could not apply the kilo backport at #43:

anc@u128:~/0dev/swift ((036c2f3...))$ git checkout origin/stable/kilo
HEAD is now at 036c2f3... Get better at closing WSGI iterables.
anc@u128:~/0dev/swift ((036c2f3...))$ git am ../patches/socket-leak-kilo-backport.diff
Applying: Fix memory/socket leak in proxy on truncated SLO/DLO GET
error: patch failed: swift/common/request_helpers.py:431
error: swift/common/request_helpers.py: patch does not apply
Patch failed at 0001 Fix memory/socket leak in proxy on truncated SLO/DLO GET
The copy of the patch that failed is found in:
   /home/anc/0dev/swift/.git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Oups, their is now a conflict for stable/kilo backport (proposed in comment #43).

This patch do: in swift/common/request_helpers.py
+ def close(self):
+ """
+ Called when the client disconnect. Ensure that the connection to the
+ backend server is closed.
+ """
+ close_if_possible(self.app_iter)

While current stable/kilo has:
    def close(self):
        """
        Called when the client disconnect. Ensure that the connection to the
        backend server is closed.
        """
        if self.current_resp:
            close_if_possible(self.current_resp.app_iter)

Samuel, can you please fix that backport?

I think I fixed the conflict with stable/kilo (at 036c2f3) in the attached patch

Sam had better take a look at it to be sure. 036c2f3 added stuff that seemed to allow this patch to be closer to the others i.e. self.current_resp

Samuel Merritt (torgomatic) wrote :

The patch in #51 looks good to me.

The patch in comment #51 works for me. To sum-up, the upcoming advisory needs those patch to be applied:

 - https://review.openstack.org/#/c/217750/ (the stable/kilo backport for bug 1466549)
 - patch in comment #51 (the stable/kilo backport of this bug)
 - patch in comment #44 (the stable/liberty backport of this bug)
 - patch in comment #38 (the master/mitaka fix of this bug)

No need for stable/liberty or master/mitaka fix for bug 1466549 since liberty release already has this fix.

Changed in ossa:
status: In Progress → Fix Committed
description: updated
information type: Private Security → Public Security
summary: - Swift proxy memory leak on unfinished read (CVE-2016-0738)
+ [OSSA 2016-004] Swift proxy memory leak on unfinished read
+ (CVE-2016-0738)

The attachment "leakpatch-1.diff" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch

Reviewed: https://review.openstack.org/270234
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=a4c1825a026655b7ed21d779824ae7c25318fd52
Submitter: Jenkins
Branch: stable/kilo

commit a4c1825a026655b7ed21d779824ae7c25318fd52
Author: Samuel Merritt <email address hidden>
Date: Tue Dec 8 16:36:05 2015 -0800

    Fix memory/socket leak in proxy on truncated SLO/DLO GET

    When a client disconnected while consuming an SLO or DLO GET response,
    the proxy would leak a socket. This could be observed via strace as a
    socket that had shutdown() called on it, but was never closed. It
    could also be observed by counting entries in /proc/<pid>/fd, where
    <pid> is the pid of a proxy server worker process.

    This is due to a memory leak in SegmentedIterable. A SegmentedIterable
    has an 'app_iter' attribute, which is a generator. That generator
    references 'self' (the SegmentedIterable object). This creates a
    cyclic reference: the generator refers to the SegmentedIterable, and
    the SegmentedIterable refers to the generator.

    Python can normally handle cyclic garbage; reference counting won't
    reclaim it, but the garbage collector will. However, objects with
    finalizers will stop the garbage collector from collecting them* and
    the cycle of which they are part.

    For most objects, "has finalizer" is synonymous with "has a __del__
    method". However, a generator has a finalizer once it's started
    running and before it finishes: basically, while it has stack frames
    associated with it**.

    When a client disconnects mid-stream, we get a memory leak. We have
    our SegmentedIterable object (call it "si"), and its associated
    generator. si.app_iter is the generator, and the generator closes over
    si, so we have a cycle; and the generator has started but not yet
    finished, so the generator needs finalization; hence, the garbage
    collector won't ever clean it up.

    The socket leak comes in because the generator *also* refers to the
    request's WSGI environment, which contains wsgi.input, which
    ultimately refers to a _socket object from the standard
    library. Python's _socket objects only close their underlying file
    descriptor when their reference counts fall to 0***.

    This commit makes SegmentedIterable.close() call
    self.app_iter.close(), thereby unwinding its generator's stack and
    making it eligible for garbage collection.

    * in Python < 3.4, at least. See PEP 442.

    ** see PyGen_NeedsFinalizing() in Objects/genobject.c and also
       has_finalizer() in Modules/gcmodule.c in Python.

    *** see sock_dealloc() in Modules/socketmodule.c in Python. See
        sock_close() in the same file for the other half of the sad story.

    This closes CVE-2016-0738.

    Closes-Bug: 1493303

    Change-Id: I9b617bfc152dca40d1750131d1d814d85c0a88dd
    Co-Authored-By: Kota Tsuyuzaki <email address hidden>

tags: added: in-stable-kilo
tags: added: in-stable-liberty

Reviewed: https://review.openstack.org/270235
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=8c1976aa771f8c43c5dbe676bd9a5efc69f09eae
Submitter: Jenkins
Branch: stable/liberty

commit 8c1976aa771f8c43c5dbe676bd9a5efc69f09eae
Author: Samuel Merritt <email address hidden>
Date: Tue Dec 8 16:36:05 2015 -0800

    Fix memory/socket leak in proxy on truncated SLO/DLO GET

    When a client disconnected while consuming an SLO or DLO GET response,
    the proxy would leak a socket. This could be observed via strace as a
    socket that had shutdown() called on it, but was never closed. It
    could also be observed by counting entries in /proc/<pid>/fd, where
    <pid> is the pid of a proxy server worker process.

    This is due to a memory leak in SegmentedIterable. A SegmentedIterable
    has an 'app_iter' attribute, which is a generator. That generator
    references 'self' (the SegmentedIterable object). This creates a
    cyclic reference: the generator refers to the SegmentedIterable, and
    the SegmentedIterable refers to the generator.

    Python can normally handle cyclic garbage; reference counting won't
    reclaim it, but the garbage collector will. However, objects with
    finalizers will stop the garbage collector from collecting them* and
    the cycle of which they are part.

    For most objects, "has finalizer" is synonymous with "has a __del__
    method". However, a generator has a finalizer once it's started
    running and before it finishes: basically, while it has stack frames
    associated with it**.

    When a client disconnects mid-stream, we get a memory leak. We have
    our SegmentedIterable object (call it "si"), and its associated
    generator. si.app_iter is the generator, and the generator closes over
    si, so we have a cycle; and the generator has started but not yet
    finished, so the generator needs finalization; hence, the garbage
    collector won't ever clean it up.

    The socket leak comes in because the generator *also* refers to the
    request's WSGI environment, which contains wsgi.input, which
    ultimately refers to a _socket object from the standard
    library. Python's _socket objects only close their underlying file
    descriptor when their reference counts fall to 0***.

    This commit makes SegmentedIterable.close() call
    self.app_iter.close(), thereby unwinding its generator's stack and
    making it eligible for garbage collection.

    * in Python < 3.4, at least. See PEP 442.

    ** see PyGen_NeedsFinalizing() in Objects/genobject.c and also
       has_finalizer() in Modules/gcmodule.c in Python.

    *** see sock_dealloc() in Modules/socketmodule.c in Python. See
        sock_close() in the same file for the other half of the sad story.

    This closes CVE-2016-0738.

    Closes-Bug: 1493303

    Change-Id: I74ea49eaa7d5c372cdc2399148d5495d3007dbd0
    Co-Authored-By: Kota Tsuyuzaki <email address hidden>

Changed in swift:
status: Confirmed → Fix Released

Reviewed: https://review.openstack.org/270233
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=58359269b0e971e52f0eb7f97221566ca2148014
Submitter: Jenkins
Branch: master

commit 58359269b0e971e52f0eb7f97221566ca2148014
Author: Samuel Merritt <email address hidden>
Date: Tue Dec 8 16:36:05 2015 -0800

    Fix memory/socket leak in proxy on truncated SLO/DLO GET

    When a client disconnected while consuming an SLO or DLO GET response,
    the proxy would leak a socket. This could be observed via strace as a
    socket that had shutdown() called on it, but was never closed. It
    could also be observed by counting entries in /proc/<pid>/fd, where
    <pid> is the pid of a proxy server worker process.

    This is due to a memory leak in SegmentedIterable. A SegmentedIterable
    has an 'app_iter' attribute, which is a generator. That generator
    references 'self' (the SegmentedIterable object). This creates a
    cyclic reference: the generator refers to the SegmentedIterable, and
    the SegmentedIterable refers to the generator.

    Python can normally handle cyclic garbage; reference counting won't
    reclaim it, but the garbage collector will. However, objects with
    finalizers will stop the garbage collector from collecting them* and
    the cycle of which they are part.

    For most objects, "has finalizer" is synonymous with "has a __del__
    method". However, a generator has a finalizer once it's started
    running and before it finishes: basically, while it has stack frames
    associated with it**.

    When a client disconnects mid-stream, we get a memory leak. We have
    our SegmentedIterable object (call it "si"), and its associated
    generator. si.app_iter is the generator, and the generator closes over
    si, so we have a cycle; and the generator has started but not yet
    finished, so the generator needs finalization; hence, the garbage
    collector won't ever clean it up.

    The socket leak comes in because the generator *also* refers to the
    request's WSGI environment, which contains wsgi.input, which
    ultimately refers to a _socket object from the standard
    library. Python's _socket objects only close their underlying file
    descriptor when their reference counts fall to 0***.

    This commit makes SegmentedIterable.close() call
    self.app_iter.close(), thereby unwinding its generator's stack and
    making it eligible for garbage collection.

    * in Python < 3.4, at least. See PEP 442.

    ** see PyGen_NeedsFinalizing() in Objects/genobject.c and also
       has_finalizer() in Modules/gcmodule.c in Python.

    *** see sock_dealloc() in Modules/socketmodule.c in Python. See
        sock_close() in the same file for the other half of the sad story.

    This closes CVE-2016-0738.

    Closes-Bug: 1493303

    Co-Authored-By: Kota Tsuyuzaki <email address hidden>

    Change-Id: Ib86c4c45641485ce1034212bf6f53bb84f02f612

Changed in ossa:
status: Fix Committed → Fix Released

This issue was fixed in the openstack/swift 2.6.0 release.

Download full text (30.2 KiB)

Reviewed: https://review.openstack.org/272201
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f9b7fd3074b5b0e5d6ea879d4144f7bfeec5d46b
Submitter: Jenkins
Branch: feature/crypto

commit e13a03c379273ee10e678818078b9c40a96a7dc9
Author: Tim Burke <email address hidden>
Date: Wed Jan 20 16:06:26 2016 -0800

    Stop overriding builtin range

    Change-Id: I315f8b554bb9e96659b455f4158f074961bd6498

commit 0a404def7d54d1ef1c85c11a378052260c4fda4c
Author: John Dickinson <email address hidden>
Date: Wed Jan 20 15:19:35 2016 -0800

    remove unneeded duplicate dict keys

    Change-Id: I926d7aaa9df093418aaae54fe26e8f7bc8210645

commit 221f94fdd39fd2dcd9a2e5565adceab615d55913
Author: John Dickinson <email address hidden>
Date: Tue Jan 19 14:50:24 2016 -0800

    authors and changelog updates for 2.6.0

    Change-Id: Idd0ff9e70abc0773be183c37cd6125fe852da7c0

commit 58359269b0e971e52f0eb7f97221566ca2148014
Author: Samuel Merritt <email address hidden>
Date: Tue Dec 8 16:36:05 2015 -0800

    Fix memory/socket leak in proxy on truncated SLO/DLO GET

    When a client disconnected while consuming an SLO or DLO GET response,
    the proxy would leak a socket. This could be observed via strace as a
    socket that had shutdown() called on it, but was never closed. It
    could also be observed by counting entries in /proc/<pid>/fd, where
    <pid> is the pid of a proxy server worker process.

    This is due to a memory leak in SegmentedIterable. A SegmentedIterable
    has an 'app_iter' attribute, which is a generator. That generator
    references 'self' (the SegmentedIterable object). This creates a
    cyclic reference: the generator refers to the SegmentedIterable, and
    the SegmentedIterable refers to the generator.

    Python can normally handle cyclic garbage; reference counting won't
    reclaim it, but the garbage collector will. However, objects with
    finalizers will stop the garbage collector from collecting them* and
    the cycle of which they are part.

    For most objects, "has finalizer" is synonymous with "has a __del__
    method". However, a generator has a finalizer once it's started
    running and before it finishes: basically, while it has stack frames
    associated with it**.

    When a client disconnects mid-stream, we get a memory leak. We have
    our SegmentedIterable object (call it "si"), and its associated
    generator. si.app_iter is the generator, and the generator closes over
    si, so we have a cycle; and the generator has started but not yet
    finished, so the generator needs finalization; hence, the garbage
    collector won't ever clean it up.

    The socket leak comes in because the generator *also* refers to the
    request's WSGI environment, which contains wsgi.input, which
    ultimately refers to a _socket object from the standard
    library. Python's _socket objects only close their underlying file
    descriptor when their reference counts fall to 0***.

    This commit makes SegmentedIterable.close() call
    self.app_iter.close(), thereby unwinding its generator's stack and
    making it eligible for garbage collection.

    * in Python < 3...

tags: added: in-feature-crypto

Reviewed: https://review.openstack.org/270241
Committed: https://git.openstack.org/cgit/openstack/ossa/commit/?id=606a18e718aed329a9d42d298e3119f0f1974e3d
Submitter: Jenkins
Branch: master

commit 606a18e718aed329a9d42d298e3119f0f1974e3d
Author: Tristan Cacqueray <email address hidden>
Date: Wed Jan 20 10:19:30 2016 -0500

    Adds OSSA-2016-004 (CVE-2016-0737, CVE-2016-0738)

    Related-Bug: #1466549
    Related-Bug: #1493303
    Change-Id: Id7b40ab5101ccbd889c4ffc6bd9629bb5f2b8d7f

Change abandoned by Alistair Coles (<email address hidden>) on branch: feature/crypto
Review: https://review.openstack.org/277950

Download full text (71.7 KiB)

Reviewed: https://review.openstack.org/290148
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0f7f1de233919a0b046349a3e31ae7fc8675a1c5
Submitter: Jenkins
Branch: feature/hummingbird

commit d6b4587a554b51ba733b151e0d924735b63d07e0
Author: Olga Saprycheva <email address hidden>
Date: Tue Mar 8 10:57:56 2016 -0600

    Removed redundant file for flake8 check

    Change-Id: I4322978aa20ee731391f7709bbd79dee140fc703

commit 643dbce134140530eef2ae62c42fef1107f905ed
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Mar 8 06:35:49 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I96b8ff1287bf219c5f8d56a3a4868c1063a953f9

commit 83713d37f0331c5ce9d377f4b4e8724551ae30ca
Author: Daisuke Morita <email address hidden>
Date: Mon Mar 7 18:30:47 2016 -0800

    Missing comments for storage policy parameter

    There are missing comments about storege_policy_index so appropriate
    comments are added.

    Change-Id: I3de3f0e6864e65918ca1a13cce70f19c23d295f5

commit 2cff2dec3d1c4588f5103e39679c43b3dded6dcb
Author: Olga Saprycheva <email address hidden>
Date: Fri Mar 4 15:19:39 2016 -0600

    Fixed pep8 and flake8 errors in doc/source/conf.py and updated flake8 commands in tox.ini to test it.

    Change-Id: I2add370e4cfb55d1388e3a8b41f688a7f3f2c621

commit 043fbca6d08648baa314ea2236f1ccdca8785f16
Author: Christian Schwede <email address hidden>
Date: Fri Mar 4 09:33:17 2016 +0000

    Remove Erasure Coding beta status from docs

    This removes notes stating support for Erasure coding as beta. Questions
    regarding the stability of EC are coming up regularly, and are often referring
    to the docs that state EC as still in beta.

    Besides this, a note marking statsd support as beta has been removed as well.

    Change-Id: If4fb6a5c4cb741d42953db3cee8cb17a1d774e15

commit 09c73b86e9255f28fbd4cf571a52c17d549a8f9a
Author: Pete Zaitcev <email address hidden>
Date: Thu Mar 3 10:24:28 2016 -0700

    Fix a crash in exception printout

    Says the number of arguments does not match the number of '%'.

    Change-Id: I8b5e395a07328fb9d4ac7a19f8ed2ae1637bee3b

commit fad5fabe0a22e8a86635a66523dd3d3d3b1fa705
Author: Tim Burke <email address hidden>
Date: Thu Mar 3 15:07:08 2016 +0000

    During functional tests, 404 response to a DELETE is successful

    Previously, we would only consider 204 responses successful, which would
    cause some spurious gate failures, such as

    http://logs.openstack.org/66/287666/3/check/gate-swift-dsvm-functional/c6d2673/console.html#_2016-03-03_13_41_07_846

    Change-Id: Ic8c300647924352a297a2781b50064f7657038b4

commit e91de49d6864b3794f8dc5acd9c1bf0c2f7409d1
Author: Alistair Coles <email address hidden>
Date: Mon Aug 10 10:30:10 2015 -0500

    Update container on fast-POST

    This patch makes a number of changes to enable content-type
    metadata to be updated when using the fast-POST mode of
    operation, as proposed in the associated spec ...

tags: added: in-feature-hummingbird
Pratap D (pratapd) on 2016-06-03
Changed in swift (Ubuntu):
status: Confirmed → Fix Released
Pratap D (pratapd) wrote :

Changed in swift (Ubuntu):

By mistake I changed the status to "Fix released".

Status needs to be changed to "Confirmed".
Please change the status to "Confirmed".

Sorry for the trouble.

@james-page, it seems like you are the one who can change the "swift (ubuntu)" task status. Please put it back to "confirmed"...

James Page (james-page) on 2016-06-03
Changed in swift (Ubuntu):
status: Fix Released → Confirmed
James Page (james-page) on 2016-06-10
Changed in swift (Ubuntu Yakkety):
status: Confirmed → Fix Released
Changed in swift (Ubuntu Xenial):
status: New → Fix Released
Changed in swift (Ubuntu Wily):
status: New → Triaged
Changed in swift (Ubuntu Vivid):
status: New → Won't Fix
Changed in swift (Ubuntu Trusty):
status: New → Triaged
James Page (james-page) on 2016-06-10
no longer affects: swift (Ubuntu Vivid)
James Page (james-page) on 2016-09-08
Changed in cloud-archive:
status: New → Invalid
Changed in swift (Ubuntu Wily):
status: Triaged → Won't Fix
Changed in swift (Ubuntu Trusty):
status: Triaged → Won't Fix
status: Won't Fix → New
importance: Undecided → High
To post a comment you must log in.
This report contains Public Security information  Edit
Everyone can see this security related information.

Other bug subscribers