Proxy-server reading an object when client disconnected!

Bug #1166198 reported by Anton
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,
We have a problem with the proxy servers when a client disconnects during a read.
When the client disconnects, the proxy writes to the log:

Apr 4 13:07:17 proxy swift Client disconnected on read (txn: txeb2c1bf8c4b8434b82c964687d79ad79) (client_ip: 192.168.241.1)

After the wait interval, the proxy writes:

Apr 4 13:07:27 proxy swift Client did not read from queue within 10s (txn: txeb2c1bf8c4b8434b82c964687d79ad79) (client_ip: 192.168.241.1)

The proxy then begins downloading the full object from the storage node, loading both the network and the CPU.

When many clients use download managers and fetch files with a Range header, this creates huge traffic between the proxy and the storage nodes, and high CPU load.

We first saw this problem after updating Swift from 1.4.6 to 1.7.6. We have since updated to 1.8 and the problem persists :(
Sorry for my English.
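
The scenario described above (a ranged GET that the client abandons early) can be approximated with a few lines of Python. This is only a sketch for reproducing the client side: the function names are made up for illustration, and the proxy host, port, and object path in the usage note are placeholders, not values from this report.

```python
import socket


def build_range_request(host, path, first_byte, last_byte):
    """Return the raw bytes of an HTTP/1.1 GET carrying a Range header."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Range: bytes=%d-%d" % (first_byte, last_byte),
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def request_then_disconnect(sock, request, read_bytes=16):
    """Send the request, read only a little of the reply, then hang up."""
    sock.sendall(request)
    received = sock.recv(read_bytes)  # may return fewer bytes than asked for
    sock.close()  # disconnect while most of the body is still unread
    return received
```

Against a real proxy one would pass a socket from `socket.create_connection((proxy_host, proxy_port))` and an object path that exists in the cluster, then watch the proxy log and the proxy-to-storage traffic after the client hangs up.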

Revision history for this message
Anton (hettbox) wrote :

This is clearly seen when working with large data.
We are using Swift as a backend for nginx.

Revision history for this message
Thierry Carrez (ttx) wrote :

Could you elaborate on why this is a security issue ?

Revision history for this message
Anton (hettbox) wrote :

It makes it easy to knock a cluster out of a stable state.

Revision history for this message
Anton (hettbox) wrote :

file proxy/controllers/base.py, line 694

        finally:
            # Ensure the queue getter gets a terminator.
>>> queue.resize(2)
            queue.put(success)
            # Close-out the connection as best as possible.
            if getattr(source, 'swift_conn', None):
                self.close_swift_conn(source)

If I comment out queue.resize(2), the problem does not occur!

Revision history for this message
Thierry Carrez (ttx) wrote :

Adding swift-core for confirmation of impact.
My understanding is still that it's more of a performance issue than a security vulnerability, but I'm happy to be proven wrong.

Revision history for this message
Anton (hettbox) wrote :

Sorry, I do not understand your phrase "Adding swift-core for confirmation of impact." :(

Revision history for this message
Anton (hettbox) wrote :

After commenting out queue.resize(2),
connections to the storage nodes are not being closed.

Revision history for this message
Anton (hettbox) wrote :

file proxy/controllers/base.py, line 735

    def close_swift_conn(self, src):
        try:
            src.swift_conn.close()
        except Exception:
            pass
        src.swift_conn = None
        try:
            while src.read(self.app.object_chunk_size): #WTF?
                pass
        except Exception:
            pass
        try:
            src.close()
        except Exception:
            pass

Revision history for this message
Anton (hettbox) wrote :

Why is all of the source data read before the connection is closed?

Revision history for this message
Thierry Carrez (ttx) wrote :

@Anton: I just subscribed the Swift core developers so that they can participate in the impact discussion.

Revision history for this message
Anton (hettbox) wrote :

Ok, thanks.
I commented out this block:

        try:
            while src.read(self.app.object_chunk_size): #WTF?
                pass
        except Exception:
            pass

Now everything works well.

Revision history for this message
gholt (gholt) wrote :

I am unsure why that would solve your issue, but the code you pasted in #8 shows the HTTPConnection being closed, the HTTPResponse being read from (which should empty any buffers then raise an Exception since the HTTPConnection is closed), then the HTTPResponse being closed. It is a bit hard to follow after a few refactors of the code I guess.

It was originally written that way because Python socket objects were not getting garbage collected. It was pretty easy to see the effects after running a day or so and then checking /proc/net/sockstat

However, maybe due to the refactors the gc would work even without these extra shenanigans now, not sure.

But it seems really strange that reading from a closed connection somehow causes you to keep downloading the whole object from the storage node. Hopefully one of us will get some time to retest around this area.
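
To make the two possible outcomes concrete, here is a small simulation of the drain loop. It is only a sketch under a stated assumption, not a confirmed root cause: it assumes that closing the connection does not always reach the underlying socket. `FakeBackendResponse` and `close_with_drain` are hypothetical names invented for this illustration; nothing here is Swift code.

```python
class FakeBackendResponse(object):
    """Hypothetical stand-in for the storage node's HTTP response."""

    def __init__(self, remaining_bytes, close_is_effective):
        self.remaining = remaining_bytes
        self.close_is_effective = close_is_effective
        self.closed = False
        self.bytes_read = 0

    def close_connection(self):
        # Models src.swift_conn.close(); assumed to be a no-op in some cases.
        if self.close_is_effective:
            self.closed = True

    def read(self, size):
        if self.closed:
            raise IOError("read on closed connection")
        chunk = min(size, self.remaining)
        self.remaining -= chunk
        self.bytes_read += chunk
        return b"x" * chunk


def close_with_drain(src, chunk_size):
    """Same shape as the read loop in the close_swift_conn() paste."""
    src.close_connection()
    try:
        while src.read(chunk_size):
            pass
    except Exception:
        pass
    return src.bytes_read
```

If the close takes effect, the first read raises and nothing is transferred. If it does not, the loop only stops at end of data, so the whole remainder of the object is pulled from the storage node, which would match the traffic described in the report.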

Revision history for this message
gholt (gholt) wrote :

Can you post what operating system, Python version, and Eventlet version you're using where this problem happens?

Revision history for this message
Anton (hettbox) wrote :

OS: Ubuntu 12.04 LTS (also tried on Ubuntu Raring)
Swift: 1.7.6, 1.8.0
Python 2.7.3
python-eventlet: 0.9.16-1ubuntu4.2

Revision history for this message
gholt (gholt) wrote :

Well, one guy here did get this reproduced on Ubuntu Precise, Python 2.7, and Eventlet 0.12.1 -- though I ran the same and couldn't do it. Either way, if that is working for you it should be safe to run that way; just be sure to keep an eye on your /proc/net/sockstat to see if sockets are "leaking". This is a pretty confined area of the code, so if it's just a problem with our systems we can remove that crazy code and patch it easily enough ourselves, or put in a config switch to turn it on or off in case someone else runs into the same problem we had.

Revision history for this message
Anton (hettbox) wrote :

After ~12 hours (upstream ~300mbit, ~600 connections):

# cat /proc/net/sockstat
sockets: used 3100
TCP: inuse 2905 orphan 12 tw 104 alloc 2916 mem 391367
UDP: inuse 12 mem 4
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

I think this is normal?

Revision history for this message
Anton (hettbox) wrote :

Maybe we are using an old eventlet?

Revision history for this message
Anton (hettbox) wrote :

Hm, one test cluster runs Ubuntu Raring, which uses eventlet 0.12.1.

Revision history for this message
gholt (gholt) wrote :

Thanks Anton. Sorry for the trouble this has caused you.

Just keep an eye on that /proc/net/sockstat. What counts as "normal" depends on the cluster usage, unfortunately, though that looks fine to me. Really you just want to make sure it doesn't keep on getting larger and larger over the next few days.

We'll see what the other devs come up with, but probably we'll just make this area of the code configurable for the next version and turned off by default. Better to have a slow leak on one company's system than a fast resource hog on everyone else's.
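
Watching for steady growth can be scripted. The following is a minimal sketch that parses the text format shown in the sockstat paste above and checks whether a counter rises across successive samples. The function names and the choice of the TCP "alloc" counter as the default are assumptions made for illustration.

```python
def parse_sockstat(text):
    """Parse /proc/net/sockstat text into {section: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        # Counters come as "key value" pairs, e.g. "inuse 2905 orphan 12".
        stats[name.strip()] = {
            fields[i]: int(fields[i + 1])
            for i in range(0, len(fields) - 1, 2)
        }
    return stats


def is_growing(samples, section="TCP", counter="alloc"):
    """True if the counter is strictly larger in every successive sample."""
    values = [sample[section][counter] for sample in samples]
    return len(values) > 1 and all(
        later > earlier for earlier, later in zip(values, values[1:])
    )
```

On a Linux host the input would come from reading /proc/net/sockstat at intervals (for example hourly) and keeping the parsed samples in a list. A strictly rising series over several days is the pattern to worry about; normal fluctuation with load is not.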

Revision history for this message
Chuck Thier (cthier) wrote : Re: [Bug 1166198] Re: Proxy-server reading an object when client disconnected!

No, I've tried several versions of eventlet and am able to reproduce the
issue. We are working to see if we can either nail down the root cause,
or at least find a better way of handling the clean up.

On Wed, Apr 10, 2013 at 11:06 AM, Anton <email address hidden> wrote:

> Maybe we using old eventlet?

Revision history for this message
Anton (hettbox) wrote :

Understood, thanks!

Revision history for this message
Thierry Carrez (ttx) wrote :

Following notmyname's advice, waiting on a bit more research on this before deciding if it should be considered an exploitable vulnerability in Swift or not.

Revision history for this message
John Dickinson (notmyname) wrote :

It sounds like enough of a concern to keep it private. I'd like to see the results of the research Chuck and Greg are doing, and hopefully a patch before making this public.

Revision history for this message
Thierry Carrez (ttx) wrote :

Not clear yet whether this should trigger an advisory

Changed in ossa:
status: New → Incomplete
Revision history for this message
Thierry Carrez (ttx) wrote :

notmyname wrote:
> It sounds like enough of a concern to keep it private. I'd like to see the results of the research Chuck and Greg are doing,
> and hopefully a patch before making this public.

Chuck/Greg: do you have results to share ? Should we keep this as an open vulnerability ?

Revision history for this message
Anton (hettbox) wrote :

We have been running Swift for a long time without this code:

        try:
            while src.read(self.app.object_chunk_size):
                pass
        except Exception:
            pass

and have had no problems...

Revision history for this message
Anton (hettbox) wrote :

Hi, is there any progress?

Revision history for this message
Chuck Thier (cthier) wrote :

I haven't had time to look any closer, and was actually hoping someone else might provide some input. The reason that was put there in the first place was that connections were leaking in the proxy server. It appeared that if the connection was closed, but there was still data in the buffers, the connection would linger and not get garbage collected. I'll try to take a closer look at it, but would appreciate input from others (Mike? Sam?)

--
Chuck

Revision history for this message
Anton (hettbox) wrote :

We are running Swift without this code block:

        try:
            while src.read(self.app.object_chunk_size):
                pass
        except Exception:
            pass

But it seems to me that this causes problems in the TCP/IP stack.
After a long uptime (~1 month), the servers developed a problem uploading data from the proxy to the storage nodes over the local interfaces (they no longer reach their maximum speed of 1 gigabit).
The problem is solved by rebooting or by temporarily reducing net.ipv4.tcp_mem.

Our value is net.ipv4.tcp_mem = 386439 515252 772878.
When the local interfaces had the problem, we reduced the values to 38643 51525 77287 and then set them back to 386439 515252 772878. After this manipulation the interface speed became normal again.

Maybe the problem is not in Swift...

Sorry for my English =)
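
For readers comparing these numbers: on Linux the three net.ipv4.tcp_mem values (low, pressure, high) are counted in memory pages, and the "mem" field on the TCP line of /proc/net/sockstat uses the same unit. The sketch below converts pages to bytes and places a sockstat reading relative to the thresholds. The 4096-byte page size is an assumption (check `getconf PAGESIZE`), the function names are invented for illustration, and the classification is a simplification that ignores the kernel's hysteresis when leaving pressure mode.

```python
PAGE_SIZE = 4096  # assumption: typical x86 page size, verify on the host


def pages_to_bytes(pages, page_size=PAGE_SIZE):
    """Convert a page count (tcp_mem or sockstat TCP mem) to bytes."""
    return pages * page_size


def tcp_mem_state(tcp_mem, pages_in_use):
    """Place a TCP memory reading relative to the tcp_mem thresholds."""
    low, pressure, high = (int(value) for value in tcp_mem.split())
    if pages_in_use >= high:
        return "at or above high"
    if pages_in_use >= pressure:
        return "at or above pressure"
    if pages_in_use >= low:
        return "between low and pressure"
    return "below low"
```

As an illustration only, the TCP mem value of 391367 from the sockstat output earlier in this thread falls between the low and pressure thresholds quoted here. That sample was taken at a different time, so it does not by itself explain the slowdown.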

Revision history for this message
Anton (hettbox) wrote :

Hi, from time to time I see this message in the system log:

Nov 1 08:54:56 prx-01 swift Trying to send to client: #012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/proxy/controllers/base.py", line 859, in _make_app_iter#012 # to the socket, so calling its close() method does nothing, and#012Exception: Failed to read all data from the source (txn: txc5d22fde53344cb5abf74-00527366b5) (client_ip: xx.xx.165.227)

Maybe this will help...
This is with the latest version of Swift from GitHub.

Revision history for this message
Thierry Carrez (ttx) wrote :

John/Chuck: still want to keep this one private? It doesn't seem reproducible enough to make it a vulnerability, and opening it might get it more visibility and get it fixed quicker (if a fix is actually needed).

Revision history for this message
Samuel Merritt (torgomatic) wrote :

This sounds like a duplicate of bug 1174660, which was fixed with commit def37fb5 and released in Swift 1.10.0.

Revision history for this message
Thierry Carrez (ttx) wrote :

Anton: would be great if you could try out 1.10.0 and confirm that your bug no longer applies there.
Unless someone objects, I'll make that bug public in a few days.

Revision history for this message
Anton (hettbox) wrote :

Hi,
We are using the latest version from GitHub, and everything has worked normally for a long time :)
Thanks!

Jeremy Stanley (fungi)
no longer affects: ossa
information type: Private Security → Public
Revision history for this message
Vil Surkin (vill-srk) wrote :

I applied the patch from https://review.openstack.org/#/c/48538/ to my Grizzly installation, and the problem is gone. I also haven't run into any problems with this patch on Grizzly. Everything works fine.
