chunked transfer client disconnect causes unhandled value error in object server

Bug #667956 reported by clayg
This bug affects 7 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Low
Assigned to: Unassigned

Bug Description

If a client disconnects while the object-server is expecting the chunk length line on a chunked transfer, the object-server blows up with an unhandled ValueError.

This may need to be patched upstream:

Oct 28 13:05:22 saio object-server STDOUT: File "/usr/local/lib/python2.6/dist-packages/eventlet-0.9.9-py2.6.egg/eventlet/wsgi.py", line 137, in _chunked_read
Oct 28 13:05:22 saio object-server STDOUT: self.chunk_length = int(rfile.readline().split(";", 1)[0], 16)
Oct 28 13:05:22 saio object-server STDOUT: ValueError: invalid literal for int() with base 16: ''

... or maybe we can catch it.
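
A minimal sketch of the failure, assuming only what the traceback shows. `parse_chunk_length` is a hypothetical helper that wraps the expression from `_chunked_read`; it is not eventlet's code. When the client has gone away, `readline()` hands back an empty string and the base-16 conversion fails:

```python
def parse_chunk_length(line):
    """Parse the hex length from a chunk-size line such as '1a;ext=1\\r\\n'.

    Hypothetical helper: same expression as the line in the traceback,
    with the readline() result passed in as an argument.
    """
    return int(line.split(";", 1)[0], 16)


# A well-formed chunk-size line parses as a hexadecimal integer.
print(parse_chunk_length("1a;name=value\r\n"))

# After a disconnect the line is empty, so int() raises ValueError.
try:
    parse_chunk_length("")
except ValueError as err:
    print("ValueError: %s" % err)
```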

Revision history for this message
Mike Barton (redbo) wrote :

It would be nice if it raised something more selective upstream. But we should be able to catch it by putting a try/except (ValueError, IOError) around the request.body_file.read loop.
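
A rough sketch of that suggestion, not the actual swift patch. `read_body` and `DisconnectingBody` are hypothetical names; the fake body object only imitates a `request.body_file` whose `read()` fails the way the traceback does once the client is gone:

```python
import io


class DisconnectingBody(object):
    """Hypothetical stand-in for request.body_file.

    Returns the queued chunks, then raises the same ValueError the
    traceback shows, as if the client had disconnected mid-transfer.
    """

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def read(self, size):
        if not self._chunks:
            raise ValueError("invalid literal for int() with base 16: ''")
        return self._chunks.pop(0)


def read_body(body_file, chunk_size=65536):
    """Read until EOF; return (data, completed).

    completed is False when the read loop was cut short by a
    ValueError or IOError, which is treated as a client disconnect.
    """
    received = []
    try:
        while True:
            chunk = body_file.read(chunk_size)
            if not chunk:
                break
            received.append(chunk)
    except (ValueError, IOError):
        return b"".join(received), False
    return b"".join(received), True


print(read_body(io.BytesIO(b"hello")))
print(read_body(DisconnectingBody([b"abc", b"def"])))
```

Catching `ValueError` this broadly can also hide unrelated bugs inside the loop, which is why a more selective upstream exception would be nicer.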

Thierry Carrez (ttx)
Changed in swift:
milestone: 1.2.0 → 1.2-rc
Chuck Thier (cthier)
Changed in swift:
milestone: 1.2-rc → 1.3.0
Thierry Carrez (ttx)
Changed in swift:
milestone: 1.3.0 → none
Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

Are we going to catch these? Will it be fixed upstream?

It's not a "real" problem besides filling the logs with errors, but anyway...

Changed in swift:
status: New → Confirmed
Revision history for this message
Chuck Thier (cthier) wrote :

I think this might have been fixed with my recent fix of squelching the stdout printing of tracebacks: https://github.com/openstack/swift/commit/4c6a35448332d6827af230e5f0c470e112de15a8

Clay, would you mind testing to see if this is the case, and close the bug if so?

--
Chuck

Revision history for this message
clayg (clay-gerrard) wrote :

Well... not exactly. I mean... if the "problem" is the printed traceback - then yes, it's fixed.

You can still see the ValueError raised and "handled" by the client disconnect generic exception logging. Then it still gets raised again outside of swift, in the eventlet.wsgi server's finally block in the last stanza of handle_one_response - if you have eventlet_debug=True, you can still see it hit stdout, but since it's in a spawn_n you can't catch it.

So... we *really* should address this ugly bug upstream. There's a nice IOError that gets raised if eventlet.wsgi encounters an unexpected EOF during chunked transfer, but no errors raised from read seem to be dealt with in the finally block.
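
For illustration only, one possible shape for a more selective upstream error. This is an assumption about what such a change could look like, not the code eventlet has or later shipped; `parse_chunk_length_strict` is a hypothetical name:

```python
def parse_chunk_length_strict(line):
    """Parse a chunk-size line, treating an empty line as a disconnect.

    Hypothetical sketch: an empty readline() result means the peer
    closed the connection, so raise IOError instead of letting int()
    fail with a ValueError. Malformed (non-hex) lines still raise
    ValueError.
    """
    if not line:
        raise IOError("unexpected end of file while reading chunk length")
    return int(line.split(";", 1)[0], 16)


try:
    parse_chunk_length_strict("")
except IOError as err:
    print("IOError: %s" % err)
```

With something like this, a caller could catch IOError for disconnects and leave ValueError to signal a genuinely malformed request.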

Then there's the swift problem which seems to not close down connections to the backend servers during the generic handling of client disconnect (at least not in the PUT object case).

e.g.

run this:

http://paste.openstack.org/show/27931/

then run:

    sudo netstat -pant

.. you'll see a bunch of established connections to and from object servers (60X0).

then run:

    swift-init proxy stop

... you'll see all of the object servers abruptly abort their connections.

I guess there's a max_upload_time setting on the object server to close down some of this, but I never let it get that far. I don't see a chunk read timeout on the object server...

This is my current approach:

http://paste.openstack.org/show/27931/

But a Timeout on the reads in the object server might make more sense if we could get the timing right :\

Anyways, that is to say, if you want to close this one - I'll have to open another issue(s) :P

Revision history for this message
clayg (clay-gerrard) wrote :

bah, rather, *here's* the start of the basic "close backend connections on disconnect" patch:

http://paste.openstack.org/show/27932/

Revision history for this message
Chuck Thier (cthier) wrote : Re: [Bug 667956] Re: chunked transfer client disconnect causes unhandled value error in object server

hehe... no worries was just curious. Thanks for the deeper explanation.

On Wed, Dec 12, 2012 at 2:23 PM, clayg <email address hidden> wrote:
> bah, rather, *here's* the start of the basic "close backend connections
> on disconnect" patch:
>
> http://paste.openstack.org/show/27932/

Tong Li (litong01)
Changed in swift:
assignee: nobody → Tong Li (litong01)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/23475

Changed in swift:
status: Confirmed → In Progress
Revision history for this message
clayg (clay-gerrard) wrote :

Updated close all connections:

https://gist.github.com/clayg/9284301

Revision history for this message
Tom Fifield (fifieldt) wrote :

Patch was abandoned, setting status back to Confirmed.

Changed in swift:
status: In Progress → Confirmed
assignee: Tong Li (litong01) → nobody
Revision history for this message
clayg (clay-gerrard) wrote :

yay!

This is *finally* fixed in:

https://review.openstack.org/#/c/156825/

run this:

http://paste.openstack.org/show/27931/

then run:

    sudo netstat -pant

.. and *instead* of a bunch of *ESTABLISHED* connections to and from object servers (60X0) - they are all *TIME_WAIT*

woot woot!

Changed in swift:
status: Confirmed → Fix Committed
Changed in swift:
milestone: none → next-kilo
Thierry Carrez (ttx)
Changed in swift:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in swift:
milestone: 2.3.0-rc1 → 2.3.0