TCP leak between proxy and object-server on client disconnection

Bug #1662159 reported by Jean Caron
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Storage policy: Erasure coding
Action: PUT

The proxy leaks sockets to the object-servers when the client disconnects during an upload.

How to reproduce (easier with a DLO/SLO upload):
$ timeout 5 swift --os-auth-token $TOKEN --os-storage-url $STORAGE_URL upload container ./slo --segment-size 100000000

$ netstat -tn | grep -P ":60\d\d\s" | grep -v TIME_WAIT | awk '{print $6}'
ESTABLISHED
ESTABLISHED
ESTABLISHED
ESTABLISHED
ESTABLISHED
ESTABLISHED

After the 'client_timeout' expires on the object-server side (object-server.conf):

$ netstat -tn | grep -P ":60\d\d\s" | grep -v TIME_WAIT | awk '{print $6}'
CLOSE_WAIT
CLOSE_WAIT
CLOSE_WAIT
CLOSE_WAIT
CLOSE_WAIT
CLOSE_WAIT

These CLOSE_WAIT sockets are only freed on a service reload.

proxy-server.log :

Feb 6 13:25:23 localhost proxy-server: Client disconnected without sending enough data (txn: txe9f04338c4714982b802e-0058986b2d)
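For reference, a minimal way to trigger the same disconnect without the swift client is to start a PUT with a large Content-Length, send only part of the body and then drop the socket. The sketch below is an assumption-laden illustration: the proxy host/port, token and object path are placeholders for your own environment.

# Hedged sketch: reproduce the abrupt client disconnect mid-PUT.
# PROXY_HOST, PROXY_PORT, TOKEN and PATH are placeholders, not real values.
import socket

PROXY_HOST = '127.0.0.1'   # assumption: SAIO proxy on localhost
PROXY_PORT = 8080          # assumption: default SAIO proxy port
TOKEN = 'AUTH_tk_replace_me'
PATH = '/v1/AUTH_test/container/leaky-object'

sock = socket.create_connection((PROXY_HOST, PROXY_PORT))
sock.sendall((
    'PUT %s HTTP/1.1\r\n'
    'Host: %s:%d\r\n'
    'X-Auth-Token: %s\r\n'
    'Content-Length: 1000000000\r\n'
    '\r\n' % (PATH, PROXY_HOST, PROXY_PORT, TOKEN)
).encode('ascii'))

# Send only a fraction of the promised body, then disconnect abruptly,
# which is what `timeout 5 swift upload` ends up doing mid-transfer.
sock.sendall(b'x' * 10000000)
sock.close()

# Then check the proxy <-> object-server connections again:
#   netstat -tn | grep -P ":60\d\d\s" | grep -v TIME_WAIT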

Potential threat: once the process exhausts its per-process limit on open sockets, no new connections can be created, resulting in a denial of service.
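One way to keep an eye on how close a node is getting to that limit is to count CLOSE_WAIT sockets touching the object-server ports straight from /proc/net/tcp. A rough sketch, assuming the object-servers use ports 6000-6099 as in the netstat filter above (IPv4 only):

# Hedged sketch: count CLOSE_WAIT sockets whose local or remote port falls
# in the object-server range, mirroring the ":60\d\d" netstat filter above.
CLOSE_WAIT = '08'  # TCP state code used in /proc/net/tcp

def count_close_wait(low=6000, high=6099):
    count = 0
    with open('/proc/net/tcp') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            local_port = int(fields[1].split(':')[1], 16)
            remote_port = int(fields[2].split(':')[1], 16)
            state = fields[3]
            if state == CLOSE_WAIT and (
                    low <= local_port <= high or low <= remote_port <= high):
                count += 1
    return count

if __name__ == '__main__':
    print('CLOSE_WAIT sockets on object-server ports: %d' % count_close_wait())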

Tags: leak tcp
Jean Caron (jean.caron)
Changed in swift:
assignee: nobody → Jean Caron (jean.caron)
assignee: Jean Caron (jean.caron) → nobody
Revision history for this message
Janie Richling (jrichli) wrote :

Hello Jean. I have tried to reproduce this against the latest code from master (I made sure to use an EC policy), and I have not been able to yet. I used the same commands you did. I see the "Client disconnected without sending enough data" message in my proxy.error log, but after the timeout there were no CLOSE_WAIT connections. How large was your ./slo file?

Revision history for this message
Jean Caron (jean.caron) wrote :

Hi Janie,

Thanks for looking at it.
The file I am using is 1008 MB.

For reference, I hit the bug on Swift v2.12.

Revision history for this message
Jean Caron (jean.caron) wrote :

It happens with curl as well, so it is not specific to the client side.
The bug also appears without SLO/DLO.
The socket becomes a zombie and stays in the CLOSE_WAIT state forever (no reuse).

Revision history for this message
Janie Richling (jrichli) wrote :

If I artificially add a sleep somewhere in the object upload path (simulating a slower upload over the network), then the connections can look as though they will never close; with enough sleep time it simply takes too long to wait for them all to complete. So I am not convinced that I have actually reproduced a situation where the connections are stuck in a CLOSE_WAIT state.

BTW - I had wondered whether this was a security issue, but looking at a similar past bug, this type of issue was not treated that way: https://bugs.launchpad.net/swift/+bug/1594739/comments/5

Do you have any non-default configuration items like client_timeout or node_timeout?

It seems that test/unit/proxy/test_server.py:TestObjectController.test_ec_client_put_disconnect is attempting to test this particular situation. I don't see how it is asserting the connections are closed at the end, however. Maybe focus on modifying that test to expose this issue.
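For what it's worth, the kind of assertion that test could grow might look roughly like the sketch below; 'captured_conns' and its 'closed' attribute are hypothetical placeholders, not Swift's actual test fixtures, so a real patch would have to hook into whatever fake backend connection objects the test already creates:

# Hypothetical sketch only: 'captured_conns' and '.closed' are placeholder
# names, not real Swift test helpers. The idea: after simulating the client
# disconnect, every backend connection the proxy opened must be closed.
def assert_backend_connections_closed(test_case, captured_conns):
    leaked = [conn for conn in captured_conns if not conn.closed]
    test_case.assertEqual(
        [], leaked,
        'proxy leaked %d backend connection(s) after client disconnect'
        % len(leaked))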

Revision history for this message
Jean Caron (jean.caron) wrote :

$ cat proxy-server.conf | grep node_timeout | head -n 1
node_timeout = 10

$ cat object-server.conf | grep client_timeout | head -n 1
client_timeout = 60

I tried on a fresh SAIO; I sometimes get a CLOSE_WAIT socket lingering after the disconnection, but it disappears (reuse?) during the next upload.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Jean Caron (<email address hidden>) on branch: master
Review: https://review.openstack.org/429697
Reason: new patch at https://review.openstack.org/#/c/437321
