When a client disconnects, garbage collection is too heavy

Bug #1174660 reported by Eohyung Lee
This bug affects 5 people
Affects: OpenStack Object Storage (swift)
    Status: Fix Released
    Importance: Critical
    Assigned to: Samuel Merritt

Affects: OpenStack Security Advisory
    Status: Invalid
    Importance: Undecided
    Assigned to: Unassigned

Bug Description

When a client disconnects during a GET object request, the proxy server notices it and, after 60 seconds (client_timeout), closes its connections to both the client and the object server.
Before the proxy server closes the object-server connection, it drains the remaining response data from the object server ("garbage collecting") to prevent a leak on the object-server side.
(IMHO) One disconnected connection doesn't cause a problem, but many disconnected connections in a short time put heavy load on the object servers.

You can see this in the function below from swift/proxy/controllers/base.py. I have added a comment with my opinion.
    def close_swift_conn(self, src):
        try:
            src.swift_conn.close()
        except Exception:
            pass
        src.swift_conn = None
        try:
            while src.read(self.app.object_chunk_size):  # this causes heavy load on the object server
                pass
        except Exception:
            pass
        try:
            src.close()
        except Exception:
            pass

You can test this as follows.

Reproduce (heavy load on the object servers; a rough Python sketch follows the steps):

1) Make a large object (500MB) and upload it to swift.
2) Download this file with many concurrent connections (100~1000 connections).
3) After the downloads start, wait for the object servers to open their files.
4) Then disconnect all connections.
5) You can see the object servers' load increase (e.g. watch loadavg).
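
A rough Python 2-era sketch of these steps (the proxy address, path, and token below are placeholders, not values from this bug; adjust them for your cluster):

    # Hypothetical reproduction sketch: open many GETs for a large object,
    # read only the start of each response, then disconnect all at once.
    import socket
    import time

    PROXY = ('proxy.example.com', 8080)                # placeholder
    PATH = '/v1/AUTH_test/test_container/big_object'   # placeholder
    TOKEN = 'AUTH_tk_replace_me'                       # placeholder

    conns = []
    for _ in range(200):
        s = socket.create_connection(PROXY)
        s.sendall('GET %s HTTP/1.1\r\n'
                  'Host: %s\r\n'
                  'X-Auth-Token: %s\r\n\r\n' % (PATH, PROXY[0], TOKEN))
        s.recv(65536)        # read just the beginning of the response
        conns.append(s)

    time.sleep(5)            # let the object servers open their files
    for s in conns:
        s.close()            # simulate all clients disconnecting at once
    # now watch loadavg / netstat on the object servers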

I also tested without this draining step, and the object servers appeared to leak 4k per connection. But this is not clear yet; I need to do more testing.

Eohyung Lee (leoh0)
description: updated
Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I think this is the problem we're experiencing in production with swift 1.7.5.

We use the following shell line to check object-server connections vs proxy client connections:

# echo "object-server: `netstat -alnt | grep :6010 | wc -l`" ; echo "clients: `netstat -alnt | grep :443 | wc -l`";
object-server: 135
clients: 3

(this output looks OK; it is not showing the problem yet)

When the object-server load gets too high, the number of object-server connections is over 1200 while client connections are under 200.

It gets corrected by restarting the proxies.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I'm attaching a script to easily reproduce the problem. It may need some tweaking for our swift cluster.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

Re-attached the test script with the cleanup removed, to make it more effective.

Now it won't remove the file or the test container (the cleanup was making the cluster return 404 to some GET requests instead of producing the intended effect).

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I can't confirm there's a leak when the connection is just closed instead of reading data until the object server is done.

This is the patch I'm testing.
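
For context, a rough sketch of the kind of change being described (this is NOT the attached patch): the drain loop from the description above is dropped, so the proxy closes the backend connection without reading the rest of the object.

    # Sketch only, assuming the drain loop is simply removed from
    # close_swift_conn in swift/proxy/controllers/base.py.
    def close_swift_conn(self, src):
        try:
            src.swift_conn.close()
        except Exception:
            pass
        src.swift_conn = None
        try:
            src.close()
        except Exception:
            pass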

Revision history for this message
Peter Portante (peter-a-portante) wrote :

Can you propose this as a gerrit change? This seems pretty useful and would let us drop the resources on the object server.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

Yes, Peter! I just didn't have the time to review the tests and see if I can add a test for this!

I'll try to submit today the change through gerrit.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

In fact I'm testing the patch on 1.7.5; it doesn't work like that in 1.9.0 (it's _drop_cache!).

As I said, I'll submit it when all tests PASS!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/38602

Changed in swift:
assignee: nobody → Juan J. Martínez (jjmartinez)
status: New → In Progress
Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I removed the patch because it was wrong. See the review request.

Btw, the problem was introduced when bug #1037337 was fixed, see:

https://github.com/openstack/swift/commit/9290471b61a98a1882f0d9e5ce7d883428e2ff36

Revision history for this message
sirkonst (sirkonst) wrote :

What is the status of the patches and of the problem?

I think it is very important to fix this, because it is a simple way to mount a DoS attack on a swift cluster. If you make even 10 requests for a 5GB file and then break them, this can lead to 100% load on the internal network (the proxies end up draining roughly 10 x 5GB = 50GB from the object servers).

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

We're using that patch in production and we haven't experienced any problem so far.

Unfortunately the patch didn't get enough attention and was abandoned after negative reviews that didn't explain why it was rejected. I can submit it again, but I'll need help from people able to review it.

Revision history for this message
sirkonst (sirkonst) wrote :

DoS vulnerability must be fixed :-)

Revision history for this message
John Dickinson (notmyname) wrote :

I've confirmed that this affects Python 2.7 and not Python 2.6.

I would have preferred that this had been submitted as a security bug initially.

Changed in swift:
importance: Undecided → Critical
Revision history for this message
John Dickinson (notmyname) wrote :

This is not a memory leak issue. The issue is that the proxy server (under Python2.7) will read the entire contents of the object from the storage node. When those requests are concentrated on a particular proxy or object server, the CPU spikes and prevents other requests from being handled.

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

Thanks John!

We are experiencing this issue with Debian Squeeze, so I can confirm it happens with 2.6.6 too; the change that introduced the bug was supposed to fix a leak on the storage nodes, but I can't confirm that.

I don't know if my proposed fix at https://review.openstack.org/38602 is OK, should I submit it again for review?

Revision history for this message
Samuel Merritt (torgomatic) wrote :

Well, there are a couple of bad things that can happen here, and Swift needs to avoid all of them.

Bad Thing 1: when a client disconnects, the proxy can leak memory. THIS DOES NOT HAPPEN RIGHT NOW. The reason for all this connection-draining code is that there are a number of buffers between the object server and the Python code. In particular, there are buffers in C-land that, if not drained, would simply hang around forever. Obviously that *shouldn't* happen, but it did.

Bad Thing 2: when a client disconnects, the proxy reads the rest of the object for no good reason. THIS IS HAPPENING NOW. It should be fixed, and people (myself included) are looking into it.

The reason the previous patch was rejected is that, while it fixed Bad Thing 2, it removed all the workarounds for Bad Thing 1. Any patch for this issue will have to be tested very well to ensure it doesn't cause regressions. This almost certainly means automated test scripts and patches to make it easy for reviewers to both exhibit the original Bad Things *and* for reviewers to verify that the proposed patch squashes both.
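
To illustrate the kind of verification being asked for, here is a minimal sketch (assuming Linux and /proc; the PID is whatever object-server process your deployment runs) that counts a process's open file descriptors, so a test script can compare the count before and after a batch of aborted GETs and catch the socket leak from Bad Thing 1:

    # Minimal verification sketch, not part of any proposed patch.
    import os

    def count_open_fds(pid):
        """Return how many file descriptors the given process has open."""
        return len(os.listdir('/proc/%d/fd' % pid))

    # Example usage:
    #   before = count_open_fds(object_server_pid)
    #   ... run the reproduction script, wait past client_timeout ...
    #   after = count_open_fds(object_server_pid)
    #   # a steadily growing count indicates leaked sockets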

Revision history for this message
Juan J. Martínez (jjmartinez) wrote :

I totally understand.

This bug was filed in April and it was causing a big problem in our swift deployment. Since I applied the patch in July, the memory leak either is not happening in our deployment or it is not significant.

I'll be more than happy to apply a better patch but I'm afraid I can't find a better solution myself without help.

Revision history for this message
Samuel Merritt (torgomatic) wrote :

FWIW, I was wrong about the memory leak: it was actually leaking socket filehandles, not memory. (At least, not leaking memory *noticeably*.)

I think I've got a fix that I'll submit shortly and see what people think.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/48538

Changed in swift:
assignee: Juan J. Martínez (jjmartinez) → Samuel Merritt (torgomatic)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/48538
Committed: http://github.com/openstack/swift/commit/def37fb56aea7b9fe4254621e10667712052d3ac
Submitter: Jenkins
Branch: master

commit def37fb56aea7b9fe4254621e10667712052d3ac
Author: Samuel Merritt <email address hidden>
Date: Wed Sep 25 10:41:41 2013 -0700

    Stop reading from object server when client disconnects.

    If a client were in the middle of an object GET request and then
    disconnected, the proxy would wait a while (default 60s) and then time
    out the connection. As part of the teardown for this, the proxy would
    attempt to close the connection to the object server, then drain any
    associated buffers. However, this didn't work particularly well,
    resulting in the proxy reading the entire remainder of the object for
    no gain.

    Now, the proxy closes the connection hard, by calling .close() on the
    underlying socket._socket object. This is different from calling
    .close() on a socket._socketobject object, which is what you get back
    from socket.socket() and similar methods. Calling .close() on a
    socket._socketobject simply decrements a reference counter on the
    socket._socket, which has been observed in the past to result in
    socket leaks when something holds onto a reference. However, calling
    .close() on a socket._socket actually closes the socket regardless of
    who else has a reference to it.

    I had to delete a test assertion that said the object server never got
    SIGPIPE after a GET w/X-Newest. Well, you get a SIGPIPE when you write
    to a closed socket, and now the proxy is actually closing the sockets
    early, so now you *do* get a SIGPIPE.

    closes-bug: 1174660

    Note that this will cause a regression on bug 1037337; unfortunately,
    the cure is worse than the disease, so out it goes.

    Change-Id: I9c7a2e7fdb8b4232e53ea96f86b50e8d34c27221
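
As a standalone illustration of the distinction described in the commit message (CPython 2 semantics; the names below are generic, not Swift's code):

    # Illustrative sketch only: closing the socket._socketobject wrapper vs
    # closing the underlying socket._socket.
    import socket

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))
    listener.listen(1)

    wrapper = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket._socketobject
    wrapper.connect(listener.getsockname())
    fileobj = wrapper.makefile('rb')  # keeps its own reference to the real socket

    wrapper.close()        # only drops the wrapper's reference; the OS-level
                           # socket stays open because fileobj still holds one

    fileobj._sock.close()  # closing the underlying socket._socket closes the
                           # file descriptor regardless of remaining references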

Changed in swift:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in swift:
status: Fix Committed → Fix Released
Revision history for this message
Lin Yun Fan (lin-yunfan) wrote :

Hi
 I know this may not be the right place for this question, but it is quite similar to this problem.

I am using swift 1.7.4. I have a program that generates thumbnails from swift objects (video); the program uses an HTTP Range (n-) header to get just the part needed for the thumbnail.
I found that it caused swift to keep a lot of established connections. After some debugging, I found the problem was caused by the client closing the connection before all the content is downloaded.
If I download part of an object and then close the connection, swift won't release the connection immediately, and if you do that many times swift can become too busy to handle other requests.

Thierry Carrez (ttx)
Changed in swift:
milestone: 1.10.0-rc1 → 1.10.0
no longer affects: swift/havana
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/54029

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/54029
Committed: http://github.com/openstack/swift/commit/94d3671b0bbf87fdbff845643963f3f9a97c58b5
Submitter: Jenkins
Branch: feature/ec

commit abcecd26a7b5871f75f0fbddf0d08bbac95bb089
Author: Kun Huang <email address hidden>
Date: Wed Oct 23 21:19:01 2013 +0800

    utf8 encode tempurl key

    In tempurl middleware, hmac uses the value of account metadata to
    generate HMAC-SHA1 signature and hmac must accept a str-type string, not
    a unicode string. The meta dict returned from get_info stroges special
    chars as unicode however. So just encode it for tempurl using.

    Closes-Bug: #1242644
    Change-Id: I4be62eea014a573efc4748470de57dccf00e431d

commit cd2e7df0b69bbd269cd3c4170e0fee8186a07c95
Author: Pete Zaitcev <email address hidden>
Date: Tue Oct 22 17:18:04 2013 -0600

    Add an __str__ method to brokers

    A few uses of broker.db_file are in printouts where we do need
    them, so the administrator may know what's up. Seems like an easy
    way to get rid of those is to make brokers identify themselves
    with common __str__. Alternative back-end implementations may
    supply something other than a filename here, for example a cluster
    name and a volume name.

    Note that I'm not sure if correct coercion would occur when
    brokers are bounced through dictionaries, hence explicit str().

    Change-Id: I329788ebd1fbe39ffadcf9f9d5194a74a88dde58

commit 9807a358c6d1314d25e3a41da75be5851fa0ac27
Author: Donagh McCabe <email address hidden>
Date: Fri Aug 23 15:03:08 2013 +0100

    Add WWW-Authenticate to 401 responses

    Per http://www.ietf.org/rfc/rfc2616.txt, when a 401 error is returned, the
    Www-Authenticate response header MUST also be returned. The format is
    described in http://www.ietf.org/rfc/rfc2617.txt.

    Swift supports and/or implements a number of authentication schemes
    including tempauth, Keystone, tempurl, formpost and container sync. In
    this fix, we use a catch-all, "Swift". The realm is the account (where
    known) or "unknown" (bad path or where the 401 is returned from code
    that does not have the request). Examples:

         Www-Authenticate: Swift realm="AUTH_1234567889"
         Www-Authenticate: Swift realm="unknown"

    Fixes bug #1215491

    Change-Id: I03362789318dfa156d3733ef9348795062a9cfc4

commit ed5101b2002b877518466ae5f9a6d652581238f2
Author: Yuan Zhou <email address hidden>
Date: Sat Oct 19 11:40:35 2013 +0800

    Adding more unit tests for audit_location_generator

    Change-Id: I40410fbbb79cea8647074f703e4675364c69d930

commit 5202b0e58613738cc81ec63e7c6da14ce5429526
Author: Peter Portante <email address hidden>
Date: Thu Sep 12 19:51:18 2013 -0400

    DiskFile API, with reference implementation

    Refactor on-disk knowledge out of the object server by pushing the
    async update pickle creation to the new DiskFileManager class (name is
    not the best, so suggestions welcome), along with the REPLICATOR
    method logic. We also move the mount checking and thread pool storage
    to the new ondisk.Devices object, which then also becomes th...

Revision history for this message
Jeremy Stanley (fungi) wrote :

In response to John's comment #14, there was apparently a separate security bug filed shortly before this one which seems to probably be a duplicate. I've opened and marked it as such, but do you think we need to consider retroactively issuing a security advisory about the issue?

Changed in ossa:
status: New → Incomplete
Revision history for this message
Thierry Carrez (ttx) wrote :

Given the versions affected and how much time has passed since this made it in, I'd consider this a performance improvement (making DoS attacks less efficient) and not issue a retroactive advisory about it...

Revision history for this message
Jeremy Stanley (fungi) wrote :

Agreed, according to bug 1166198 the introduction and resolution timeline suggests this only affected folsom and grizzly.

Changed in ossa:
status: Incomplete → Invalid