Slowdown on PUT with write-affinity on and zero-weight zone

Bug #1534303 reported by Samuel Merritt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
Medium
Samuel Merritt

Bug Description

If you have a region that has only zero-weight devices in it and you try to use write affinity, you see a long object PUT latency.

This is due to Ring.get_more_nodes() spending a lot of time looking for a handoff in that empty region, but since all the devices have zero weight, they have zero partitions assigned, so there are no handoffs in that region.

CVE References

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/267273
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3c0cf549f1e822cce8f905b069b317e676cf306b
Submitter: Jenkins
Branch: master

commit 3c0cf549f1e822cce8f905b069b317e676cf306b
Author: Samuel Merritt <email address hidden>
Date: Wed Jan 13 18:08:45 2016 -0800

    Speed up get_more_nodes() when there is an empty zone

    The ring has some optimizations in get_more_nodes() so that it can
    find handoffs that span all the regions/zones/et cetera and then stop
    looking. The stopping is the important part.

    Previously, it would quickly find a handoff in each unused region,
    then spend way too long looking for more unused regions; the same was
    true for zones, IPs, and so on. Thus, in commit 9cd7c6c, we started
    counting regions and zones, then stopping when we found them all.

    This count included all regions and zones in the ring, regardless of
    whether or not there were actually any parts assigned or not. In rings
    with an empty region, i.e. a region for which there are only
    zero-weight devices, get_more_nodes() would be very slow.

    This commit ignores devices with no assigned partitions when counting
    regions, zones, and so forth, thus greatly speeding things up.

    The output of get_more_nodes() is unchanged. This is purely an
    optimization.

    Closes-Bug: 1534303

    Change-Id: I4a5c57205e87e1205d40fd5d9458d4114e524332

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/swift 2.6.0

This issue was fixed in the openstack/swift 2.6.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/272201

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/crypto)
Download full text (30.2 KiB)

Reviewed: https://review.openstack.org/272201
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=f9b7fd3074b5b0e5d6ea879d4144f7bfeec5d46b
Submitter: Jenkins
Branch: feature/crypto

commit e13a03c379273ee10e678818078b9c40a96a7dc9
Author: Tim Burke <email address hidden>
Date: Wed Jan 20 16:06:26 2016 -0800

    Stop overriding builtin range

    Change-Id: I315f8b554bb9e96659b455f4158f074961bd6498

commit 0a404def7d54d1ef1c85c11a378052260c4fda4c
Author: John Dickinson <email address hidden>
Date: Wed Jan 20 15:19:35 2016 -0800

    remove unneeded duplicate dict keys

    Change-Id: I926d7aaa9df093418aaae54fe26e8f7bc8210645

commit 221f94fdd39fd2dcd9a2e5565adceab615d55913
Author: John Dickinson <email address hidden>
Date: Tue Jan 19 14:50:24 2016 -0800

    authors and changelog updates for 2.6.0

    Change-Id: Idd0ff9e70abc0773be183c37cd6125fe852da7c0

commit 58359269b0e971e52f0eb7f97221566ca2148014
Author: Samuel Merritt <email address hidden>
Date: Tue Dec 8 16:36:05 2015 -0800

    Fix memory/socket leak in proxy on truncated SLO/DLO GET

    When a client disconnected while consuming an SLO or DLO GET response,
    the proxy would leak a socket. This could be observed via strace as a
    socket that had shutdown() called on it, but was never closed. It
    could also be observed by counting entries in /proc/<pid>/fd, where
    <pid> is the pid of a proxy server worker process.

    This is due to a memory leak in SegmentedIterable. A SegmentedIterable
    has an 'app_iter' attribute, which is a generator. That generator
    references 'self' (the SegmentedIterable object). This creates a
    cyclic reference: the generator refers to the SegmentedIterable, and
    the SegmentedIterable refers to the generator.

    Python can normally handle cyclic garbage; reference counting won't
    reclaim it, but the garbage collector will. However, objects with
    finalizers will stop the garbage collector from collecting them* and
    the cycle of which they are part.

    For most objects, "has finalizer" is synonymous with "has a __del__
    method". However, a generator has a finalizer once it's started
    running and before it finishes: basically, while it has stack frames
    associated with it**.

    When a client disconnects mid-stream, we get a memory leak. We have
    our SegmentedIterable object (call it "si"), and its associated
    generator. si.app_iter is the generator, and the generator closes over
    si, so we have a cycle; and the generator has started but not yet
    finished, so the generator needs finalization; hence, the garbage
    collector won't ever clean it up.

    The socket leak comes in because the generator *also* refers to the
    request's WSGI environment, which contains wsgi.input, which
    ultimately refers to a _socket object from the standard
    library. Python's _socket objects only close their underlying file
    descriptor when their reference counts fall to 0***.

    This commit makes SegmentedIterable.close() call
    self.app_iter.close(), thereby unwinding its generator's stack and
    making it eligible for garbage collection.

    * in Python < 3...

tags: added: in-feature-crypto
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/crypto)

Fix proposed to branch: feature/crypto
Review: https://review.openstack.org/277950

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (feature/crypto)

Change abandoned by Alistair Coles (<email address hidden>) on branch: feature/crypto
Review: https://review.openstack.org/277950

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/290148

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)
Download full text (71.7 KiB)

Reviewed: https://review.openstack.org/290148
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0f7f1de233919a0b046349a3e31ae7fc8675a1c5
Submitter: Jenkins
Branch: feature/hummingbird

commit d6b4587a554b51ba733b151e0d924735b63d07e0
Author: Olga Saprycheva <email address hidden>
Date: Tue Mar 8 10:57:56 2016 -0600

    Removed redundant file for flake8 check

    Change-Id: I4322978aa20ee731391f7709bbd79dee140fc703

commit 643dbce134140530eef2ae62c42fef1107f905ed
Author: OpenStack Proposal Bot <email address hidden>
Date: Tue Mar 8 06:35:49 2016 +0000

    Imported Translations from Zanata

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I96b8ff1287bf219c5f8d56a3a4868c1063a953f9

commit 83713d37f0331c5ce9d377f4b4e8724551ae30ca
Author: Daisuke Morita <email address hidden>
Date: Mon Mar 7 18:30:47 2016 -0800

    Missing comments for storage policy parameter

    There are missing comments about storege_policy_index so appropriate
    comments are added.

    Change-Id: I3de3f0e6864e65918ca1a13cce70f19c23d295f5

commit 2cff2dec3d1c4588f5103e39679c43b3dded6dcb
Author: Olga Saprycheva <email address hidden>
Date: Fri Mar 4 15:19:39 2016 -0600

    Fixed pep8 and flake8 errors in doc/source/conf.py and updated flake8 commands in tox.ini to test it.

    Change-Id: I2add370e4cfb55d1388e3a8b41f688a7f3f2c621

commit 043fbca6d08648baa314ea2236f1ccdca8785f16
Author: Christian Schwede <email address hidden>
Date: Fri Mar 4 09:33:17 2016 +0000

    Remove Erasure Coding beta status from docs

    This removes notes stating support for Erasure coding as beta. Questions
    regarding the stability of EC are coming up regularly, and are often referring
    to the docs that state EC as still in beta.

    Besides this, a note marking statsd support as beta has been removed as well.

    Change-Id: If4fb6a5c4cb741d42953db3cee8cb17a1d774e15

commit 09c73b86e9255f28fbd4cf571a52c17d549a8f9a
Author: Pete Zaitcev <email address hidden>
Date: Thu Mar 3 10:24:28 2016 -0700

    Fix a crash in exception printout

    Says the number of arguments does not match the number of '%'.

    Change-Id: I8b5e395a07328fb9d4ac7a19f8ed2ae1637bee3b

commit fad5fabe0a22e8a86635a66523dd3d3d3b1fa705
Author: Tim Burke <email address hidden>
Date: Thu Mar 3 15:07:08 2016 +0000

    During functional tests, 404 response to a DELETE is successful

    Previously, we would only consider 204 responses successful, which would
    cause some spurious gate failures, such as

    http://logs.openstack.org/66/287666/3/check/gate-swift-dsvm-functional/c6d2673/console.html#_2016-03-03_13_41_07_846

    Change-Id: Ic8c300647924352a297a2781b50064f7657038b4

commit e91de49d6864b3794f8dc5acd9c1bf0c2f7409d1
Author: Alistair Coles <email address hidden>
Date: Mon Aug 10 10:30:10 2015 -0500

    Update container on fast-POST

    This patch makes a number of changes to enable content-type
    metadata to be updated when using the fast-POST mode of
    operation, as proposed in the associated spec ...

tags: added: in-feature-hummingbird
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.