auditor error when missing data file

Bug #1260132 reported by clayg
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

so I have a directory that looks like this:

clayg@swift:~$ ls /srv/node1/sdb1/objects/736/ce8/b80cbd53563425737e12108c6b497ce8/
1386805231.06924.meta

The object-server just 500's:

Dec 11 16:46:21 swift object-server: ERROR __call__ error with HEAD /sdb4/736/AUTH_test/test/asdf :
Traceback (most recent call last):
  File "/mnt/workspace/swift/swift/obj/server.py", line 661, in __call__
    res = method(req)
  File "/mnt/workspace/swift/swift/common/utils.py", line 2012, in wrapped
    return func(*a, **kw)
  File "/mnt/workspace/swift/swift/common/utils.py", line 760, in _timing_stats
    resp = func(ctrl, *args, **kwargs)
  File "/mnt/workspace/swift/swift/obj/server.py", line 540, in HEAD
    metadata = disk_file.read_metadata()
  File "/mnt/workspace/swift/swift/obj/diskfile.py", line 1258, in read_metadata
    with self.open():
  File "/mnt/workspace/swift/swift/obj/diskfile.py", line 973, in open
    data_file, meta_file, ts_file = self._get_ondisk_file()
  File "/mnt/workspace/swift/swift/obj/diskfile.py", line 1088, in _get_ondisk_file
    " %s, meta_file: %s, ts_file: %s" % (data_file, meta_file, ts_file)
AssertionError: On-disk file search algorithm contract is broken: data_file: None, meta_file: /srv/node4/sdb4/objects/736/ce8/b80cbd53563425737e12108c6b497ce8/1386805246.55842.meta, ts_file: None (txn: tx5171b4f05f244420824ea-0052a9075d)
Dec 11 16:46:21 swift object-server: 127.0.0.1 - - [12/Dec/2013:00:46:21 +0000] "HEAD /sdb4/736/AUTH_test/test/asdf" 500 1046 "HEAD http://localhost:8080/v1/AUTH_test/test/asdf" "tx5171b4f05f244420824ea-0052a9075d" "proxy-server 1794" 0.0021
Dec 11 16:46:21 swift proxy-server: ERROR 500 From Object Server 127.0.0.1:6040/sdb4 (txn: tx5171b4f05f244420824ea-0052a9075d)

My auditor is also whining about AssertionError:

object-auditor: ERROR Trying to audit /srv/node1/sdb1/objects/736/ce8/b80cbd53563425737e12108c6b497ce8:
Traceback (most recent call last):
  File "/mnt/workspace/swift/swift/obj/auditor.py", line 155, in failsafe_object_audit
    self.object_audit(location)
  File "/mnt/workspace/swift/swift/obj/auditor.py", line 173, in object_audit
    with df.open():
  File "/mnt/workspace/swift/swift/obj/diskfile.py", line 973, in open
    data_file, meta_file, ts_file = self._get_ondisk_file()
  File "/mnt/workspace/swift/swift/obj/diskfile.py", line 1088, in _get_ondisk_file
    " %s, meta_file: %s, ts_file: %s" % (data_file, meta_file, ts_file)
AssertionError: On-disk file search algorithm contract is broken: data_file: None, meta_file: /srv/node1/sdb1/objects/736/ce8/b80cbd53563425737e12108c6b497ce8/1386805231.06924.meta, ts_file: None

ssync replicator too:

object-replicator: 127.0.0.1:6010/sdb1/736 Unexpected response: ":ERROR: 0 'On-disk file search algorithm contract is broken: data_file: None, meta_file: /srv/node1/sdb1/objects/736/ce8/b80cbd53563425737e12108c6b497ce8/1386805231.06924.meta, ts_file: None'"

Seems like in that condition someone could step up to the table and quarantine that bad boy, or push a data file in under it, or something. e.g. rsync replication seems to square it away, as long as it notices the hashes.pkl is out of date...
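
For reference, the contract that assertion enforces boils down to: a hash dir should hold nothing, a lone tombstone, or a .data (possibly with a newer .meta). A minimal sketch of that check, with illustrative names rather than the actual diskfile.py code:

    import os

    def classify_ondisk_files(hashdir):
        """Pick the newest .data/.meta/.ts in a hash dir (simplified)."""
        data_file = meta_file = ts_file = None
        for name in sorted(os.listdir(hashdir), reverse=True):
            path = os.path.join(hashdir, name)
            if name.endswith('.data') and data_file is None:
                data_file = path
            elif name.endswith('.ts') and ts_file is None:
                ts_file = path
            elif name.endswith('.meta') and meta_file is None:
                meta_file = path
        # Valid states: empty dir, lone tombstone, or a .data (optionally
        # with a newer .meta). A lone .meta -- the state shown above --
        # matches none of them, hence the AssertionError.
        assert (not data_file and not meta_file and not ts_file) or \
               (ts_file and not data_file and not meta_file) or \
               (data_file and not ts_file), \
            'On-disk file search algorithm contract is broken'
        return data_file, meta_file, ts_file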

Revision history for this message
Samuel Merritt (torgomatic) wrote :

Yeah, quarantine is probably the way to go here, then a 404 from the object server (or whatever it usually does on quarantine).

I guess you could get here if an object server is unlinking stuff and dies halfway through...?
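
Roughly what that might look like, as a sketch only (handle_orphaned_meta is a hypothetical helper, not the patch that eventually merged; it leans on the existing quarantine_renamer helper and DiskFileNotExist exception):

    from swift.common.exceptions import DiskFileNotExist
    from swift.obj.diskfile import quarantine_renamer

    def handle_orphaned_meta(device_path, meta_file):
        # Move the whole hash dir aside for later inspection ...
        quarantine_renamer(device_path, meta_file)
        # ... then act as if the object doesn't exist; the object server
        # already turns that into a 404 for the client.
        raise DiskFileNotExist()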

Revision history for this message
Peter Portante (peter-a-portante) wrote :

I think we have to find out how we got into that state. If we don't know, then we run the risk of other logic errors in the code quarantining objects out of existence.

Any time a .data gets removed in the system, a .ts file should be placed first, and then all the .meta and .data files can be removed.

The other problem we face is stale listdir() results. If we detect that condition, it may be worth re-reading the directory and trying again, since otherwise a quarantine operation could be performed on good data just because of a stale directory read.
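
As a sketch of that ordering (illustrative code, not Swift's actual DELETE path):

    import os

    def delete_object(hashdir, timestamp):
        # 1. Land the tombstone durably first; a crash after this point
        #    leaves a valid state (a .ts plus some superseded files).
        ts_path = os.path.join(hashdir, timestamp + '.ts')
        fd = os.open(ts_path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        # 2. Only then remove the older .data/.meta files; no crash window
        #    can leave a .meta with nothing underneath it.
        for name in os.listdir(hashdir):
            if name.endswith(('.data', '.meta')):
                os.unlink(os.path.join(hashdir, name))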

Revision history for this message
Peter Portante (peter-a-portante) wrote :

FWIW: added some tests to show current behavior of hash_cleanup_listdir(): https://gist.github.com/portante/7923353

Revision history for this message
gholt (gholt) wrote :

A system could get into that state just due to file system corruption (power fault, bit error, who knows?).

I don't think writing a .ts before deleting stuff and then deleting the .ts or whatever is truly the solution. You can still end up with corruption that leaves you with just a .meta, especially with all the alternate file systems folks are wanting to use. Best to handle the situation.

Quarantining seems the right thing to do, in my opinion. If this node is the only one with the .meta then yes there is a risk that the .meta was good and that the node just lost the .data. But that would mean one node lost the .data and the other nodes lost the .meta which seems pretty unlikely. More likely would be that the node with just the .meta brought it back to life due to some corruption and that it shouldn't be there. But, I guess the point is, the state is not knowable so quarantine makes sense (to me).

The hash_cleanup_listdir should never leave such a state, since it deletes superseded files only, or a .ts that stands alone and is older than the reclaim age. But again, with file system corruption, anything is possible, including directories where there should be files, bizarre names, perfect names in seemingly weird orders, etc.
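
For anyone following along, that rule amounts to roughly the following (a simplified sketch, not the real hash_cleanup_listdir()):

    import os
    import time

    def hash_cleanup_listdir(hsh_path, reclaim_age=604800):
        files = sorted(os.listdir(hsh_path), reverse=True)  # newest first
        # A lone tombstone is only reclaimed once it is older than
        # reclaim_age.
        if len(files) == 1 and files[0].endswith('.ts'):
            age = time.time() - float(files[0].rsplit('.ts', 1)[0])
            if age > reclaim_age:
                os.unlink(os.path.join(hsh_path, files[0]))
                return []
            return files
        keep = []
        for name in files:
            if name.endswith('.meta') and not keep:
                keep.append(name)        # a .meta newer than the .data stays
            elif name.endswith(('.data', '.ts')) and \
                    not any(f.endswith(('.data', '.ts')) for f in keep):
                keep.append(name)        # newest .data or .ts wins
            else:
                os.unlink(os.path.join(hsh_path, name))   # superseded
        return keep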

Revision history for this message
clayg (clay-gerrard) wrote :

FWIW this wasn't a logic error; I simulated file system corruption (or whatever you want to call it) by deleting the .data file.

I was testing the https://review.openstack.org/#/c/61006/ and just wanted to see what happened.

So no worry about quarantine hiding some other issue. Also quarantine just moves stuff out of the way for inspection yeah - so even if we quarantined all three copies we'd have some chance to look into what happened, or at least the state of things when we quarantined.

Revision history for this message
Peter Portante (peter-a-portante) wrote :

Okay, great to hear. I have a patch to try to address it. Just formulating some tests for it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/61822
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=bc0807b3c58a44b81c54fe016497545c54971d92
Submitter: Jenkins
Branch: master

commit bc0807b3c58a44b81c54fe016497545c54971d92
Author: Peter Portante <email address hidden>
Date: Wed Dec 11 20:41:34 2013 -0500

    Refactor to share on-disk layout with hash cleanup

    Closes-Bug: 1260132
    Change-Id: Iaa367c686b8dc49dd53c55a7cca661d9611044f8

Changed in swift:
status: New → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/ec)

Fix proposed to branch: feature/ec
Review: https://review.openstack.org/66462

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/ec)

Reviewed: https://review.openstack.org/66462
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=3895441afd1f8ca49a09a483f402a961009a8661
Submitter: Jenkins
Branch: feature/ec

commit bad52f11218a11978d1efb0832f164a60a363cc2
Author: Clay Gerrard <email address hidden>
Date: Fri Jan 10 00:31:55 2014 -0800

    Allow programmatic reloading of Swift hash path config

    New utils function validate_hash_conf allows you to programmatically reload
    swift.conf and hash path global vars HASH_PATH_SUFFIX and HASH_PATH_PREFIX
    when they are invalid.

    When you load swift.common.utils before you have a swift.conf there's no good
    way to force a re-read of swift.conf and repopulate the hash path config
    options - short of restarting the process or reloading the module - both of
    which are hard to unittest. This should be no worse in general and in some
    cases easier.

    Change-Id: I1ff22c5647f127f65589762b3026f82c9f9401c1

commit 7b9c283203479cb9916951e1ce1f466f197dea36
Author: Samuel Merritt <email address hidden>
Date: Fri Jan 10 12:57:53 2014 -0800

    Add missing license header to test file

    All the other tests have license headers, so this one should too.

    I picked 2013 for the copyright year because that's when "git log"
    says it was first and last touched.

    Change-Id: Idd41a179322a3383f6992e72d8ba3ecaabd05c47

commit 47fcc5fca2c5020b69f3c2c7f0a8032f6c77354a
Author: Christian Schwede <email address hidden>
Date: Fri Jan 10 07:14:43 2014 +0000

    Update account quota doc

    A note was added stating that the same limitations apply to
    account quotas as for container quotas. An example on uploads
    without a content-length header was added.

    Related-Bug: 1267659
    Change-Id: Ic29b527cb71bf5903c2823844a1cf685ab6813dd

commit 6426f762d0d87063f9813630c620d880a4191046
Author: Peter Portante <email address hidden>
Date: Mon Dec 9 20:52:58 2013 -0500

    Raise diskfile.py module coverage to > 98%

    We attempt to get the code coverage (with branch coverage) to 100%,
    but fall short due to interactions between coverage.py and
    CPython's peephole optimizer. See:

        https://bitbucket.org/ned/coveragepy/issue/198/continue-marked-as-not-covered

    In the main diskfile module, we remove the check for a valid
    "self._tmppath" since it is only one of a number of fields that could
    be verified and it was not worth trying to get coverage for it. We
    also remove the try / except around the close() method call in the
    DiskFileReader's app_iter_ranges() method since it will never be
    called in a context that will raise a quarantine exception (by
    definition ranges can't generate a quarantine event).

    We also:

    * fix where quarantine messages are checked to ensure the
      generator is actually executed before the check
    * in new and modified tests:
      * use assertTrue in place of assert_
      * use assertEqual in place of assertEquals
    * fix references to the reserved word "object"

    Change-Id: I6379be04adfc5012cb0b91748fb3ba3f11200b48

commit 5196eae...

Changed in swift:
status: Fix Committed → In Progress
assignee: nobody → Peter Portante (peter-a-portante)
importance: Undecided → Low
Changed in swift:
assignee: Peter Portante (peter-a-portante) → nobody
Changed in swift:
status: In Progress → Confirmed
Revision history for this message
Matthew Oliver (matt-0) wrote :

Is this still a problem? I tried to recreate it (based on Clay's initial post), and the auditor and object-server didn't stack trace.

Revision history for this message
Thiago da Silva (thiagodasilva) wrote :

I just tried to recreate this bug, same as Matt and Clay, and I also no longer see a stack trace; the object server is returning a 404.

But the dangling .meta file is not being quarantined either; it is just left there until I start the replicator and the .data file is put back in place.

Should it be quarantined, or should we trust that the replicator will take care of it eventually?

Revision history for this message
Tim Burke (1-tim-z) wrote :

I think we want that to hang around; for all we know, that's the only copy of the .meta that got persisted.

*Maybe* we could clean it up if it's older than the reclaim age?
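
Something like this, perhaps (a hypothetical helper, assuming the usual one-week reclaim_age default; nothing along these lines has merged):

    import os
    import time

    def maybe_reclaim_dangling_meta(meta_path, reclaim_age=604800):
        # Keep the orphaned .meta around for the replicator to reunite with
        # its .data, and only give up on it after reclaim_age seconds.
        timestamp = float(os.path.basename(meta_path).rsplit('.meta', 1)[0])
        if time.time() - timestamp > reclaim_age:
            os.unlink(meta_path)
            return True
        return False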

Revision history for this message
clayg (clay-gerrard) wrote :

This looks *way* fixed?

https://review.openstack.org/#/c/61822/

What gives?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by Peter Portante (<email address hidden>) on branch: master
Review: https://review.openstack.org/61876
