tombstone (.ts) object will never be deleted if the hash_suffix exists in the hashes.pkl

Bug #1301728 reported by Hugo Kou
54
This bug affects 10 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Fix Released
High
Mahati Chamarthy

Bug Description

Swift Version : 1.13.0

[object-replicator]
vm_test_mode = yes
reclaim_age = 30
log_level = DEBUG

======================

if the suffix hash of tombstone file were calculated in the hashes.pkl before the reclaim time. The hash value will been wrote in the hashes pickle.

So the .ts file will never be deleted due to the hash no longer in *None*.

https://github.com/openstack/swift/blob/master/swift/obj/diskfile.py#L316-L319

I did not find any code to set the hash value to *None* for .ts under this situation. Even the .ts is over the reclaim_age.

It causes bunch of .ts file remain on disks and consume lots inodes.

Revision history for this message
Hugo Kou (tonytkdk) wrote :

There's a unclear description of do_listdir.

:param do_listdir: force existence check for all hashes in the partition

The main job of do_listdir is for adding any missed key of suffix there instead of set the existing value to "None" . That's my understanding from the description. So that the following code looks normal for me.

        hashes.setdefault(suff, None)

https://github.com/....../master/swift/obj/diskfile.py......

setdefault returns the value of the key which exists and set None for new key in the dictionary. It's not going to scrub the value of a suffix which already in the pickle.

I'm not sure if developer expected the suffix of .ts suppose to be set in None by this line. If that's the case, then there's a bug(it should be heashes[suff] = None.

Maybe I'm wrong :

Is there any other code to set the value for a suffix which belong to .ts ?

Hugo

Revision history for this message
Bill Owen (billowen) wrote :

I saw the same problem as well. The problem can be reproduced as follows:
1. set reclaim_age to some small value like 30s or 120s
2. create an object
3. delete the object
4. run swift-object-replicator with -o (one pass) flag set before reclaim_age has expired
5. hashes.pkl is updated to include the subdirectory key & non-zero hash value (hash of "<delete timestamp>.ts")
6. re-run swift-object-replicator any number of times, but the .ts file is never removed.

The problem is that with a non-None hash value, the method hash_suffix is never called, hash_cleanup_listdir is not called, and the .ts file is never removed.

If the above steps are repeated, but in step 4, you wait until reclaim_age seconds after the object was deleted, the ts file will be deleted as expected.

As stated above, the problem seems to be here:
https://github.com/openstack/swift/blob/master/swift/obj/diskfile.py#L312

If this is changed to:
 hashes[suff] = None

tombstone deletes work as expected.

Hugo Kou (tonytkdk)
Changed in swift:
assignee: nobody → Hugo Kou (tonytkdk)
status: New → In Progress
Revision history for this message
clayg (clay-gerrard) wrote :

yeah but that will also rehashes all directories all the time... the setdefault is intentional - it the whole point - don't traverse directories that haven't had any activity since the last replication pass. But if that goes on for a long time...

I think either the hash.pkl can get updated to maybe include also a timestamp next to the hash, so that come reclaim age these really inactive suffixes can go ahead and get swept out, or have something that notices reclaim-aged objects (like the auditor) call invalidate_hash.

Anyway, I don't think just hashing all the directories all the time is gunna work how how we want...

Hugo Kou (tonytkdk)
Changed in swift:
status: In Progress → New
assignee: Hugo Kou (tonytkdk) → nobody
Revision history for this message
Hugo Kou (tonytkdk) wrote :

Hi Clay,

That's what I were guessing.
Apparently, Swift needs a way to invalidate the hash of .ts which is over the reclaim_age.

Revision history for this message
clayg (clay-gerrard) wrote :

It seems like it might be safest to just to have the auditor throw out tombstones older than reclaim age and leave the hashes.pkl and replication out of it.

Brian Cline (briancline)
Changed in swift:
assignee: nobody → Brian Cline (briancline)
status: New → Confirmed
Revision history for this message
Brian Cline (briancline) wrote :

I've seen this occur with a reclaim_age of the default of 1wk, possibly for reasons that were specific to our installation (but aren't any longer).

I can reproduce with Bill's steps here in #2. I'll take a stab at tossing this into the auditor.

Brian Cline (briancline)
Changed in swift:
status: Confirmed → In Progress
Revision history for this message
Bill Owen (billowen) wrote :

clayg, briancline - In our configurations, we may not run object auditor at all. It would be preferrable to continue to have replicator be responsible for tombstone cleanup, but modify replicator code to handle this in all cases One option is to have a separate tshashes.pkl to handle deleted objects.

Revision history for this message
Brian Cline (briancline) wrote :

For curiosity sake, generally speaking, is it typical not to run the object auditor?

I can certainly think of one or two use cases where it isn't necessary nor desired. Just trying to gauge how common that is.

Revision history for this message
Bill Owen (billowen) wrote :

I don't think it's common, but we are planning a deployment where swift runs on a shared filesystem (gpfs), and delegates replication to the filesystem, so swift replication factors are set to 1. In this case, if a bad object is detected, the auditor would move the only copy of the object to quarantine. Instead, we would rather disable the auditor and periodically evaluate object checksums outside of swift, in the same way we manage replication outside of swift.

Revision history for this message
Brian Cline (briancline) wrote :

(FYI still working on this off-and-on. Wanted to wait for 2.0.0-rc1 before doing anything super major with it )

Revision history for this message
yinyin (yinyin2010) wrote :

I just submit a patch to solve it in object-auditor:
https://review.openstack.org/#/c/134855/

I've test it in my env.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by John Dickinson (<email address hidden>) on branch: master
Review: https://review.openstack.org/134855
Reason: Abandoning due to lack of activity since the last negative review. You can restore the change if you want to keep working on it.

Revision history for this message
Hugo Kou (tonytkdk) wrote :

This bug is still existing for very long time. I think the easiest way might be have an additional worker of command to cleanup those .ts if there's no any other better idea at this moment.

```
[swiftstack@swift1 ~]$ ls /srv/node/d12/objects/1223 | wc -l
1686

[root@swift1 ~]# rm /srv/node/d12/objects/1223/hashes.pkl

[root@swift1 ~]# swift-object-replicator ~/swiftstack-agent/2.conf -p 1223 -o -v
object-replicator: Running object replicator in script mode.
object-replicator: 1/1 (100.00%) partitions replicated in 53.97s (0.02/sec, 0s remaining)
object-replicator: 229 suffixes checked - 100.00% hashed, 29.69% synced
object-replicator: Partition times: max 53.8851s, min 53.8851s, med 53.8851s
object-replicator: Object replication complete (once). (0.90 minutes)
object-replicator: Exited

[root@swift1 ~]# ls /srv/node/d12/objects/1223 | wc -l
230
```

Changed in swift:
importance: Undecided → High
Changed in swift:
assignee: Brian Cline (briancline) → Mahati Chamarthy (mahati-chamarthy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (master)

Fix proposed to branch: master
Review: https://review.openstack.org/346865

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (master)

Reviewed: https://review.openstack.org/346865
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=81d4673674febdbe0bba6e27a6d24739456eb3c4
Submitter: Jenkins
Branch: master

commit 81d4673674febdbe0bba6e27a6d24739456eb3c4
Author: Mahati Chamarthy <email address hidden>
Date: Mon Jul 25 20:10:44 2016 +0530

    Delete old tombstones

    - Call invalidate_hash in auditor for reclaimable tombstones
    - assert changed auditor behavior with a unit test
    - driveby test: assert get_hashes behavior with a unit test

    Co-Authored-By: Pete Zaitcev <email address hidden>
    Co-Authored-By: Kota Tsuyuzaki <email address hidden>
    Closes-Bug: #1301728
    Change-Id: I3e99dc702d55a7424c6482969e03cb4afac854a4

Changed in swift:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.0

This issue was fixed in the openstack/swift 2.10.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to swift (feature/hummingbird)

Fix proposed to branch: feature/hummingbird
Review: https://review.openstack.org/400985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to swift (feature/hummingbird)
Download full text (78.0 KiB)

Reviewed: https://review.openstack.org/400985
Committed: https://git.openstack.org/cgit/openstack/swift/commit/?id=0c3f8f87104af8717115c5badffd243dbaa1c430
Submitter: Jenkins
Branch: feature/hummingbird

commit 2d25fe6ad3573b2a06b6b3e5e66493d7b0c55693
Author: Tim Burke <email address hidden>
Date: Mon Jul 25 15:06:23 2016 -0700

    Reduce backend requests for SLO If-Match / HEAD requests

    ... by storing SLO Etag and size in sysmeta.

    Previously, we had to GET the manifest for every HEAD or conditional
    request to an SLO. Worse, since SLO PUTs require that we HEAD every
    segment, we'd GET all included sub-SLO manifests. This was necessary so
    we could recompute the large object's Etag and content-length.

    Since we already know both of those during PUT, we'll now store it in
    object sysmeta. This allows us to:

     * satisfy HEAD requests based purely off of the manifest's HEAD
       response, and
     * perform the If-(None-)Match comparison on the object server, without
       any additional subrequests.

    Note that the large object content-length can't just be parsed from
    content-type -- with fast-POST enabled, the content-type coming out of
    the object-server won't necessarily include swift_bytes.

    Also note that we must still fall back to GETting the manifest if the
    sysmeta headers were not found. Otherwise, we'd break existing large
    objects.

    Change-Id: Ia6ad32354105515560b005cea750aa64a88c96f9

commit ae7dddd801e28217d7dc46bd45cd6b621f29340c
Author: Ondřej Nový <email address hidden>
Date: Mon Nov 21 22:13:11 2016 +0100

    Added comment for "user" option in drive-audit config

    Change-Id: I24362826bee85ac3304e9b63504c9465da673014

commit c3e1d847f4b9d6cc6212aae4dc1b1e6dff45fb40
Author: Thiago da Silva <email address hidden>
Date: Thu Nov 17 17:17:00 2016 -0500

    breaking down tests.py into smaller pieces

    tests.py is currently at ~5500 lines of code, it's
    time to break it down into smaller files.

    I started with an easy middleware set of tests
    (i.e., versioned writes, ~600 lines of code ) so I can get
    some feedback. There are more complicated tests that cover
    multiple middlewares for example, it is not so clear where
    those should go.

    Change-Id: I2aa6c18ee5b68d0aae73cc6add8cac6fbf7f33da
    Signed-off-by: Thiago da Silva <email address hidden>

commit 5d7a3a4172f0f11ab870252eec784cf24b247dea
Author: Ondřej Nový <email address hidden>
Date: Sat Nov 19 23:24:30 2016 +0100

    Removed "in-process-" from func env tox name

    This shorten shebang in infra, because we are hitting 128 bytes limit.

    Change-Id: I02477d81b836df71780942189d37d616944c4dce

commit 9ea340256996a03c8c744201297b47a0e91fe65b
Author: Kota Tsuyuzaki <email address hidden>
Date: Fri Nov 18 01:50:11 2016 -0800

    Don't overwrite built-in 'id'

    This is a follow up for https://review.openstack.org/#/c/399237

    'id' is assigned as a builtin function so that we should not use 'id'
    for the local variable name.

    Change-Id: Ic27460d49e68f6cd50bda1d5b3810e01ccb07a37

commit bf...

tags: added: in-feature-hummingbird
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/swift 2.10.0

This issue was fixed in the openstack/swift 2.10.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.