Activity log for bug #1646362

Date Who What changed Old value New value Message
2016-12-01 06:26:03 clayg bug added bug
2016-12-01 14:24:48 clayg summary spurious files next to hashdir don't get cleaned up spurious files next to container hashdir don't get cleaned up
2016-12-01 14:27:16 clayg description

Old value:

I'm looking at something like this

/srv/node1/sdb1/containers
└── 450
    ├── afb # empty suffix
    ├── afc
    │   ├── 7089ab48d955ab0851fc51cc17a34afc # bogus file
    │   └── c8bcccab3ddbfdc34b08e9223f4f5afc # empty hashdir
    └── afd
        └── 7089ab48d955ab0851fc51cc17a34afd
            └── 7089ab48d955ab0851fc51cc17a34afd.db

The empty dirs can get cleaned up with a fix for lp bug #1583719

But the bogus file still remains.

A file where a hashdir should be (or a hashdir where a file should be, i.e. lp bug #1621255) is currently understood to be some kind of filesystem corruption.

We should either delete these bogus files or, maybe better, quarantine them.

For objects, it's slightly worse: a bogus file where a hashdir should be causes the whole suffix to get quarantined:

ubuntu@saio:~$ tree /srv/node4/sdb4/objects
/srv/node4/sdb4/objects
└── 645
    └── 35a
        ├── 7089ab48d955ab0851fc51cc17a34afd # bogus file
        └── a161102fba1710ef912af194b8d4635a # normal valid data
            └── 1480572570.98264.data

3 directories, 2 files
ubuntu@saio:~$ swift-init object-replicator once -nv -c 4
WARNING: Unable to modify max process limit. Running as non-root?
Running object-replicator once...(/etc/swift/object-server/4.conf.d)
object-6040: Running object replicator in script mode.
object-6040: STDERR: ERROR:root:Quarantined /srv/node4/sdb4/objects/645/35a/7089ab48d955ab0851fc51cc17a34afd to /srv/node4/sdb4/quarantined/objects/35a because it is not a directory#012Traceback (most recent call last):#012 File "/vagrant/swift/swift/obj/diskfile.py", line 951, in _hash_suffix_dir#012 ondisk_info = self.cleanup_ondisk_files(hsh_path, reclaim_age)#012 File "/vagrant/swift/swift/obj/diskfile.py", line 906, in cleanup_ondisk_files#012 files = listdir(hsh_path)#012 File "/vagrant/swift/swift/common/utils.py", line 3104, in listdir#012 return os.listdir(path)#012OSError: [Errno 20] Not a directory: '/srv/node4/sdb4/objects/645/35a/7089ab48d955ab0851fc51cc17a34afd'
object-6040: 1/1 (100.00%) partitions replicated in 0.03s (29.83/sec, 0s remaining)
object-6040: 2 successes, 0 failures
object-6040: Object replication complete (once). (0.00 minutes)
object-6040: Exited
ubuntu@saio:~$ tree /srv/node4/sdb4/objects
/srv/node4/sdb4/objects
└── 645
    └── hashes.pkl

1 directory, 1 file
ubuntu@saio:~$ ls /srv/node4/sdb4/quarantined/objects/35a
7089ab48d955ab0851fc51cc17a34afd a161102fba1710ef912af194b8d4635a

What seems to be going on in the diskfile here is just that the quarantine interface is expecting a data file instead of a hashdir.

i.e. this one works:
https://github.com/openstack/swift/blob/43a175ebd284e825c0f6e5b79a23d8a32f62326e/swift/obj/diskfile.py#L2044
this one is quite wrong:
https://github.com/openstack/swift/blob/43a175ebd284e825c0f6e5b79a23d8a32f62326e/swift/obj/diskfile.py#L957

New value:

I'm looking at something like this

/srv/node1/sdb1/containers
└── 450
    ├── afb # empty suffix
    ├── afc
    │   ├── 7089ab48d955ab0851fc51cc17a34afc # bogus file
    │   └── c8bcccab3ddbfdc34b08e9223f4f5afc # empty hashdir
    └── afd
        └── 7089ab48d955ab0851fc51cc17a34afd
            └── 7089ab48d955ab0851fc51cc17a34afd.db

The empty dirs can get cleaned up with a fix for lp bug #1583719

But the bogus file still remains.

A file where a hashdir should be (or a hashdir where a file should be, i.e. lp bug #1621255) is currently understood to be some kind of filesystem corruption.

We should either delete these bogus files or, maybe better, quarantine them.

For objects, it's slightly worse: lp bug #1646502
2017-07-24 22:34:51 John Dickinson swift: status New Fix Released
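
A minimal Python sketch of the behaviour reported in the description, not Swift's actual code. It assumes the quarantine helper moves the parent directory of whatever path it is given, which is what the console output in the description suggests. That is harmless when the path is a .data file inside a hashdir, but when the path is a bogus file sitting directly in a suffix dir, the whole suffix (valid hashdirs included) is moved. The function names quarantine_parent, quarantine_entry and build_example are hypothetical and exist only for this illustration.

```python
import os
import shutil
import tempfile


def quarantine_parent(device_path, corrupted_path):
    """Hypothetical helper: quarantine the directory containing corrupted_path.

    Fine for <suffix>/<hashdir>/<ts>.data (moves the hashdir), but for a
    bogus file at <suffix>/<name> it moves the entire suffix directory.
    """
    from_dir = os.path.dirname(corrupted_path)
    to_dir = os.path.join(device_path, 'quarantined', 'objects',
                          os.path.basename(from_dir))
    os.makedirs(os.path.dirname(to_dir), exist_ok=True)
    shutil.move(from_dir, to_dir)
    return to_dir


def quarantine_entry(device_path, bogus_path):
    """Hypothetical alternative: quarantine only the bogus entry itself."""
    to_path = os.path.join(device_path, 'quarantined', 'objects',
                           os.path.basename(bogus_path))
    os.makedirs(os.path.dirname(to_path), exist_ok=True)
    shutil.move(bogus_path, to_path)
    return to_path


def build_example(root):
    """Recreate the objects layout from the description under root."""
    suffix = os.path.join(root, 'objects', '645', '35a')
    good = os.path.join(suffix, 'a161102fba1710ef912af194b8d4635a')
    os.makedirs(good)
    # Valid object data inside a proper hashdir.
    open(os.path.join(good, '1480572570.98264.data'), 'w').close()
    # Bogus regular file where a hashdir should be.
    bogus = os.path.join(suffix, '7089ab48d955ab0851fc51cc17a34afd')
    open(bogus, 'w').close()
    return suffix, good, bogus
```

Calling quarantine_parent on the bogus path from build_example removes the 35a suffix along with the valid hashdir, matching the report; quarantine_entry on the same path leaves the valid hashdir in place.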