2016-12-01 14:27:16 |
clayg |
description |
I'm looking at something like this
/srv/node1/sdb1/containers
└── 450
    ├── afb # empty suffix
    ├── afc
    │   ├── 7089ab48d955ab0851fc51cc17a34afc # bogus file
    │   └── c8bcccab3ddbfdc34b08e9223f4f5afc # empty hashdir
    └── afd
        └── 7089ab48d955ab0851fc51cc17a34afd
            └── 7089ab48d955ab0851fc51cc17a34afd.db
The empty dirs can get cleaned up with a fix for lp bug #1583719
But the bogus file still remains.
A file where a hashdir should be (or a hashdir where a file should be, i.e. lp bug #1621255) is currently understood to be some kind of filesystem corruption.
We should either delete these bogus files or maybe better quarantine them.
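To find candidates for that cleanup, something like the following could walk one partition and report entries that break the partition/suffix/hashdir layout above. This is only a rough sketch: `find_bogus_entries` is a hypothetical helper, not anything that exists in swift, and it only reports what it finds. Whether to delete or quarantine is left to the caller.

```python
import os


def find_bogus_entries(partition_dir):
    """Yield (path, reason) for entries that do not fit the
    partition/suffix/hashdir layout. Hypothetical helper, not swift code."""
    for suffix in sorted(os.listdir(partition_dir)):
        suffix_path = os.path.join(partition_dir, suffix)
        if not os.path.isdir(suffix_path):
            # Skip plain files at the partition level (e.g. hashes.pkl
            # in an object partition).
            continue
        entries = sorted(os.listdir(suffix_path))
        if not entries:
            yield suffix_path, 'empty suffix'
        for name in entries:
            path = os.path.join(suffix_path, name)
            if not os.path.isdir(path):
                # A regular file sitting where a hashdir is expected.
                yield path, 'file where a hashdir should be'
            elif not os.listdir(path):
                yield path, 'empty hashdir'
```

Run against the `450` partition above, this would be expected to flag `afb`, the bogus file in `afc`, and the empty hashdir in `afc`, and leave `afd` alone.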
For objects it's slightly worse. A bogus file where a hashdir should be causes the whole suffix to get quarantined:
ubuntu@saio:~$ tree /srv/node4/sdb4/objects
/srv/node4/sdb4/objects
└── 645
    └── 35a
        ├── 7089ab48d955ab0851fc51cc17a34afd # bogus file
        └── a161102fba1710ef912af194b8d4635a # normal valid data
            └── 1480572570.98264.data
3 directories, 2 files
ubuntu@saio:~$ swift-init object-replicator once -nv -c 4
WARNING: Unable to modify max process limit. Running as non-root?
Running object-replicator once...(/etc/swift/object-server/4.conf.d)
object-6040: Running object replicator in script mode.
object-6040: STDERR: ERROR:root:Quarantined /srv/node4/sdb4/objects/645/35a/7089ab48d955ab0851fc51cc17a34afd to /srv/node4/sdb4/quarantined/objects/35a because it is not a directory#012Traceback (most recent call last):#012 File "/vagrant/swift/swift/obj/diskfile.py", line 951, in _hash_suffix_dir#012 ondisk_info = self.cleanup_ondisk_files(hsh_path, reclaim_age)#012 File "/vagrant/swift/swift/obj/diskfile.py", line 906, in cleanup_ondisk_files#012 files = listdir(hsh_path)#012 File "/vagrant/swift/swift/common/utils.py", line 3104, in listdir#012 return os.listdir(path)#012OSError: [Errno 20] Not a directory: '/srv/node4/sdb4/objects/645/35a/7089ab48d955ab0851fc51cc17a34afd'
object-6040: 1/1 (100.00%) partitions replicated in 0.03s (29.83/sec, 0s remaining)
object-6040: 2 successes, 0 failures
object-6040: Object replication complete (once). (0.00 minutes)
object-6040: Exited
ubuntu@saio:~$ tree /srv/node4/sdb4/objects
/srv/node4/sdb4/objects
└── 645
    └── hashes.pkl
1 directory, 1 file
ubuntu@saio:~$ ls /srv/node4/sdb4/quarantined/objects/35a
7089ab48d955ab0851fc51cc17a34afd a161102fba1710ef912af194b8d4635a
What seems to be going on in diskfile here is just that the quarantine interface expects a data file rather than a hashdir, i.e.
this one works:
https://github.com/openstack/swift/blob/43a175ebd284e825c0f6e5b79a23d8a32f62326e/swift/obj/diskfile.py#L2044
this one is quite wrong:
https://github.com/openstack/swift/blob/43a175ebd284e825c0f6e5b79a23d8a32f62326e/swift/obj/diskfile.py#L957 |
I'm looking at something like this
/srv/node1/sdb1/containers
└── 450
    ├── afb # empty suffix
    ├── afc
    │   ├── 7089ab48d955ab0851fc51cc17a34afc # bogus file
    │   └── c8bcccab3ddbfdc34b08e9223f4f5afc # empty hashdir
    └── afd
        └── 7089ab48d955ab0851fc51cc17a34afd
            └── 7089ab48d955ab0851fc51cc17a34afd.db
The empty dirs can get cleaned up with a fix for lp bug #1583719
But the bogus file still remains.
A file where a hashdir should be (or a hashdir where a file should be, i.e. lp bug #1621255) is currently understood to be some kind of filesystem corruption.
We should either delete these bogus files or maybe better quarantine them.
For objects, it's slightly worse: see lp bug #1646502 |
|