Race/backtrace in hash_suffix() - deleting ts files

Bug #1185788 reported by Donagh McCabe
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (blank)

Bug Description

While reading the code to see how .ts files are reclaimed, I noticed that both the object-replicator and the object-server call get_hashes(), and hence hash_suffix(). I wondered whether they can race, and sure enough I found backtraces such as:

May 29 23:45:36 sw-aw2az1-object012 object-replicator STDOUT: ERROR:root:Error hashing suffix#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 199, in get_hashes#012 hashes[suffix] = hash_suffix(suffix_dir, reclaim_age)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 128, in hash_suffix#012 os.rmdir(hsh_path)#012OSError: [Errno 2] No such file or directory: '/srv/node/disk9/objects/1995540/da5/f398a2137e9f56e8d5e632510726dda5'

and

Apr 15 16:10:49 sw-aw2az1-object059 object-replicator STDOUT: ERROR:root:Error hashing suffix#012Traceback (most recent call last):#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 199, in get_hashes#012 hashes[suffix] = hash_suffix(suffix_dir, reclaim_age)#012 File "/usr/lib/python2.7/dist-packages/swift/obj/replicator.py", line 109, in hash_suffix#012 os.unlink(join(hsh_path, files[0]))#012OSError: [Errno 2] No such file or directory: '/srv/node/disk6/objects/1775921/78c/d8c98eb5fcf44d1a6d0d13a34879678c/1359722081.33112.ts'

...and on investigation, the files don't exist anymore.

It's not very common: on our production system, a node typically reports about one such backtrace per day, from either the object-server or the object-replicator.

A potential fix is to wrap the rmdir/unlink in a try/except and do an os.stat() in the exception handler to see whether the path still exists. However, although I went looking for a race, I am not sure that this is one, and I don't want to patch over some other source of the problem.
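
A minimal sketch of the try/except idea above, assuming only Python's standard os and errno modules. The helper names remove_if_present and _is_gone are hypothetical and are not taken from Swift; this illustrates the pattern and is not the code that was eventually merged.

```python
import errno
import os


def _is_gone(path):
    """Return True only if os.stat() confirms that path no longer exists."""
    try:
        os.stat(path)
    except OSError as err:
        return err.errno == errno.ENOENT
    return False


def remove_if_present(path, remove=os.unlink):
    """Remove path, tolerating removal by a concurrent process.

    `remove` is os.unlink for files or os.rmdir for directories.
    Returns True if this call removed the path, False if it was
    already gone. Any other failure is re-raised unchanged.
    """
    try:
        remove(path)
        return True
    except OSError as err:
        # ENOENT plus a failed stat means someone else won the race.
        if err.errno == errno.ENOENT and _is_gone(path):
            return False
        raise
```

With this shape, hash_suffix() could call remove_if_present(join(hsh_path, files[0])) for the tombstone and remove_if_present(hsh_path, remove=os.rmdir) for the hash directory. Errors other than "No such file or directory" (for example a non-empty directory) would still surface, so the wrapper would not hide an unrelated problem.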

John Dickinson (notmyname) wrote :

needs to be confirmed on recent versions of swift

Changed in swift:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Object Storage (swift) because there has been no activity for 60 days.]

Changed in swift:
status: Incomplete → Expired
Changed in swift:
status: Expired → Fix Released
Donagh McCabe (donagh-mccabe) wrote :

This code was refactored a long time ago, and a try/except now wraps both the directory removal (diskfile.py) and the file unlink (utils.py). Hence marked as fixed.
