object-replicator update_deleted throws oserror if suffix directory is cleaned up externally

Bug #1397668 reported by Caleb Tennis
This bug affects 2 people
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When running multiple object-replicator processes, a partition can be cleaned up by process B before process A gets to it. Because each process generates its partition list at startup and randomizes it, the order varies between processes. However, when process A finally reaches the now-deleted partition, it logs an OSError with a full traceback, because update_deleted has no specific handling for OSError; the error falls through to the generic Exception handler.

It would be nice to add a check for OSError alongside the existing Exception check and log this case more gracefully.

Essentially:

    tpool_get_suffixes
    OSError: [Errno 2] No such file or directory: '/srv/node/d358/objects/44975'
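A minimal single-process simulation of the race described above may help. It assumes each replicator snapshots and shuffles the partition directories once at startup and removes a handoff partition after syncing it. `snapshot_partitions` and `run_replicator` are hypothetical names, and `os.listdir` stands in for `tpool_get_suffixes`. Letting process B finish before A starts is the extreme case; real processes interleave, so only some partitions would hit the error.

```python
import errno
import os
import random
import shutil
import tempfile


def snapshot_partitions(objects_dir, seed):
    """Hypothetical model: list partitions once at startup, then shuffle."""
    parts = sorted(os.listdir(objects_dir))
    random.Random(seed).shuffle(parts)
    return parts


def run_replicator(objects_dir, parts, errors):
    """Walk a (possibly stale) snapshot; remove each partition once handled."""
    for part in parts:
        path = os.path.join(objects_dir, part)
        try:
            os.listdir(path)  # stands in for tpool_get_suffixes(job['path'])
        except OSError as err:
            errors.append((part, err.errno))
            continue
        shutil.rmtree(path)  # stands in for deleting a synced handoff partition


objects_dir = tempfile.mkdtemp()
for part in ("30377", "44975", "62011"):
    os.makedirs(os.path.join(objects_dir, part, "abc"))

# Both "processes" take their snapshot at startup, before either runs.
parts_a = snapshot_partitions(objects_dir, seed=1)
parts_b = snapshot_partitions(objects_dir, seed=2)

errors_a, errors_b = [], []
run_replicator(objects_dir, parts_b, errors_b)  # B gets to every partition first
run_replicator(objects_dir, parts_a, errors_a)  # A still works from its stale list
shutil.rmtree(objects_dir)
```

After the run, `errors_b` is empty and every entry in `errors_a` carries `errno.ENOENT`, which is the `[Errno 2] No such file or directory` seen in the logs.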

Caleb Tennis (ctennis) wrote :

For the record, the full backtrace is this:

    Nov 30 15:23:32 localhost object-replicator: Error syncing handoff partition:
    Traceback (most recent call last):
      File "/opt/ss/lib/python2.7/site-packages/swift/obj/replicator.py", line 223, in update_deleted
        suffixes = tpool.execute(tpool_get_suffixes, job['path'])
      File "/opt/ss/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
        rv = meth(*args,**kwargs)
      File "/opt/ss/lib/python2.7/site-packages/swift/obj/replicator.py", line 216, in tpool_get_suffixes
        return [suff for suff in os.listdir(path)
    OSError: [Errno 2] No such file or directory: '/srv/node/d302/objects/30377'
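One way to act on the suggestion in the description is sketched below: catch OSError ahead of the generic Exception clause and log ENOENT as a one-line skip. This is a simplified, hypothetical stand-in, not Swift's actual update_deleted; the suffix filter in `tpool_get_suffixes` (three-character directory names) is an assumption about the part of the comprehension the traceback cuts off, and the tpool call is dropped for brevity.

```python
import errno
import logging
import os
import shutil
import tempfile

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("object-replicator")


def tpool_get_suffixes(path):
    # Modeled on the comprehension in the traceback; the filter below
    # (three-character suffix directories only) is an assumption.
    return [suff for suff in os.listdir(path)
            if len(suff) == 3 and os.path.isdir(os.path.join(path, suff))]


def update_deleted(job):
    """Hypothetical, simplified stand-in for the listing step of update_deleted."""
    try:
        suffixes = tpool_get_suffixes(job['path'])
    except OSError as err:
        # Must precede the generic clause: OSError is an Exception subclass.
        if err.errno == errno.ENOENT:
            # Another replicator already removed the partition: expected,
            # so log one short line instead of a traceback.
            logger.info("Handoff partition %s already removed; skipping",
                        job['path'])
            return None
        logger.exception("Error syncing handoff partition")
        return None
    except Exception:
        logger.exception("Error syncing handoff partition")
        return None
    return sorted(suffixes)


# Usage: list a partition, then list it again after it has been removed.
part_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(part_dir, "abc"))
open(os.path.join(part_dir, "hashes.pkl"), "w").close()
first = update_deleted({'path': part_dir})   # ['abc']
shutil.rmtree(part_dir)
second = update_deleted({'path': part_dir})  # None, logged at INFO level
```

Other OSErrors (for example EACCES) still get the full traceback, so only the expected race is quieted.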

Samuel Merritt (torgomatic) wrote :

For clarification, do you mean one object replicator with concurrency = N in its config (N > 1), or do you mean multiple object replicator processes?

Caleb Tennis (ctennis) wrote : Re: [Bug 1397668] Re: object-replicator update_deleted throws oserror if suffix directory is cleaned up externally

Multiple processes, which I realize is not the norm.

On Mon, Dec 1, 2014 at 1:52 PM, Samuel Merritt <email address hidden>
wrote:

> For clarification, do you mean one object replicator with concurrency =
> N in its config (N > 1), or do you mean multiple object replicator
> processes?

Caleb Tennis (ctennis) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on swift (master)

Change abandoned by John Dickinson (<email address hidden>) on branch: master
Review: https://review.openstack.org/138837
Reason: Abandoning due to lack of activity since the last negative review. You can restore the change if you want to keep working on it.

Caleb Tennis (ctennis) wrote :

Noting that this is still an issue, and in two separate places, actually:

    Oct 17 14:34:16 localhost object-replicator: Error syncing partition:
    Traceback (most recent call last):
      File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 336, in update
        reclaim_age=self.reclaim_age)
      File "/usr/lib/pymodules/python2.7/swift/common/utils.py", line 2914, in tpool_reraise
        raise resp
    OSError: [Errno 2] No such file or directory: '/srv/node/d23/objects/62011'

    Oct 17 14:34:16 localhost object-replicator: Error syncing handoff partition:
    Traceback (most recent call last):
      File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 249, in update_deleted
        suffixes = tpool.execute(tpool_get_suffixes, job['path'])
      File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
        six.reraise(c, e, tb)
      File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
        rv = meth(*args, **kwargs)
      File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 241, in tpool_get_suffixes
        return [suff for suff in os.listdir(path)
    OSError: [Errno 2] No such file or directory: '/srv/node/d22/objects/30577'
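Since both call sites (update and update_deleted) fail the same way, a shared guard is one option. The sketch below assumes the desired behavior is "skip quietly only if the partition directory itself is gone". `run_unless_partition_gone` is a hypothetical helper, not part of Swift, and whoever calls it would have to treat a `None` return as "skip this partition", which has not been checked against what `update` currently expects back from its `tpool_reraise` call.

```python
import errno
import os
import shutil
import tempfile


def run_unless_partition_gone(func, path, *args, **kwargs):
    """Call func(path, *args, **kwargs); return None if `path` itself vanished.

    Hypothetical helper: ENOENT is swallowed only when the partition
    directory no longer exists, so a missing file inside a live partition
    still propagates to the caller's generic error handler.
    """
    try:
        return func(path, *args, **kwargs)
    except OSError as err:
        if err.errno == errno.ENOENT and not os.path.exists(path):
            return None
        raise


part_dir = tempfile.mkdtemp()
listed = run_unless_partition_gone(os.listdir, part_dir)  # [] (empty dir)

# ENOENT for something *inside* a partition that still exists is re-raised.
try:
    run_unless_partition_gone(
        lambda p: os.listdir(os.path.join(p, "missing")), part_dir)
    reraised = False
except OSError:
    reraised = True

shutil.rmtree(part_dir)
gone = run_unless_partition_gone(os.listdir, part_dir)  # None
```

The `os.path.exists` re-check narrows what gets swallowed; it can itself race with another process, but since the only effect is quieter logging, that window seems harmless.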
