OpenStack Object Storage (swift)

object-replicator update_deleted throws oserror if suffix directory is cleaned up externally

Bug #1397668 reported by Caleb Tennis on 2014-11-30

This bug affects 2 people

Affects                           Status  Importance  Assigned to  Milestone
OpenStack Object Storage (swift)  New     Undecided   Unassigned

Bug Description

When running multiple object-replicator processes, process B can clean up a partition before process A gets to it. Because each process generates its partition list at startup and randomizes it, the order varies between processes. However, when process A finally reaches the now-deleted partition, it raises an OSError that the update_deleted method does not check for specifically, so the error ends up in the logs as a full traceback.

It would be nice to add a check for OSError alongside the existing Exception check and log it more gracefully.

Essentially:

tpool_get_suffixes
OSError: [Errno 2] No such file or directory: '/srv/node/d358/objects/44975'
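A minimal sketch of the graceful handling requested above, assuming it is done where the suffixes are listed. `list_suffixes` is a hypothetical helper, not Swift's code: it is modelled on the `os.listdir(path)` list comprehension visible in `tpool_get_suffixes`, and the "three-character directory name" filter is an assumption. Only ENOENT is swallowed; any other OSError still propagates to the existing Exception handler.

```python
import errno
import os


def list_suffixes(partition_path):
    """Return the sorted suffix directories of a partition.

    Hypothetical helper: returns None if the partition directory no longer
    exists (e.g. another replicator process already removed it), and []
    if it exists but contains no suffix directories.
    """
    try:
        names = os.listdir(partition_path)
    except OSError as err:
        if err.errno != errno.ENOENT:
            # Permission problems, ENOTDIR, I/O errors, etc. are real errors.
            raise
        return None
    # Assumed filter: suffix directories have 3-character names.
    # os.path.isdir() returns False (it does not raise) if an entry vanishes
    # between listdir() and this check.
    return sorted(
        name for name in names
        if len(name) == 3 and os.path.isdir(os.path.join(partition_path, name))
    )
```

Returning None rather than [] lets a caller such as update_deleted tell "partition already gone, skip it and log one line" apart from "partition exists but is empty".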

Caleb Tennis (ctennis) wrote on 2014-11-30:

#1

For the record, the full backtrace is this:

Nov 30 15:23:32 localhost object-replicator: Error syncing handoff partition:
Traceback (most recent call last):
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/replicator.py", line 223, in update_deleted
    suffixes = tpool.execute(tpool_get_suffixes, job['path'])
  File "/opt/ss/lib/python2.7/site-packages/eventlet/tpool.py", line 76, in tworker
    rv = meth(*args,**kwargs)
  File "/opt/ss/lib/python2.7/site-packages/swift/obj/replicator.py", line 216, in tpool_get_suffixes
    return [suff for suff in os.listdir(path)
OSError: [Errno 2] No such file or directory: '/srv/node/d302/objects/30377'
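The traceback shows `os.listdir(path)` failing on a partition directory that no longer exists. The race can be reproduced without Swift using a sketch like the one below; it is a single-process simulation (a temporary directory stands in for `/srv/node/<device>/objects`, and `simulate_handoff_race` is a hypothetical name), not replicator code.

```python
import os
import random
import shutil
import tempfile


def simulate_handoff_race():
    """Return (errno, partition) for every job that hit a vanished partition."""
    root = tempfile.mkdtemp()
    try:
        objects = os.path.join(root, "objects")
        for part in ("30377", "44975"):
            os.makedirs(os.path.join(objects, part, "abc"))
        # "Process A" builds its job list at startup, in random order.
        jobs_a = [os.path.join(objects, p) for p in ("30377", "44975")]
        random.shuffle(jobs_a)
        # "Process B" finishes handoff partition 30377 first and removes it.
        shutil.rmtree(os.path.join(objects, "30377"))
        failures = []
        for path in jobs_a:
            try:
                os.listdir(path)  # the call that fails in tpool_get_suffixes
            except OSError as err:
                failures.append((err.errno, os.path.basename(err.filename)))
        return failures
    finally:
        shutil.rmtree(root)
```

On Linux, `simulate_handoff_race()` returns `[(2, '30377')]` regardless of the shuffled order, matching the `[Errno 2]` in the log.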

Samuel Merritt (torgomatic) wrote on 2014-12-01:

#2

For clarification, do you mean one object replicator with concurrency = N in its config (N > 1), or do you mean multiple object replicator processes?

Caleb Tennis (ctennis) wrote on 2014-12-01: Re: [Bug 1397668] Re: object-replicator update_deleted throws oserror if suffix directory is cleaned up externally

#3

Multiple processes, which I realize is not the norm.

Caleb Tennis (ctennis) wrote on 2014-12-10:

#4

I have a patch located at https://review.openstack.org/#/c/138837/

OpenStack Infra (hudson-openstack) wrote on 2015-06-05: Change abandoned on swift (master)

#5

Change abandoned by John Dickinson (<email address hidden>) on branch: master
Review: https://review.openstack.org/138837
Reason: Abandoning due to lack of activity since the last negative review. You can restore the change if you want to keep working on it.

Caleb Tennis (ctennis) wrote on 2015-10-18:

#6

Noting that this is still an issue, in two separate places actually:

Oct 17 14:34:16 localhost object-replicator: Error syncing partition:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 336, in update
    reclaim_age=self.reclaim_age)
  File "/usr/lib/pymodules/python2.7/swift/common/utils.py", line 2914, in tpool_reraise
    raise resp
OSError: [Errno 2] No such file or directory: '/srv/node/d23/objects/62011'

Oct 17 14:34:16 localhost object-replicator: Error syncing handoff partition:
Traceback (most recent call last):
  File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 249, in update_deleted
    suffixes = tpool.execute(tpool_get_suffixes, job['path'])
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/pymodules/python2.7/swift/obj/replicator.py", line 241, in tpool_get_suffixes
    return [suff for suff in os.listdir(path)
OSError: [Errno 2] No such file or directory: '/srv/node/d22/objects/30577'
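Both tracebacks above end in the same OSError with errno 2 (ENOENT) on a partition directory: once from update (re-raised via tpool_reraise) and once from update_deleted (via tpool_get_suffixes). One way to cover both call sites is a small wrapper around each partition job. The sketch below assumes such a wrapper; `run_partition_job` and its arguments are hypothetical, not part of Swift. A vanished partition gets a one-line log message, and every other exception keeps the existing full-traceback logging.

```python
import errno
import logging

LOG = logging.getLogger(__name__)


def run_partition_job(job, partition_path, logger=LOG):
    """Run job(partition_path); treat a vanished partition as a skip.

    Hypothetical wrapper: OSError(ENOENT) is logged at info level without a
    traceback, while any other exception is logged with logger.exception(),
    as the existing generic handler does.
    """
    try:
        return job(partition_path)
    except Exception as err:
        if isinstance(err, OSError) and err.errno == errno.ENOENT:
            logger.info("Partition %s was removed before it could be synced; "
                        "skipping", partition_path)
        else:
            logger.exception("Error syncing partition")
        return None
```

A stricter variant could also require `err.filename == partition_path`, so that an ENOENT raised for some other path still produces a traceback.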
