silent exception handler block in collect_jobs

Bug #1229372 reported by clayg
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)
Confirmed
Low
Unassigned

Bug Description

In the object replicators collect_jobs method, the os.listdir loop at the partition level swallows ValueError and OSError without emitting a log message. In at least once case there was a post on the mailing list that looked like it might have been some garbage in that directory that was blowing up the entire replication pass and the partition/path in question wasn't being logged.

I think the exception handler should be more narrow (I'm not sure *exactly* what it was trying to catch) and log a warning (or even a info/debug).

Revision history for this message
Kun Huang (academicgareth) wrote :

ValueError is added by David Goetz in a very very old patch (a71164995aaafd626cc16ba59cd01dbb69e33fe0), but I didn't find out what's it prevent from here. But OSError is added reasonable in 46a093f068a158b72479522792509a882d3e47f1.

I think at least we could add a log with traceback here

Revision history for this message
Jiri Suchomel (jsuchome) wrote :

OSError check was added by https://review.openstack.org/#/c/12378/ , I assume because of os.remove call.

Would it be enough to add something like

- except (ValueError, OSError):
+ except OSError:
+ self.logger.exception('ERROR removing partition directory %s' % job_path)
+ except ValueError:

?

Revision history for this message
clayg (clay-gerrard) wrote :

So looking at it again, I think maybe we thought the OSError was for something like ENOPERM from isfile (which just returns False on all os.error), and the ValueError was from int(partition). IIRC the random conversation in IRC that caused me to file this, it was the ValueError case that may have been interesting.

I think we just need tighter exceptions and never skip a path without logging it, maybe it'd be easier to see if we extract it to function and test it real hard?

Revision history for this message
Thiago da Silva (thiagodasilva) wrote :

This code base has changed quite a bit since this bug was filed, but I think we still don't log as Clay initially suggested it. We are now tracking failures for recon[1] , but that's it. Is that enough or should we still add the log message there?

[1] https://github.com/openstack/swift/commit/79ba4a85983641e539b620bd143e62673c98416e

clayg (clay-gerrard)
Changed in swift:
status: New → Confirmed
importance: Undecided → Low
tags: added: low-hanging-fruit
Revision history for this message
clayg (clay-gerrard) wrote :

this some how reminds me of the container dir walker fixes that Mahati worked on [1]. The point is we need to iter the dirs and trap errors right w/o blowing up the whole loop and when we find something that we really don't know what to do with emit an error and pointer to the operator how to get rid of it:

"This thing at <path> is not great for me!"

That can be at debug, but warning is probably fine since we think these things are unexpected and not the common case!

1. https://review.openstack.org/#/c/331601/

tags: added: consistency-engine
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.