swift-recon-object-cron gets stuck

Bug #1687509 reported by clayg on 2017-05-02
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Object Storage (swift)

Bug Description

[Errno 17] File exists: '/var/lock/swift/swift-recon-object-cron'

If the lock dir for swift-recon-cron doesn't get cleaned up because of an unclean shutdown, the job will never start again until an operator logs in and removes the directory with rmdir.
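A minimal sketch of the failure mode, assuming the lock is taken with a plain os.mkdir (which the Errno 17 message suggests, but which is not confirmed here). The helper name try_lock and the temporary path are hypothetical and used only for illustration; the real script uses /var/lock/swift/swift-recon-object-cron.

```python
import errno
import os
import tempfile


def try_lock(lock_dir):
    """Return True if lock_dir was created, False if it already exists."""
    try:
        os.mkdir(lock_dir)
    except OSError as e:
        if e.errno == errno.EEXIST:
            # This is the "[Errno 17] File exists" case from the report.
            return False
        raise
    return True


# Stand-in for /var/lock/swift so the sketch does not touch real paths.
base = tempfile.mkdtemp()
lock_dir = os.path.join(base, "swift-recon-object-cron")

first = try_lock(lock_dir)   # a normal run acquires the lock
# Simulate an unclean shutdown: the process dies without calling rmdir.
second = try_lock(lock_dir)  # every later run fails on the leftover dir
os.rmdir(lock_dir)           # the manual cleanup an operator has to do
third = try_lock(lock_dir)   # only now can the job start again
os.rmdir(lock_dir)
```

Nothing in this pattern ever expires the directory, so one missed cleanup blocks every subsequent run.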

This probably happens when someone decides they have to send SIGTERM to every process using /etc/swift/object-server.conf.

In this case I think swift-recon-cron would have an opportunity to handle SystemExit, but it might be useful to add a configurable timeout on the lockdir as well. This job is supposed to run pretty frequently to give up-to-date numbers. You don't want to overwhelm the disks, but I don't think it's going to cause systemic resource consumption issues if you were to consistently blow away lockdirs more than 1 hour old. Maybe a default timeout in the range of 6-24 hours would be helpful and safe in nearly all imaginable configurations?
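One way the two ideas above could fit together, as a rough sketch rather than a proposed patch: release the lock in a finally block (which runs on SystemExit), optionally turn SIGTERM into SystemExit, and break a lock dir whose mtime is older than a timeout. All names here (break_stale_lock, lock_dir_held, install_sigterm_handler) and the 6-hour default are assumptions for illustration, not existing Swift code or config options.

```python
import contextlib
import errno
import os
import signal
import sys
import tempfile
import time


def install_sigterm_handler():
    """Turn SIGTERM into SystemExit so that finally blocks get to run."""
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))


def break_stale_lock(lock_dir, timeout, now=None):
    """Remove lock_dir if it is older than timeout seconds.

    Returns True if a stale lock was removed, False otherwise.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(lock_dir).st_mtime
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False  # no lock present
        raise
    if age <= timeout:
        return False  # lock is fresh, leave it alone
    try:
        os.rmdir(lock_dir)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return False  # another process removed it first
        raise
    return True


@contextlib.contextmanager
def lock_dir_held(lock_dir, timeout=6 * 3600):
    """Hold lock_dir for the duration of the with block."""
    break_stale_lock(lock_dir, timeout)
    os.mkdir(lock_dir)  # still raises EEXIST if a fresh lock is held
    try:
        yield
    finally:
        try:
            os.rmdir(lock_dir)
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise


# Demo in a temporary directory: a 7-hour-old leftover lock is broken.
base = tempfile.mkdtemp()
lock = os.path.join(base, "swift-recon-object-cron")
os.mkdir(lock)
old = time.time() - 7 * 3600
os.utime(lock, (old, old))

with lock_dir_held(lock, timeout=6 * 3600):
    held = os.path.isdir(lock)
released = not os.path.exists(lock)
```

The stat-then-rmdir step is not atomic, so two runs that start at the same moment could both decide the lock is stale. For a job that only collects recon numbers, that is probably acceptable, but it is a trade-off of this sketch rather than something the report settles.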

N.B. All of the code for this bin script lives in the bin dir, so moving it to the cli module will be a requirement for unit testing.

Changed in swift:
status: New → Confirmed
importance: Undecided → Medium