object auditor crashed

Bug #837409 reported by David Kranz on 2011-08-30
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Importance: Undecided
Assigned to: David Goetz

Bug Description

I have been running a swift cluster with 3 storage nodes for many days without incident. Then nagios informed me that the object-auditor process was not running on any of them. Just before the time of the nagios report, I saw the following two error messages in the log, each repeated twice before the process stopped. The same thing happened on all three storage nodes at about the same time, referencing the same /srv/node/... file in each case. I am running this version:

ii swift-object 1.4.2-0ubuntu0~ppa1~lucid1 distributed virtual object store - object server

Log messages:

Aug 29 17:40:10 jc2 object-auditor Begin object audit "forever" mode (ZBF)
Aug 29 17:40:10 jc2 object-auditor ERROR Trying to audit /srv/node/sdc1/objects/397/fc5/6361d1e9d2adc641b1a20f80cdec0fc5/1314111606.99803.data: #012Traceback (most recent call last):#012 File "/usr/lib/pymodules/python2.6/swift/obj/auditor.py", line 136, in object_audit#012 File "/usr/lib/pymodules/python2.6/swift/obj/server.py", line 173, in __init__#012IOError: [Errno 24] Too many open files: '/srv/node/sdc1/objects/397/fc5/6361d1e9d2adc641b1a20f80cdec0fc5/1314111607.73028.meta'

Aug 29 17:40:40 jc2 object-auditor UNCAUGHT EXCEPTION#012Traceback (most recent call last):#012 File "/usr/bin/swift-object-auditor", line 27, in <module>#012 File "/usr/lib/pymodules/python2.6/swift/common/daemon.py", line 88, in run_daemon#012 File "/usr/lib/pymodules/python2.6/swift/common/daemon.py", line 54, in run#012 File "/usr/lib/pymodules/python2.6/swift/obj/auditor.py", line 201, in run_forever#012 File "/usr/lib/pymodules/python2.6/swift/obj/auditor.py", line 210, in run_once#012 File "/usr/lib/pymodules/python2.6/swift/obj/auditor.py", line 74, in audit_all_objects#012 File "/usr/lib/pymodules/python2.6/swift/common/utils.py", line 867, in audit_location_generator#012OSError: [Errno 24] Too many open files: '/srv/node'
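The `#012` sequences in the messages above are syslog's octal escape for the newline character, so each traceback arrives as a single line. A minimal sketch for expanding them back into a readable multi-line traceback follows; the helper name `decode_syslog_escapes` is hypothetical, and it assumes only control characters (octal value below 040) were escaped this way.

```python
import re


def decode_syslog_escapes(line):
    """Expand #NNN octal escapes for control characters (e.g. #012 -> newline)."""

    def expand(match):
        code = int(match.group(1), 8)
        # Only control characters are treated as escapes; text such as
        # "#123" (octal 123 = 'S') is left unchanged.
        return chr(code) if code < 32 else match.group(0)

    return re.sub(r"#([0-7]{3})", expand, line)


# Shortened excerpt of the second log message above.
sample = (
    "object-auditor UNCAUGHT EXCEPTION#012"
    "Traceback (most recent call last):#012"
    "OSError: [Errno 24] Too many open files: '/srv/node'"
)
decoded = decode_syslog_escapes(sample)
print(decoded)
```

Feeding the full log lines through the same helper yields the complete tracebacks with one frame per line.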

David Kranz (david-kranz) wrote :

I should add that there were no other errors in the log and that there had not been any activity of adding/removing files, accounts or containers for several days before this happened. The system was "idling" when it happened.
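Since `Errno 24` means the process hit its open-file limit, one way to watch for this on an otherwise idle node is to compare a process's open descriptor count with its soft limit. The sketch below is an assumption-laden illustration, not part of swift: it relies on the Linux `/proc` filesystem, the function names are hypothetical, and reading another process's entries needs the same user or root.

```python
import os


def open_fd_count(pid="self"):
    """Count the file descriptors a process currently holds (Linux /proc only)."""
    return len(os.listdir("/proc/%s/fd" % pid))


def soft_nofile_limit(pid="self"):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits, or None."""
    with open("/proc/%s/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", <soft>, <hard>, <units>
                value = line.split()[3]
                return int(value) if value.isdigit() else None
    return None


# Defaults to the current process; pass the auditor's pid to inspect it instead.
print("open fds: %d, soft limit: %s" % (open_fd_count(), soft_nofile_limit()))
```

A count that climbs steadily toward the limit while the cluster is idle would be consistent with descriptors not being released.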

Changed in swift:
assignee: nobody → David Goetz (david-goetz)
David Goetz (david-goetz) wrote :

I just proposed a merge to fix this:

https://code.launchpad.net/~david-goetz/swift/auditor_bug/+merge/73449

Thanks very much for reporting this!

David

Thierry Carrez (ttx) on 2011-09-12
Changed in swift:
milestone: none → 1.4.3
status: New → Fix Committed
status: Fix Committed → Fix Released