librarian hangs for unknown reason
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself | Confirmed | Undecided | Unassigned | | |
Bug Description
Nagios alerted with: CHECK_NRPE: Socket timeout after 10 seconds.
A manual run of the check on acamar confirmed that it was unable to connect.
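For anyone who wants to reproduce the "unable to connect" observation without going through Nagios, a minimal sketch of a TCP probe with a timeout follows. This is not the actual NRPE check; the function name `can_connect` and the host and port values in the usage note are hypothetical, and the 10-second default only mirrors the timeout quoted in the alert.

```python
import socket


def can_connect(host, port, timeout=10.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical helper for illustration; not the real Nagios/NRPE check.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would look like `can_connect("localhost", 12345)`, with the host and port replaced by whatever the librarian instance actually listens on (the values here are placeholders). Note that a connect-only probe can still succeed against a process that has stopped servicing requests, because the kernel completes the TCP handshake for a listening socket until its backlog fills, so a failure here is informative but a success is not proof of health.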
Restarting launchpad_librarian3 resolved this, but I did some debugging first and asked wgrant what to check. There were no obvious problems and nothing in the logs (it had not logged anything for the past 15 minutes). wgrant had me check file handles, but lsof -p <pid> reported only 247 lines before I restarted it.
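The file handle check could also be scripted so that the count is compared against the process's limit the next time this happens. The following is a rough sketch, assuming a Linux host with /proc mounted; the helper names are made up for illustration. The count will not match lsof's line count exactly, since lsof also prints a header and non-descriptor entries such as the working directory and memory-mapped files.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the limit is 'unlimited' or the line is not found.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft limit, hard limit, units
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open descriptor count, soft limit) for the given pid."""
    return count_open_fds(pid), soft_fd_limit(pid)
```

Calling `fd_usage(pid)` with the librarian's pid (as the same user or as root) gives a pair such as the number in use and the soft limit. A count close to the limit would point at descriptor exhaustion; a count of a few hundred against a typical limit of 1024 or more, as seen here, would suggest looking elsewhere.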
Filing this report because, as elmo points out, we have seen this before:
<elmo> wgrant: so, it's not a single instance
<wgrant> orly
<wgrant> elmo: Sorry, it's the first I've heard of this sort of thing happening in a while (since the kernel issues, in fact)
<elmo> wgrant: https:/
This happened again over the weekend.