librarian hangs for unknown reason

Bug #1675610 reported by Michael Foley
This bug report is a duplicate of: Bug #1948711: librarian processes get wedged.
This bug affects 2 people
Affects: Launchpad itself
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

Nagios alerted with: CHECK_NRPE: Socket timeout after 10 seconds.
A manual run of the check on acamar confirmed that it was unable to connect.
Restarting launchpad_librarian3 resolved this, but I did some debugging first and asked wgrant what to check. There were no obvious problems and nothing in the logs (it had not logged anything for the past 15 minutes). wgrant had me check file handles, but lsof -p <pid> reported only 247 lines before I restarted it.
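
For the next occurrence, the file handle check could be made a bit more precise than counting lsof lines, since lsof output also includes a header and non-descriptor entries such as cwd, txt and mem. The following is only a rough sketch, assuming a Linux host with /proc mounted and enough privileges to read the target process's entries; the function names are made up for illustration and are not part of Launchpad or the librarian.

```python
import os
from pathlib import Path


def count_open_fds(pid: int) -> int:
    """Count the entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def parse_soft_nofile_limit(limits_text: str):
    """Extract the soft "Max open files" limit from /proc/<pid>/limits text.

    Returns None if the limit is "unlimited" or the row is missing.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units.
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid: int):
    """Return (open descriptor count, soft limit or None) for a process."""
    used = count_open_fds(pid)
    limit = parse_soft_nofile_limit(Path(f"/proc/{pid}/limits").read_text())
    return used, limit
```

Calling fd_usage(<pid>) on the wedged process before restarting it would show whether it is anywhere near its descriptor limit. A count far below the limit, as the 247 lsof lines suggest here, would point away from descriptor exhaustion as the cause.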

Filing this report because, as elmo points out, we have seen this before:
<elmo> wgrant: so, it's not a single instance
<wgrant> orly
<wgrant> elmo: Sorry, it's the first I've heard of this sort of thing happening in a while (since the kernel issues, in fact)
<elmo> wgrant: https://pastebin.canonical.com/183608/ <-- obviously that's a naive grep, but from memory, I've seen other SREs restart the librarian

Haw Loeung (hloeung) wrote :

This happened again over the weekend.

Changed in launchpad:
status: New → Confirmed
Colin Watson (cjwatson) wrote :

Seems likely to be the same cause as bug 1948711, so duplicating.
