Perl services can crash with a "Can't kill a non-numeric process ID" error

Bug #1953047 reported by Galen Charlton
This bug affects 2 people
Affects   Status   Importance   Assigned to   Milestone
OpenSRF   New      Medium       Unassigned

Bug Description

Perl app listeners can occasionally throw the following exception:

server: died with error Can't kill a non-numeric process ID at /opt/sequoia/apps/sequoia-perl5lib/lib/perl5/OpenSRF/Server.pm line 335.

As with bug 1953044, when this happens, the listener will kill its drones and attempt to reset itself (though the reset doesn't work for other reasons that I'll document in a separate bug).

We have seen this on Perl 5.20.2 and Perl 5.28.1 systems, though I'm not sure that the Perl version matters.

OpenSRF 3.1+

Tags: pullrequest
Galen Charlton (gmc)
Changed in opensrf:
importance: Undecided → Medium
Galen Charlton (gmc) wrote :

A patch is available at the tip of

user/gmcharlt/lp1953047_only_kill_numeric_pids / https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1953047_only_kill_numeric_pids

As the commit message indicates, this is just a band-aid, but one that may help us track down the root cause.
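For readers who want to see the general shape of such a guard, here is a minimal sketch. It is not the contents of the branch above: safe_kill is a hypothetical helper name, and the assumption is that the bad value reaching kill() is something like undef (which is what Perl's fork returns on failure) or an empty string. Perl's kill() raises "Can't kill a non-numeric process ID" as a fatal error for such values, so checking the PID first and logging the bad value keeps the listener alive and leaves a trace for finding the root cause.

```perl
use strict;
use warnings;

# Hypothetical helper (not the actual OpenSRF patch): send a signal only
# when the PID is a plain positive integer. Anything else is logged and
# skipped instead of being passed to kill(), which would die.
sub safe_kill {
    my ($signal, $pid) = @_;

    unless (defined $pid && $pid =~ /\A[1-9][0-9]*\z/) {
        warn sprintf(
            "refusing to send %s to invalid PID '%s'\n",
            $signal, defined $pid ? $pid : 'undef'
        );
        return 0;    # nothing was signalled
    }

    # kill() returns the number of processes successfully signalled.
    return kill($signal, $pid);
}
```

A caller would then write something like safe_kill('TERM', $pid) wherever it currently calls kill directly. The warning is the useful part for debugging, since it records what the non-numeric value actually was.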

Galen Charlton (gmc) wrote :

Noting that the non-handling of failed forks by Perl apps, mentioned in the commit message, is the subject of bug 1546683.

Galen Charlton (gmc) wrote :

I just saw one scenario that triggered the error: doing a service restart where the listener didn't go away after a TERM and needed an INT:

* timed out waiting on open-ils.actor pid=23389 to die
* sending INT signal to pid=23389 open-ils.actor

Galen Charlton (gmc)
tags: added: pullrequest