Perl listener can fail to clean up children during post-crash restart

Bug #1953057 reported by Galen Charlton
This bug affects 3 people
Affects: OpenSRF
Status: New
Importance: Medium
Assigned to: Unassigned

Bug Description

If a Perl listener throws an uncaught exception (for example, as in bug 1953044), OpenSRF::System->run_service() will try to clean up and reset the service. This cleanup includes terminating and reaping all children, then spawning new ones.

However, the reaping step does not work as intended. OpenSRF::Server->cleanup() sets the CHLD signal handler to 'IGNORE', so Linux reaps the exiting children automatically. As a result, ->reap_children() never sees them exit and leaves them on the active and/or idle lists. When the service resets itself, it believes that it already has one or more drones ready to go, so it may hit max_children sooner than it should.
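
The underlying kernel behaviour can be reproduced outside OpenSRF. The following is a minimal sketch in Python, not OpenSRF code: the function name stale_pids_after_reap and the bookkeeping set are hypothetical, and the "reaper" is assumed to drop a PID from its list only when waitpid() actually reports that PID. It is meant for Linux, where ignoring SIGCHLD causes exited children to be discarded without ever becoming waitable.

```python
import os
import signal


def stale_pids_after_reap(ignore_sigchld: bool) -> set:
    """Fork three short-lived children, try to reap them, and return the
    PIDs that the (hypothetical) bookkeeping still considers alive."""
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        tracked = set()
        for _ in range(3):
            pid = os.fork()
            if pid == 0:
                os._exit(0)  # child: exit immediately
            tracked.add(pid)

        for pid in sorted(tracked):
            try:
                # Blocks until the child has exited.
                os.waitpid(pid, 0)
            except ChildProcessError:
                # ECHILD: the kernel already discarded the child, so this
                # reaper never receives confirmation that the PID exited.
                continue
            tracked.discard(pid)  # only confirmed exits are removed
        return tracked
    finally:
        # Restore whatever SIGCHLD disposition was in place before.
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print("default handler, stale PIDs:", len(stale_pids_after_reap(False)))
    print("SIGCHLD ignored, stale PIDs:", len(stale_pids_after_reap(True)))
```

With the default disposition every child is confirmed and removed. With SIGCHLD ignored, all three PIDs stay on the list even though the processes are gone, which mirrors the stale active/idle entries described above.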

OpenSRF 3.1+

Tags: pullrequest
Galen Charlton (gmc) wrote :

A patch is available at the tip of

working/user/gmcharlt/lp1953057_forcibly_reap_after_reset / https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1953057_forcibly_reap_after_reset

tags: added: pullrequest
Galen Charlton (gmc)
Changed in opensrf:
importance: Undecided → Medium