OpenSRF Perl Unable to Cleanup Idle Children

Bug #1987873 reported by Bill Erickson
This bug affects 1 person
Affects: OpenSRF
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

OpenSRF 3.2

I recently repaired some local monitoring which tracks idle vs active drone counts and was surprised to see excess numbers of idle Perl drone processes.

I set up a test in stock Evergreen and configured open-ils.actor to have min idle = 1 and max idle = 3. Then I blasted it with a series of echo requests.

for x in $(seq 0 20); do ( echo "request open-ils.actor opensrf.system.echo true" | srfsh > /dev/null) & done; wait;

Once complete, there are 7 actor drones, all idle. With the current settings, it should have settled down to 3 idle children after a few seconds.

I believe the cause of the issue is this line of code in Server.pm:

https://git.evergreen-ils.org/?p=OpenSRF.git;a=blob;f=src/perl/lib/OpenSRF/Server.pm;h=52c53d244c63a91bc1364ca471b6a1ae09dfc0e9;hb=HEAD#l170

In short, if there's no message in the backlog, wait indefinitely for one from the network. This means we never reach the else block of the main "if ($msg) {}" block. The else block is where the idle maintenance occurs, though, so that code never runs.
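
To illustrate the control flow described above, here is a toy model in Perl. It is not the actual Server.pm code: run_loop, its options, and the "retire one surplus idle drone per pass" rule are all hypothetical stand-ins, and the blocking network read is simulated by simply leaving the loop once no more messages are inbound.

```perl
use strict;
use warnings;

# Toy model of a listener loop; NOT the real OpenSRF::Server code.
# All names and options here are hypothetical.
sub run_loop {
    my %opt     = @_;
    my @network = @{ $opt{messages} };   # messages that will arrive
    my $idle    = $opt{idle};            # current idle drone count
    my $max     = $opt{max_idle};        # configured max idle

    for (1 .. $opt{ticks}) {
        my $msg = shift @network;

        if (!defined $msg && $opt{block_forever}) {
            # Indefinite wait with nothing inbound: the loop stalls
            # here and never reaches the maintenance branch below.
            last;
        }

        if (defined $msg) {
            # Message handling would go here (omitted in this model).
        } else {
            # Idle maintenance (assumed): retire one surplus drone.
            $idle-- if $idle > $max;
        }
    }
    return $idle;
}

my @args = (idle => 7, max_idle => 3, ticks => 20);
my $stuck = run_loop(@args, messages => [ ('echo') x 3 ], block_forever => 1);
my $fixed = run_loop(@args, messages => [ ('echo') x 3 ], block_forever => 0);

print "indefinite wait: $stuck idle drones\n";
print "timed wait:      $fixed idle drones\n";
```

In this model the indefinite wait leaves all 7 idle drones in place, while the timed wait lets the else branch run and trims the count down to the configured maximum of 3.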

===

The drones still process requests normally. The main concern here is excess memory use from having extra drones and improper reporting for tools that are monitoring active drone counts.

Bill Erickson (berick) wrote :

I'm testing this patch which just removes the offending line:

https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1987873-perl-idle-child-maint-fix

My proof of concept now behaves as expected. Other eyes would be appreciated though.

Bill Erickson (berick) wrote :

Note: when testing, I added some temporary logging to confirm that wait_time eventually returns to -1 when no more activity or child maintenance is needed. I did not test any message backlog scenarios, though.

Galen Charlton (gmc) wrote :

Took a quick look at this. I agree with removing the indefinite wait, but backlog processing will need to be tested, as with the proposed patch $from_network never changes value. Something like this might capture the intent better:

if ($msg) {
    # we just popped a message from the backlog queue
    # let's see if we can process it
    $from_network = 0;
} else {
    $msg = $self->{osrf_handle}->process($wait_time);
    $from_network = 1 if $msg;
}
