"Could not launch a new child" errors can fill logs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenSRF | New | Undecided | Unassigned |
Bug Description
When a C service runs out of listeners (max_children is reached), additional requests will result in a "Could not launch a new child" warning being logged continuously until a listener becomes available and the request is processed. Depending on how long drone saturation lasts, you can end up with millions of these log messages in a very short period of time. I've seen this use up all available disk space on an Evergreen server.
When I mentioned this problem in the #evergreen IRC channel on May 13, Bill Erickson quickly put together a fix:
If I understand correctly, this branch adds a 1-second delay before attempting to re-process a request, which is how Perl handles the same scenario. In real-world conditions, that should slow things down long enough for the request backlog to get cleared up before the log spew consumes your disk space. It resolved our immediate problem and I haven't found any new issues yet after applying the fix.
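For anyone following along, here is a minimal sketch of the speed-bump idea as I understand it. This is not the code from Bill's branch. Every name in it (`pool_t`, `try_launch_child`, `dispatch_with_speedbump`, `release_one`) is hypothetical and is not an OpenSRF function. It only illustrates why pausing between retries caps the warning rate at roughly one message per delay interval instead of one per loop iteration.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for a service's pool of listener processes. */
typedef struct {
    int max_children;     /* configured ceiling (max_children) */
    int active_children;  /* listeners currently busy */
} pool_t;

/* Returns 0 if a child was claimed, -1 if the pool is saturated. */
int try_launch_child(pool_t *pool) {
    if (pool->active_children >= pool->max_children) return -1;
    pool->active_children++;
    return 0;
}

/* Hypothetical helper that simulates one busy child finishing its work. */
void release_one(pool_t *pool) {
    if (pool->active_children > 0) pool->active_children--;
}

/*
 * Try to hand a request to a child. On saturation, log one warning and
 * pause for delay_secs before retrying, rather than retrying immediately.
 * on_wait (may be NULL) is called after each pause; here it only exists so
 * the example can simulate a child becoming free.
 * Returns the number of warnings logged on success, or -1 after
 * max_attempts failed attempts.
 */
int dispatch_with_speedbump(pool_t *pool, unsigned int delay_secs,
                            int max_attempts, void (*on_wait)(pool_t *)) {
    assert(pool != NULL);
    int warnings = 0;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        if (try_launch_child(pool) == 0) return warnings;
        fprintf(stderr, "Could not launch a new child (attempt %d)\n",
                attempt + 1);
        warnings++;
        sleep(delay_secs);  /* the speed bump: wait before retrying */
        if (on_wait) on_wait(pool);
    }
    return -1;
}
```

With `delay_secs` set to 1, a saturated pool produces at most about one warning per second per backlogged request. Calling `dispatch_with_speedbump(&pool, 1, 5, release_one)` on a full pool logs a single warning, waits a second, and then succeeds once a child is free.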
New branch pushed w/ proper labeling:
https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/berick/lp1881001-c-backlog-speedbump
The log spewing issue just happened on one of my test servers. Deploying the patch there for additional testing.