Timeouts handled differently when restarting a service vs all services
Bug #1930578 reported by Jason Boyer
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenSRF | Fix Released | Low | Unassigned | | |
Bug Description
OpenSRF 3.2.1
When restarting all services and routers via osrf_control (opensrf-perl.pl), you may see that one service or another took too long to respond to a signal. In that case osrf_control gets more aggressive, sending TERM, then INT, and finally KILL if things are just plain wedged. When restarting a single service this escalation does not happen: you'll be told that the TERM didn't take, and then comes a message that the service in question is already running.
Looking at opensrf-perl.pl.in, there appear to be nearly parallel implementations for dealing with a single service vs. all services, for some reason. If both branches stopped their services the same way, this mismatch could be avoided.
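To make the expected behavior concrete, here is a minimal sketch of the TERM, INT, KILL escalation described above. It is written in Python for illustration only and is not the code in opensrf-perl.pl.in. The function name `stop_with_escalation`, the `wait_seconds` parameter, and the `STUBBORN` stand-in script are all hypothetical, and a child `subprocess.Popen` object stands in for a service on a POSIX system.

```python
import signal
import subprocess
import sys

# Escalation order taken from the description above: TERM, then INT, then KILL.
ESCALATION = (signal.SIGTERM, signal.SIGINT, signal.SIGKILL)


def stop_with_escalation(proc, wait_seconds=1.0):
    """Signal `proc` with increasing force until it exits.

    Returns the names of the signals that were sent, in order.
    """
    sent = []
    for sig in ESCALATION:
        if proc.poll() is not None:  # already exited, nothing left to do
            break
        proc.send_signal(sig)
        sent.append(sig.name)
        try:
            proc.wait(timeout=wait_seconds)
        except subprocess.TimeoutExpired:
            pass  # still running: fall through to the next, harsher signal
    return sent


# Hypothetical stand-in for a wedged service: it ignores TERM and INT.
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "signal.signal(signal.SIGINT, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", STUBBORN], stdout=subprocess.PIPE)
    child.stdout.readline()  # wait until the child has installed its handlers
    print(stop_with_escalation(child, wait_seconds=0.5))
```

If the single-service and all-services paths both went through one helper of this shape, a service that shrugs off TERM would be escalated the same way in either case.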
Changed in opensrf:
milestone: 3.2.2 → none
Changed in opensrf:
status: New → Confirmed
milestone: none → 3.2.3
Changed in opensrf:
status: Fix Committed → Fix Released
On looking into things more closely, one (potential) benefit of the current setup is that signaling all services can be done a little more in parallel than if everything eventually distilled down into the same couple of functions doing the work. So rather than a deep-scrub refactoring, here's a small change that brings the signal escalation for single services in line with the *-all versions: https://git.evergreen-ils.org/?p=working/OpenSRF.git;a=shortlog;h=refs/heads/user/jboyer/lp1930578_that_escalated_quickly / working/user/jboyer/lp1930578_that_escalated_quickly
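For readers wondering what "a little more in parallel" could look like, here is a rough Python sketch, not the code from the branch or from opensrf-perl.pl.in. It assumes a POSIX system and uses child `subprocess.Popen` objects as stand-ins for services; `stop_all` and `wait_seconds` are hypothetical names. Each round sends one signal to every service still running, waits once for the whole batch, and only then escalates the survivors.

```python
import signal
import time

# Same escalation order as the single-service case: TERM, then INT, then KILL.
ESCALATION = (signal.SIGTERM, signal.SIGINT, signal.SIGKILL)


def stop_all(procs, wait_seconds=1.0):
    """Stop every process in `procs` (a dict of name -> Popen-like object).

    Returns a dict of name -> list of signal names sent to that process.
    """
    sent = {name: [] for name in procs}
    pending = dict(procs)
    for sig in ESCALATION:
        # Keep only the processes that are still running.
        pending = {n: p for n, p in pending.items() if p.poll() is None}
        if not pending:
            break
        # Signal the whole batch first ...
        for name, proc in pending.items():
            proc.send_signal(sig)
            sent[name].append(sig.name)
        # ... then wait once for all of them, instead of once per service.
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            if all(p.poll() is not None for p in pending.values()):
                break
            time.sleep(0.05)
    return sent
```

With this shape the worst-case wait is roughly one timeout per signal level rather than one per service per level, which is the trade-off against funneling everything through a single per-service routine.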