Potential race condition in oils_ctl.sh and restart_sip command

Bug #1945858 reported by Jason Stephenson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
New
Undecided
Unassigned

Bug Description

We run a oils_ctl.sh every morning at 2:05 am to restart our SIPServer processes. This serves as a way to make sure that new accounts and configuration changes are picked up in a timely manner without disrupting library service.

There appears to have been a race condition in the restart on one of our servers this morning. Below are log entries that show the SIPServer handling a request at 2:04:59, the cron job running at 2:05:01, the SIPServer supposedly shutting down, and the new process failing to start because the port is still in use. I am not certain how much of a role the request at 2:04:59 plays in this, but I suspect that it does play a role because this sequence of events almost never happens.

Oct 2 02:04:59 sip3 SIPServer.pm: read_SIP_packet, INPUT MSG: '9900002.00'
Oct 2 02:04:59 sip3 SIPServer.pm: read_SIP_packet, INPUT MSG: '9900002.00'
Oct 2 02:04:59 sip3 SIPServer.pm: INPUT MSG: '9900002.00'
Oct 2 02:04:59 sip3 SIPServer.pm: Sip::MsgType::new('Sip::MsgType', '9900002.00', '99'): msgtag '0'
Oct 2 02:04:59 sip3 SIPServer.pm: Sip::MsgType::_initialize('SC Status', '00002.00...')
Oct 2 02:04:59 sip3 SIPServer.pm: Sip::MsgType::_initialize('SC Status', '00002.00', 'CA3A4', '8', ...
Oct 2 02:04:59 sip3 SIPServer.pm: OUTPUT MSG: '98YYYYYN00000320211002 0204592.00AOsrcheckin02|BXYYYYYYYYYYYYYYYY|'
Oct 2 02:04:59 sip3 SIPServer.pm: SIP processing duration 0.000 : 9900002.00
Oct 2 02:05:01 sip3 CRON[23371]: (opensrf) CMD ($EG_BIN_DIR/oils_ctl.sh -s $SIP_CONF -d $PID_DIR -a restart_sip > /dev/null)
Oct 2 02:05:01 sip3 acs-server[10418]: 2021/10/02-02:05:01 Server closing!
Oct 2 02:05:02 sip3 acs-server[23377]: 2021/10/02-02:05:02 SIPServer (type Net::Server::PreFork) starting! pid(23377)
Oct 2 02:05:02 sip3 acs-server[23377]: Resolved [*]:6001 to [::]:6001, IPv6
Oct 2 02:05:02 sip3 acs-server[23377]: Not including resolved host [0.0.0.0] IPv4 because it will be handled by [::] IPv6
Oct 2 02:05:02 sip3 acs-server[23377]: Binding to TCP port 6001 on host :: with IPv6
Oct 2 02:05:02 sip3 acs-server[23377]: 2021/10/02-02:05:02 Can't connect to TCP port 6001 on :: [Address already in use]#012 at line 64 in file /usr/share/perl5/Net/Server/Proto/TCP.pm
Oct 2 02:05:02 sip3 acs-server[23377]: 2021/10/02-02:05:02 Server closing!

summary: - Potential race condition in oils_ctl.sh and restar_sip command
+ Potential race condition in oils_ctl.sh and restart_sip command
description: updated
tags: added: performance sip
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.