Comment 2 for bug 1206629

Jason Stephenson (jstephenson) wrote :

It also seems to fail in a very interesting manner. In several cases, I have seen the storage listener die, with drones still running.

osrf_control --diagnostic says: ERR open-ils.storage Has PID file entry [1625], which matches no running open-ils.storage processes

And pgrep -af storage shows 7 drones running:
21046 OpenSRF Drone [open-ils.storage]
21052 OpenSRF Drone [open-ils.storage]
21059 OpenSRF Drone [open-ils.storage]
21063 OpenSRF Drone [open-ils.storage]
21064 OpenSRF Drone [open-ils.storage]
21070 OpenSRF Drone [open-ils.storage]
21077 OpenSRF Drone [open-ils.storage]
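For anyone who wants to check for the same state, here is a rough Python sketch of the check I did by hand: take the listener PID from the osrf_control message, and if that process is gone, treat every drone still listed by pgrep as orphaned. The function names (drone_pids, pid_is_running, orphaned_drones) are made up for illustration and are not part of OpenSRF. It assumes the pgrep -af line format shown above and a Linux host.

```python
import os
import re

# Matches lines like "21046 OpenSRF Drone [open-ils.storage]" from `pgrep -af`.
# The format is assumed from the output pasted above.
DRONE_RE = re.compile(r"^(\d+)\s+OpenSRF Drone \[([^\]]+)\]$")


def drone_pids(pgrep_output, service):
    """Return the PIDs of drones for `service` found in `pgrep -af` output."""
    pids = []
    for line in pgrep_output.splitlines():
        match = DRONE_RE.match(line.strip())
        if match and match.group(2) == service:
            pids.append(int(match.group(1)))
    return pids


def pid_is_running(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def orphaned_drones(pgrep_output, service, listener_pid, is_running=pid_is_running):
    """If the listener PID is dead, report every remaining drone as orphaned."""
    if is_running(listener_pid):
        return []
    return drone_pids(pgrep_output, service)
```

With the output above, orphaned_drones(text, "open-ils.storage", 1625) would list all seven drone PIDs, provided PID 1625 really is gone at the time of the check.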

The fine generator itself is still running. pgrep -af fine reports:
21026 /usr/bin/perl /openils/bin/fine_generator.pl /openils/conf/opensrf_core.xml

The lock file is still in place.

My latest occurrence of this happened today after the 5:00 pm run. When the 6:00 pm run started, it reported that the fine generator seemed to already be running, so I had a look.

/openils/var/log/open-ils.storage_stderr.log contains 15,193 copies of the following message, and the file is timestamped Mar 12 17:13:

Caught error from 'run' method: Can't call method "search_where" on an undefined value at /usr/local/share/perl/5.26.1/OpenILS/Application/Storage/Publisher/action.pm line 1014.
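A count like that can be reproduced with a few lines of Python. This is only a sketch: count_lines_containing is a name I made up, and it does a plain substring match while streaming the file line by line, so it copes with large logs without loading them into memory.

```python
def count_lines_containing(path, needle):
    """Count the lines in the file at `path` that contain the string `needle`."""
    count = 0
    # errors="replace" keeps stray non-UTF-8 bytes in a log from aborting the scan.
    with open(path, "r", errors="replace") as log:
        for line in log:
            if needle in line:
                count += 1
    return count
```

For example, count_lines_containing("/openils/var/log/open-ils.storage_stderr.log", "Caught error from 'run' method") counts the message quoted above.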

I'm grepping syslog for more information, but it takes a while to grep a 1.2 GB file, so I'll have to update this bug again later.

I have seen this with OpenSRF and Evergreen 3.0 running on Ubuntu 16.04.

NOTE: I am not yet 100% certain that I am seeing a timeout. That's what I'm searching for in the syslog file, but this looked like the most likely bug to glom onto.