Activity log for bug #1819798

Date Who What changed Old value New value Message
2019-03-13 00:36:20 Jason Stephenson bug added bug
2019-03-13 00:50:04 Jason Stephenson description

Old value:

Evergreen version: 3.0
OpenSRF Version: 3.0 and 3.1/master
PostgreSQL version: 9.5.16
O/S Version: Ubuntu 16.04 & Ubuntu 18.04

In several cases, I have seen the storage listener die, with drones still running, while using a parallel setting of 6 for the fine generator.

osrf_control --diagnostic says:

ERR open-ils.storage Has PID file entry [1625], which matches no running open-ils.storage processes

And pgrep -af storage shows 7 drones running:

21046 OpenSRF Drone [open-ils.storage]
21052 OpenSRF Drone [open-ils.storage]
21059 OpenSRF Drone [open-ils.storage]
21063 OpenSRF Drone [open-ils.storage]
21064 OpenSRF Drone [open-ils.storage]
21070 OpenSRF Drone [open-ils.storage]
21077 OpenSRF Drone [open-ils.storage]

The fine generator itself is still running. pgrep -af fine reports:

21026 /usr/bin/perl /openils/bin/fine_generator.pl /openils/conf/opensrf_core.xml

The lock file is still in place.

My latest occurrence of this happened today after the 5:00 pm run. When the 6:00 pm run started, it reported that the fine generator seemed to already be running, so I had a look.

Looking through the 50MB of sylog that exist on this one server for the hour of 17:00, I see no signs of errors or other problems. I've checked kern.log for OOM Killer events, and nothing there, either.

This is not the first time that this has happened. It was a fairly frequent occurrence after we "improved" our database configuration until I moved the fine generator and some other cron jobs to a separate VM from the main utility server. I also seem to recall seeing this happen in the past, prior to the database changes, but I don't have very good documentation of the past events.

New value:

Evergreen version: 3.0
OpenSRF Version: 3.0 and 3.1/master
PostgreSQL version: 9.5.16
O/S Version: Ubuntu 16.04 & Ubuntu 18.04

In several cases, I have seen the storage listener die, with drones still running, while using a parallel setting of 6 for the fine generator.

osrf_control --diagnostic says:

ERR open-ils.storage Has PID file entry [1625], which matches no running open-ils.storage processes

And pgrep -af storage shows 7 drones running:

21046 OpenSRF Drone [open-ils.storage]
21052 OpenSRF Drone [open-ils.storage]
21059 OpenSRF Drone [open-ils.storage]
21063 OpenSRF Drone [open-ils.storage]
21064 OpenSRF Drone [open-ils.storage]
21070 OpenSRF Drone [open-ils.storage]
21077 OpenSRF Drone [open-ils.storage]

The fine generator itself is still running. pgrep -af fine reports:

21026 /usr/bin/perl /openils/bin/fine_generator.pl /openils/conf/opensrf_core.xml

The lock file is still in place.

My latest occurrence of this happened today after the 5:00 pm run. When the 6:00 pm run started, it reported that the fine generator seemed to already be running, so I had a look.

Looking through the 50MB of syslog that exist on this one server for the hour of 17:00, I see no signs of errors or other problems. I've checked kern.log for OOM Killer events, and nothing there, either. However, the Evergreen syslog entries just stop at 17:15:56. A "normal" syslog has double the number of lines with log entries ending at 19:38:16. In the case of both logged hours, the fine generator was the only Evergreen process running.

This is not the first time that this has happened. It was a fairly frequent occurrence after we "improved" our database configuration until I moved the fine generator and some other cron jobs to a separate VM from the main utility server. I also seem to recall seeing this happen in the past, prior to the database changes, but I don't have very good documentation of the past events.
2021-10-27 22:21:30 Terran McCanna tags performance
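
The symptom in the description above (a PID file entry for the open-ils.storage listener that matches no running process, while drones for the same service are still alive) could be checked for mechanically. What follows is only a minimal Python sketch under stated assumptions: the function names parse_drones, pid_alive and listener_orphaned are hypothetical and not part of OpenSRF or Evergreen, the drone line format is taken from the pgrep -af output quoted in the report, and the listener PID is passed in by the caller rather than read from any particular PID file location.

```python
import os
import re

# Matches lines such as "21046 OpenSRF Drone [open-ils.storage]",
# the format quoted from `pgrep -af storage` in the report.
DRONE_RE = re.compile(r"^(\d+)\s+OpenSRF Drone \[([^\]]+)\]$")


def parse_drones(pgrep_output, service):
    """Return the PIDs of drone processes for `service` found in pgrep -af text."""
    pids = []
    for line in pgrep_output.splitlines():
        match = DRONE_RE.match(line.strip())
        if match and match.group(2) == service:
            pids.append(int(match.group(1)))
    return pids


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX systems only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def listener_orphaned(listener_pid, drone_pids, alive=pid_alive):
    """True when drones are running but the recorded listener PID is gone."""
    return bool(drone_pids) and not alive(listener_pid)
```

A caller would feed parse_drones the captured output of pgrep -af storage and pass the PID recorded for the listener (1625 in the report) to listener_orphaned. The alive parameter exists only so the check can be exercised without touching real processes.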