Comment 2 for bug 1359762

Jason Stephenson (jstephenson) wrote :

We actually get lots of these from time to time. They usually come in clusters, e.g. for several hours in the morning, afternoon, or evening, and then they stop happening for a day or two.

While looking into this, I think I have found the cause of the Internal server errors.

It appears that we have cstore drones that are disconnecting from ejabberd and possibly shutting down, but the cstore listener and/or routers are unaware of this. As evidence, I present the following:

On the 4th, one of our central site staff got this exact error with a search. I verified that the error occurred in the Apache logs and checked it was the same error message. We then began digging through the logs and found some interesting information.

At the time the error occurred, the following message appeared in the osrfsys.log:

[2014-09-04 10:05:13] /usr/sbin/apache2 [ERR :20351:EX.pm:66:1409804114203512907] Exception: OpenSRF::EX::Session 2014-09-04T10:05:13 OpenSRF::Transport /usr/local/share/perl/5.14.2/OpenSRF/Transport.pm:83 Session Error: <email address hidden>/open-ils.cstore_drone_evergreen_1409806682.243033_21093 IS NOT CONNECTED TO THE NETWORK!!!

So, I scoured the osrfsys.log and its backups for that drone and found an interesting pattern, which is shown in the attachment.

That drone came online at approximately 1:00 am and was disconnected by 1:53 am and OpenSRF kept trying to send it messages for nine more hours!

Checking the ejabberd log, we discovered that the drone actually came online at 00:58 EDT and disconnected at 01:16:23 EDT.

We do not yet know the reason for the disconnection, nor whether the drone was simply disconnected from jabber but still running or was completely gone.
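For anyone who wants to repeat this kind of log search on their own system, here is a rough Python sketch. It assumes the log lines look like the osrfsys.log entry quoted above (a bracketed timestamp, and the drone's resource name right before "IS NOT CONNECTED TO THE NETWORK"). The function name summarize_dead_drones and the two regular expressions are my own illustration, not part of OpenSRF, and the log path is whatever you pass on the command line.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Assumed format: "... 2014-09-04 10:05:13] ..." at the start of each entry.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
# Assumed format: ".../<service>_drone_<host>_<id> IS NOT CONNECTED TO THE NETWORK"
DRONE_RE = re.compile(r"/(\S+?_drone_\S+)\s+IS NOT CONNECTED TO THE NETWORK")


def summarize_dead_drones(lines):
    """Map each drone named in a 'NOT CONNECTED' error to
    (first error time, last error time, number of errors)."""
    seen = defaultdict(list)
    for line in lines:
        ts = TS_RE.search(line)
        drone = DRONE_RE.search(line)
        if ts and drone:
            when = datetime.strptime(ts.group(1), "%Y-%m-%d %H:%M:%S")
            seen[drone.group(1)].append(when)
    return {d: (min(t), max(t), len(t)) for d, t in seen.items()}


if __name__ == "__main__":
    # Usage: python dead_drones.py /path/to/osrfsys.log
    with open(sys.argv[1], errors="replace") as log:
        for drone, (first, last, count) in summarize_dead_drones(log).items():
            print(f"{drone}: {count} errors between {first} and {last}")
```

A drone that shows a long gap between its first and last error is a candidate for the same problem: something kept routing requests to it long after it had dropped off the network.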