Internal Server Error Doing Keyword Basic Search for ISBN

Bug #1359762 reported by Jason Stephenson on 2014-08-21
This bug affects 1 person
Affects: Evergreen
Importance: Undecided
Assigned to: Unassigned

Bug Description

Evergreen version: 2.7.0-alpha && master
OpenSRF version: master as of 20140723
PostgreSQL version: 9.1

Normally, I would not open a Launchpad bug on something like this until I had more information, but I'm putting this up here in case others have seen similar occurrences. This isn't the only report of this happening in our Evergreen installation, but this is our best documented case.

We received the following report from one of our member libraries:

I got an internal server error at 2:11 pm doing a keyword search for isbn 9780062312426

We get similar reports with the numeric search as well, but I don't have the same level of detail for those that I have for this instance.

The user says that the search worked the second time. I found the following in the osrfsys.log:

opensrf@evergreen:~$ grep 9780062312426 /openils/var/log/osrfsys.log
[2014-08-13 14:11:33] /usr/sbin/apache2 [INFO:28780:Search.pm:186:1407903305287801416] tpac: site=MVLC, depth=0, query=pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:33] /usr/sbin/apache2 [ACT:28780:Search.pm:431:1407903305287801419] EGWeb: [bre search] pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:33] open-ils.search [INFO:29826:Application.pm:152:1407903305287801420] CALL: open-ils.search open-ils.search.biblio.multiclass.query.staff HASH(0x3f096d8), pref_ou(MRO) 9780062312426 depth(0), 1
[2014-08-13 14:11:33] open-ils.search [INFO:29826:Biblio.pm:964:1407903305287801420] compiled search is {"estimation_strategy":"inclusion","skip_check":0,"limit":"50","check_limit":"1000","core_limit":10000,"offset":0,"searches":{"keyword":{"term":"pref_ou(MRO) 9780062312426 depth(0)"}}}
[2014-08-13 14:11:35] /usr/sbin/apache2 [INFO:28006:Search.pm:186:1407903305280061948] tpac: site=MVLC, depth=0, query=pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:35] /usr/sbin/apache2 [ACT:28006:Search.pm:431:1407903305280061950] EGWeb: [bre search] pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:35] open-ils.search [INFO:29826:Application.pm:152:1407903305280061951] CALL: open-ils.search open-ils.search.biblio.multiclass.query.staff HASH(0x3d17528), pref_ou(MRO) 9780062312426 depth(0), 1
[2014-08-13 14:11:35] open-ils.search [INFO:29826:Biblio.pm:964:1407903305280061951] compiled search is {"estimation_strategy":"inclusion","limit":51,"check_limit":"1000","core_limit":10000,"offset":0,"searches":{"keyword":{"term":"pref_ou(MRO) 9780062312426 depth(0)"}}}
[2014-08-13 14:11:53] /usr/sbin/apache2 [INFO:30055:Search.pm:186:14079033053005535] tpac: site=MVLC, depth=0, query=pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:53] /usr/sbin/apache2 [ACT:30055:Search.pm:431:14079033053005538] EGWeb: [bre search] pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:53] open-ils.search [INFO:29826:Application.pm:152:14079033053005539] CALL: open-ils.search open-ils.search.biblio.multiclass.query.staff HASH(0x3d45e08), pref_ou(MRO) 9780062312426 depth(0), 1
[2014-08-13 14:11:53] open-ils.search [INFO:29826:Biblio.pm:964:14079033053005539] compiled search is {"estimation_strategy":"inclusion","limit":"50","check_limit":"1000","core_limit":10000,"offset":0,"searches":{"keyword":{"term":"pref_ou(MRO) 9780062312426 depth(0)"}}}
[2014-08-13 14:11:55] /usr/sbin/apache2 [INFO:27325:Search.pm:186:1407903305273252170] tpac: site=MVLC, depth=0, query=pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:55] /usr/sbin/apache2 [ACT:27325:Search.pm:431:1407903305273252172] EGWeb: [bre search] pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:11:55] open-ils.search [INFO:29955:Application.pm:152:1407903305273252173] CALL: open-ils.search open-ils.search.biblio.multiclass.query.staff HASH(0x3d99918), pref_ou(MRO) 9780062312426 depth(0), 1
[2014-08-13 14:11:55] open-ils.search [INFO:29955:Biblio.pm:964:1407903305273252173] compiled search is {"estimation_strategy":"inclusion","limit":51,"check_limit":"1000","core_limit":10000,"offset":0,"searches":{"keyword":{"term":"pref_ou(MRO) 9780062312426 depth(0)"}}}
[2014-08-13 14:14:11] /usr/sbin/apache2 [INFO:28755:Search.pm:186:1407903305287551535] tpac: site=MVLC, depth=0, query=pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:14:11] /usr/sbin/apache2 [ACT:28755:Search.pm:431:1407903305287551537] EGWeb: [bre search] pref_ou(MRO) 9780062312426 depth(0)
[2014-08-13 14:14:11] open-ils.search [INFO:30047:Application.pm:152:1407903305287551538] CALL: open-ils.search open-ils.search.biblio.multiclass.query.staff HASH(0x38afc60), pref_ou(MRO) 9780062312426 depth(0), 1
[2014-08-13 14:14:11] open-ils.search [INFO:30047:Biblio.pm:964:1407903305287551538] compiled search is {"estimation_strategy":"inclusion","limit":51,"check_limit":"1000","core_limit":10000,"offset":0,"searches":{"keyword":{"term":"pref_ou(MRO) 9780062312426 depth(0)"}}}

The above makes it appear that the search was run five times between 14:11 and 14:14, but the user insists that they only did it twice.

I was able to find some non-timestamped entries in the Apache error log in between other entries happening at the time of the searches:

14:11:33:

Use of uninitialized value $_ in list assignment at /usr/local/share/perl/5.14.2/OpenILS/WWW/EGCatLoader/Search.pm line 812.
Use of uninitialized value $query in hash element at /usr/local/share/perl/5.14.2/OpenILS/WWW/EGCatLoader/Search.pm line 814.

Jason Stephenson (jstephenson) wrote :

I got another report of this problem, and it looks to be more generic than just ISBNs. I was able to extract the following from our Apache access and error logs:

173.13.87.85 - - [30/Aug/2014:10:59:16 -0400] "GET /eg/opac/results?contains=contains&_special=1&qtype=identifier%7Cisbn&query=9781455829132&locg=1 HTTP/1.1" 500 1003 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120713 open_ils_staff_client/2.6.0mvlc.1"
[Sat Aug 30 10:59:17 2014] [error] [client 173.13.87.85] egweb: Context Loader error: Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.14.2/OpenILS/WWW/EGCatLoader/Util.pm line 73.\n

and

173.13.87.85 - - [30/Aug/2014:12:25:55 -0400] "GET /eg/opac/results?query=pref_ou%28CHELMSFORD%29+pref_ou%28CHELMSFORD%29+title%3Aentertainment+weekly++august+&qtype=keyword&fi%3Asearch_format=&locg=18&_adv=1&page=0 HTTP/1.1" 500 1003 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20120713 open_ils_staff_client/2.6.0mvlc.1"
[Sat Aug 30 12:25:56 2014] [error] [client 173.13.87.85] egweb: Context Loader error: Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.14.2/OpenILS/WWW/EGCatLoader/Util.pm line 73.\n

The first is an identifier search for ISBN, and the second is a keyword search qualified by title, for a magazine. These are the most commonly reported problem searches on our server.

I'll keep digging and update this ticket if I find out anything new.

Jason Stephenson (jstephenson) wrote :

We actually get lots of these from time to time. They usually cluster within the same period, e.g. for several hours in the morning, afternoon, or evening, and then stop happening for a day or two.

While looking into this, I think I have found the cause of the Internal server errors.

It appears that we have cstore drones that are disconnecting from ejabberd and possibly shutting down, but the cstore listener and/or routers are unaware of this. As evidence, I present the following:

On the 4th, one of our central site staff got this exact error with a search. I verified in the Apache logs that the error occurred and that it was the same error message. We then began digging through the logs and found some interesting information.

At the time the error occurred, the following message appeared in the osrfsys.log:

[2014-09-04 10:05:13] /usr/sbin/apache2 [ERR :20351:EX.pm:66:1409804114203512907] Exception: OpenSRF::EX::Session 2014-09-04T10:05:13 OpenSRF::Transport /usr/local/share/perl/5.14.2/OpenSRF/Transport.pm:83 Session Error: <email address hidden>/open-ils.cstore_drone_evergreen_1409806682.243033_21093 IS NOT CONNECTED TO THE NETWORK!!!

So, I scoured osrfsys.log and its backups for that drone and found an interesting pattern, shown in the attachment.

That drone came online at approximately 1:00 am and was disconnected by 1:53 am, yet OpenSRF kept trying to send it messages for nine more hours!

Checking the ejabberd log, we discovered that the drone actually came online at 00:58 EDT and disconnected at 01:16:23 EDT.

We do not yet know the reason for the disconnection, nor whether the drone was simply disconnected from jabber but still running, or whether it was completely gone.

Jason Stephenson (jstephenson) wrote :

It is happening right now:

[2014-09-06 10:22:48] /usr/sbin/apache2 [ERR :24799:EX.pm:66:1409976904247991950] Exception: OpenSRF::EX::Session 2014-09-06T10:22:48 OpenILS::Utils::CStoreEditor /usr/local/share/perl/5.14.2/OpenILS/Utils/CStoreEditor.pm:453 Session Error: <email address hidden>/open-ils.cstore_drone_evergreen_1409986944.186285_27116 IS NOT CONNECTED TO THE NETWORK!!!

My understanding is that the last component of the drone id, 27116, is the PID. When I checked, I found no drone with that PID running:

opensrf@evergreen:~$ ps ax | grep open-ils.cstore
 2768 ? Ss 162:43 OpenSRF Listener [open-ils.cstore]
 3492 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3539 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3540 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3541 ? S 0:00 OpenSRF Drone [open-ils.cstore]
 3548 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3549 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3604 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3605 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3608 ? S 0:00 OpenSRF Drone [open-ils.cstore]
 3611 ? S 0:00 OpenSRF Drone [open-ils.cstore]
 3612 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3613 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3615 ? S 0:01 OpenSRF Drone [open-ils.cstore]
 3616 ? S 0:00 OpenSRF Drone [open-ils.cstore]
 3617 ? S 0:00 OpenSRF Drone [open-ils.cstore] ...
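The check above can be scripted. This is a minimal shell sketch, assuming the drone id format shown in the logs (`<service>_drone_<host>_<timestamp>.<fraction>_<pid>`): extract the trailing PID from a drone id and test whether that process still exists.

```shell
# Drone id taken from the error message above.
drone_id="open-ils.cstore_drone_evergreen_1409986944.186285_27116"

# Everything after the last underscore is the PID.
pid="${drone_id##*_}"
echo "drone PID: $pid"

# kill -0 sends no signal; it only tests whether the PID exists and we
# are allowed to signal it. For a dead drone, as in this report, it fails.
if kill -0 "$pid" 2>/dev/null; then
    echo "drone $pid is still running"
else
    echo "drone $pid is gone"
fi
```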


Jason Stephenson (jstephenson) wrote :

As requested by Mike in IRC, I've attached everything I can find for trace 140980411420299132 in the logs.

Jason Stephenson (jstephenson) wrote :

Right now, we think this might have been related to hitting the ulimit for number of open files, but we're still watching to see.
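As a quick way to probe the ulimit theory, the open-files limit and current descriptor usage can be checked from a shell. A sketch; the PID below is illustrative, not taken from this report:

```shell
# "ulimit -n" reports the soft open-files limit for the current shell;
# OpenSRF services inherit the limit of the environment that started them.
ulimit -n

# On Linux, a specific running process can be inspected via /proc
# (the PID here is illustrative):
#   grep "Max open files" /proc/2768/limits
#   ls /proc/2768/fd | wc -l    # descriptors currently in use
```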

Jason Stephenson (jstephenson) wrote :

Nope. It still seems to be happening. More digging is required.

Jason Stephenson (jstephenson) wrote :

This does not seem to be happening to us any longer, and it was never confirmed by another site.

I suspect it was just our old server being overloaded at times.

Changed in evergreen:
status: New → Invalid