Z39.50 Import errors in staff client

Bug #1271559 reported by Tony Bandy
36
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Evergreen
Incomplete
Medium
Unassigned

Bug Description

When using the Z39.50 import function in the Evergreen staff client (2.4.2), our libraries have reported various errors when attempting to search for records from external sites. There seems to be no rhyme or reason on the errors, but sometimes unchecking all of the resources and then adding them back one at a time helps to alleviate issue (?) When this happens, the error window appears in the client with language similar to this:

Network or server failure. Please check your Internet connection...and choose Retry Network. If you need to enter Offline Mode, choose Ignore Errors in this and subsequent dialogs. If you believe this error is due to a bug in Evergreen and not network problems, please contact your help desk or friendly Evergreen administrators, and give them this information:
method=open-ils.search.z3950.search_class

We're currently running Evergreen 2.42 on a hosted instance of Debian.

Tags: z3950
Revision history for this message
Liam Whalen (whalen-ld) wrote :

Sitka received a similar report from one of our libraries. Our network admin tracked the problem down, and I have a patch waiting to be tested.

You can see my commit here:

http://git.sitka.bclibraries.ca/gitweb/?p=sitka/evergreen.git;a=shortlog;h=refs/heads/user/lwhalen/RT18218_z3950_timeout

From what I can see, the problem results from an empty result set being returned when a server times out and the Perl code trying to call a function on an undefined variable.

I have not signed off on it yet because it has not been tested.

Liam Whalen (whalen-ld)
Changed in evergreen:
assignee: nobody → Liam Whalen (whalen-ld)
Ben Shum (bshum)
Changed in evergreen:
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
Liam Whalen (whalen-ld) wrote :
tags: added: pullrequest
Revision history for this message
Liam Whalen (whalen-ld) wrote :

That commit is targeted at 2.4. I will integrate it with master and retag as pullrequest when that is done.

tags: removed: pullrequest
Revision history for this message
Jason Etheridge (phasefx) wrote :

I've been seeing similar symptoms in 2.6.3 and tried Liam's patch, but it did not help.

Network or server failure. Please check your Internet connection to catalog.sparkpa.org and choose Retry Network. If you need to enter Offline Mode, choose Ignore Errors in this and subsequent dialogs. If you believe this error is due to a bug in Evergreen and not network problems, please contact your help desk or friendly Evergreen administrators, and give them this information:
method=open-ils.search.z3950.search_class
params=["authtokenscrubbed",{"service_array":["accesspa","bpl","EGIN","loc"],"username_array":["","","",""],"password_array":["","","",""],"limit":3,"offset":0,"search":{"isbn":"9780547858197"},"service":["accesspa","bpl","EGIN","loc"],"username":["","","",""],"password":["","","",""]}]
THROWN:
{"fileName":"oils://remote/js/dojo/dojo/dojo.js","lineNumber":31}
STATUS:

So, the same query can work some times and not at others. Log activity is nearly identical between the two, but here's the part that changes:

WORKING QUERY:
z3950: connecting to server lx2.loc.gov:210:LCDB as
z3950: query => @attr 1=7 @attr 4=6 @attr 5=1 "9780547858197" z3950: search [@attr 1=7 @attr 4=6 @attr 5=1 "9780547858197" ] took 0 seconds
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'bpl' search returned 1 hits
z3950: fetching record 0
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'EGIN' search returned 0 hits
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'accesspa' search returned 1 hits
z3950: fetching record 0
z3950: using record format 'FI' and transmission format 'usmarc'
z3950: 'loc' search returned 1 hits
z3950: fetching record 0
Message processing duration: 6.254

BROKEN QUERY:
z3950: connecting to server lx2.loc.gov:210:LCDB as
z3950: query => @attr 1=7 @attr 4=6 @attr 5=1 "9780547858197"
z3950: search [@attr 1=7 @attr 4=6 @attr 5=1 "9780547858197" ] took 0 seconds
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'bpl' search returned 1 hits
z3950: fetching record 0
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'accesspa' search returned 1 hits
z3950: fetching record 0
z3950: using record format 'FI' and transmission format 'usmarc'
z3950: 'loc' search returned 1 hits
z3950: fetching record 0
z3950: using record format 'F' and transmission format 'usmarc'
z3950: 'EGIN' search returned 0 hits
Message processing duration: 30.335

That seems to match the criteria Liam mentioned, time out with an empty result set.

Liam Whalen (whalen-ld)
Changed in evergreen:
assignee: Liam Whalen (whalen-ld) → nobody
Revision history for this message
Linda Jansova (skolkova-s) wrote :

It seems that the error persists in 2.12.4 (a more detailed description has been sent to open-ils-general mailing list: http://libmail.georgialibraries.org/pipermail/open-ils-general/2017-August/014097.html)...

tags: added: z3950
Revision history for this message
Linda Jansova (skolkova-s) wrote :

The error mentioned on 2017-08-18 proved to be of a different origin and has been sorted out via ejabbered configuration - changing max_stanza_size (as described in bug https://bugs.launchpad.net/evergreen/+bug/1713324).

Revision history for this message
Chris Sharp (chrissharp123) wrote :

Can someone update whether this is still an issue in the web client?

Changed in evergreen:
status: In Progress → Incomplete
Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

I'm still seeing something similar to this in EG 3.11.1.

One strange thing is that it seems to happen pre authentication. I'm seeing every other call fail to open-ils.search.z3950.search_class, even with an invalid auth key. Although sometimes it is a few in a row.

srfsh# request open-ils.search open-ils.search.z3950.search_class "invalidauthkey",{ "service_array": [ "oclc" ] , "username_array": [ "" ] , "password_array": [ "" ] , "limit":10, "offset":0, "search": { "isbn":"9781639731930x" } , "service": [ "oclc" ] , "username": [ "" ] , "password": [ "" ]}

Received no data from server

------------------------------------
Request Completed Successfully
Request Time in seconds: 0.007930
------------------------------------
srfsh# request open-ils.search open-ils.search.z3950.search_class "invalidauthkey",{ "service_array": [ "oclc" ] , "username_array": [ "" ] , "password_array": [ "" ] , "limit":10, "offset":0, "search": { "isbn":"9781639731930x" } , "service": [ "oclc" ] , "username": [ "" ] , "password": [ "" ]}

Received Data: {
  "textcode":"NO_SESSION",
  "ilsevent":1001,
  "desc":"User login session has either timed out or does not exist",
  "pid":50146,
  "stacktrace":"oils_auth.c:1169"
}

------------------------------------
Request Completed Successfully
Request Time in seconds: 0.010717
------------------------------------

With a valid auth key.

srfsh# request open-ils.search open-ils.search.z3950.search_class "validauthkey",{ "service_array": [ "oclc" ] , "username_array": [ "" ] , "password_array": [ "" ] , "limit":10, "offset":0, "search": { "isbn":"9781639731930x" } , "service": [ "oclc" ] , "username": [ "" ] , "password": [ "" ]}

Received Data: {
  "count":0,
  "service":"oclc",
  "records":[

  ]
}

------------------------------------
Request Completed Successfully
Request Time in seconds: 0.234018
------------------------------------
srfsh# request open-ils.search open-ils.search.z3950.search_class "validauthkey",{ "service_array": [ "oclc" ] , "username_array": [ "" ] , "password_array": [ "" ] , "limit":10, "offset":0, "search": { "isbn":"9781639731930x" } , "service": [ "oclc" ] , "username": [ "" ] , "password": [ "" ]}

Received no data from server

------------------------------------
Request Completed Successfully
Request Time in seconds: 0.006879
------------------------------------

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

I ran tcpdump to watch the traffic going to oclc, and when the "Received no data from server" response is returned there is no traffic sent or received from the oclc z39.50 server.

So it doesn't seem like a timeout since nothing was sent.

I've added some extra logging to try and see what is going on, and it seems like there is a failure to look up the server info in the case of "Received no data".

[2023-11-27 10:03:53] open-ils.search [INFO:47634:Application.pm:159:17000754433630845] CALL: open-ils.search open-ils.search.z3950.search_class e3ec83033e6f64bb4e2cc1b448dd51e8, HASH(0x55f9c46a23f0)
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:260:17000754433630845] z3950: service: oclc, async: 1
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:270:17000754433630845] z3950: non native search
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:334:17000754433630845] z3950: do_search... for oclc
[2023-11-27 10:03:53] open-ils.search [WARN:47634:Application.pm:624:17000754433630845] open-ils.search.z3950.search_class: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.36.0/OpenILS/Application/Search/Z3950.pm line 335.
[2023-11-27 10:03:53] open-ils.search [WARN:47634:Application.pm:624:17000754433630845] open-ils.search.z3950.search_class: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.36.0/OpenILS/Application/Search/Z3950.pm line 335.
[2023-11-27 10:03:53] open-ils.search [WARN:47634:Application.pm:624:17000754433630845] open-ils.search.z3950.search_class: Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.36.0/OpenILS/Application/Search/Z3950.pm line 335.
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:335:17000754433630845] z3950: Host: , Port: , DB:
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:336:17000754433630845] z3950: Query:
[2023-11-27 10:03:53] open-ils.search [WARN:47634:Application.pm:624:17000754433630845] open-ils.search.z3950.search_class: Use of uninitialized value $results[-1] in concatenation (.) or string at /usr/local/share/perl/5.36.0/OpenILS/Application/Search/Z3950.pm line 285.
[2023-11-27 10:03:53] open-ils.search [WARN:47634:Application.pm:624:17000754433630845] open-ils.search.z3950.search_class: Use of uninitialized value $connections[-1] in concatenation (.) or string at /usr/local/share/perl/5.36.0/OpenILS/Application/Search/Z3950.pm line 285.
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:285:17000754433630845] z3950: Result object: , Connection object:
[2023-11-27 10:03:53] open-ils.search [INFO:47634:Z3950.pm:291:17000754433630845] z3950: Connections created

So maybe it is related to the caching of the service connection details in %services?

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Sorry, stream of consciousness debugging incoming.

%services caches the allowed z39.50 service settings, and I'm assuming it sticks around between calls since the service keeps running?

It scopes the services to what the current auth user can access... but what happens when a different user with different z39.50 server scoping connects to the same open-ils.search service. And they don't have access to the same z39.50 services.

I suspect that is what I'm seeing, do_service_search is trying to get the host, port and db from an entry that doesn't exist for the user that created the %services cached info.

The sub do_search shorts out with a few 'or return undef;" calls if the host, port or db are not set.

So to trigger this, you need to have different users with different z39.50 services that they are allowed to use.

I think the initial calls to open-ils.search query_services cause the cached %services to be scoped to a particular user, since it calls fetch_service_defs($e) with an editor using the called auth token.

So an easy fix might be to not cache the %services in that case.

I bet this showed up when the use_perm for config.z3950_source was added.
https://gitlab.com/LARL/evergreen-larl/-/commit/6b900e1b5da091c09d0a4f668ca8618adfb4a260

One other thing I noticed, probably not actually a problem though, the do_class_search defines @services, which seems confusing when there is a global %services.

Josh

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote :

Steps to reproduce issue.

1. Have 2 or more users with different Z39.50 use_perm permissions, that are limited to different sets of z39.50 servers.

2. Grab or generate auth keys for those users.

3. Restart open-ils.search to clear the %services cache : osrf_control --restart --service open-ils.search -l

4. Watch your system logs for z3950 requests. tail -f /openils/var/log/osrfsys.log| fgrep 'z3950'

5. start srfsh and perform a request for user 1 to get the list of z39.50 servers they can access.
   a. request open-ils.search open-ils.search.z3950.retrieve_services "user 1 auth key"
   b. Make note of which process handled the request from logs.
   c. Process 50493 handled this request for example open-ils.search [INFO:50493:Z3950.pm:454:170110717350926105] z3950:

6. Perform the request again for user 2. Hopefully you hit a different process. If you hit the same process you will need to go back and request for user 1 again.

7. Perform some searches and watch to see which process you hit. If you are searching using User2, and you hit the process that has the services for user1, then you will get a "Received no data from server" result.

Search example: Set your service name to match what you have.

srfsh# request open-ils.search open-ils.search.z3950.search_class "user 2 auth token",{ "service_array": [ "setservicename" ] , "username_array": [ "" ] , "password_array": [ "" ] , "limit":10, "offset":0, "search": { "isbn":"9781639731930y" } , "service": [ "setservicename" ] , "username": [ "" ] , "password": [ "" ]}

Revision history for this message
Josh Stompro (u-launchpad-stompro-org) wrote (last edit ):

After re-reading this bug I think I'm describing a different bug. We are not seeing any interface errors, just the lack of results when there should be results. So I'll create a new bug report.

Bug 2044854 created for the issue I noticed
Josh

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.