Activity log for bug #1704396

Date Who What changed Old value New value Message
2017-07-14 13:33:18 Kathy Lussier bug added bug
2017-07-14 13:35:57 Kathy Lussier description

Old value:

We've been investigating slowness/timeouts that have been seen since NOBLE upgraded from 2.10 to 2.12 and that I've recently seen while performing searches on test searches. We've seen the following behavior:

- Slowness and timeouts at NOBLE when users perform one-hit searches. We mostly hear of this problem occurring with ISBN searches, but I think that's because those are the searches that are most likely to retrieve just one result. NOBLE is configured to automatically go to the record on one-hit searches, which I believe may be a factor in this slowness.

- Metarecord searches on my test servers loaded with master frequently hang for over a minute before retrieving a set of search results. In testing, we've found this happening on NOBLE's 2.12 test servers too.

While investigating the issue with the metarecord searches, we found that whenever a metarecord search hangs, there is always a record in the list that is not part of a group and exhibits the behavior we saw in bug 1694497. There is no record ID in the link to the record. It appears as if the system timed out while trying to retrieve the record ID for that result.

After performing one of these metarecord searches, we took the ISBN for the record that was causing the problem for the metarecord search, and did a keyword search just for that ISBN. That search timed out, leading me to believe that the two different instances of slowness may be related.

We turned on DEBUG_TIMING and found the following:

- For the metarecord searches, we see a lot of "Firing off the multiclass query" and "Returned from the multiclass query" messages for each metarecord that's retrieved on the first page of results. When we had a slow metarecord search, we would see a 70 second lag between one of those firing off and returned messages. It was always 70 or 71 seconds (we saw similar 70 second pauses in the logs). This bottleneck always appeared where we would have been retrieving the result that had no record ID in the link.

- For the ISBN searches we've been able to catch, we also see that 70 second lag. In one case, I saw it in the exact same spot. There was 70 seconds between the one instance of "Firing off the multiclass query" and "Returned from the multiclass query." In another instance, I saw that 70 second lag happen at past peer bib id retrieval.

It's generally hard to predict when this slowdown will happen. A specific search may cause it during one trial, then be fine for another trial, and then be problematic on a third try. Also, if I see slowness both times when performing the same search twice, it isn't always the same result that exhibits the problem with the missing record id. It seems fairly random, but, if I perform enough metarecord searches on my test system, I can eventually make the problem happen.

If there are other logs I can post here to help diagnose the problem, let me know. I have looked at the postgres logs, but I'm not seeing any long-running queries in there.

New value:

We've been investigating slowness/timeouts that have been seen since NOBLE upgraded from 2.10 to 2.12 and that I've recently seen while performing searches on test servers. We've seen the following behavior:

- Slowness and timeouts at NOBLE when users perform one-hit searches. We mostly hear of this problem occurring with ISBN searches, but I think that's because those are the searches that are most likely to retrieve just one result. NOBLE is configured to automatically go to the record on one-hit searches, which I believe may be a factor in this slowness.

- Metarecord searches on my test servers loaded with master frequently hang for over a minute before retrieving a set of search results. In testing, we've found this happening on NOBLE's 2.12 test servers too.

While investigating the issue with the metarecord searches, we found that whenever a metarecord search hangs, there is always a record in the list that is not part of a group and exhibits the behavior we saw in bug 1694497. There is no record ID in the link to the record. It appears as if the system timed out while trying to retrieve the record ID for that result.

After performing one of these metarecord searches, we took the ISBN for the record that was causing the problem for the metarecord search, and did a keyword search just for that ISBN. That search timed out, leading me to believe that the two different instances of slowness may be related.

We turned on DEBUG_TIMING and found the following:

- For the metarecord searches, we see a lot of "Firing off the multiclass query" and "Returned from the multiclass query" messages for each metarecord that's retrieved on the first page of results. When we had a slow metarecord search, we would see a 70 second lag between one of those firing off and returned messages. It was always 70 or 71 seconds (we saw similar 70 second pauses in the logs). This bottleneck always appeared where we would have been retrieving the result that had no record ID in the link.

- For the ISBN searches we've been able to catch, we also see that 70 second lag. In one case, I saw it in the exact same spot. There was 70 seconds between the one instance of "Firing off the multiclass query" and "Returned from the multiclass query." In another instance, I saw that 70 second lag happen at past peer bib id retrieval.

It's generally hard to predict when this slowdown will happen. A specific search may be slow in one trial, then be fine for another trial, and then be problematic on a third try. Also, if I see slowness both times when performing the same search twice, it isn't always the same result that exhibits the problem with the missing record id. It seems fairly random, but, if I perform enough metarecord searches on my test system, I can eventually make the problem happen.

If there are other logs I can post here to help diagnose the problem, let me know. I have looked at the postgres logs, but I'm not seeing any long-running queries in there.
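The 70-second gaps in the description above were found by pairing the "Firing off the multiclass query" and "Returned from the multiclass query" DEBUG_TIMING messages. A minimal sketch of how such gaps could be flagged mechanically from a timestamped log follows; the message strings come from the description, while the "YYYY-MM-DD HH:MM:SS" line prefix, the 60-second threshold, and the log path handling are assumptions for illustration, not Evergreen specifics.

#!/usr/bin/env python3
# Hypothetical helper: report any "Firing off" -> "Returned from" gap
# in a timestamped log that exceeds a threshold. Timestamp format and
# threshold are assumptions; the message strings come from the report.
import re
import sys
from datetime import datetime

FIRE = "Firing off the multiclass query"
DONE = "Returned from the multiclass query"
THRESHOLD_SECONDS = 60

# Assumed log line prefix, e.g. "2017-07-14 13:35:57 ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def parse_ts(line):
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None

def main(path):
    pending = None  # timestamp of the most recent "firing off" message
    with open(path) as fh:
        for line in fh:
            ts = parse_ts(line)
            if ts is None:
                continue
            if FIRE in line:
                pending = ts
            elif DONE in line and pending is not None:
                gap = (ts - pending).total_seconds()
                if gap >= THRESHOLD_SECONDS:
                    print(f"{pending} -> {ts}: {gap:.0f}s gap")
                pending = None

if __name__ == "__main__":
    main(sys.argv[1])

Run it against whichever log file captured the timing output (the path is illustrative); each reported pair points at the result whose record ID retrieval stalled.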
2017-07-18 17:36:23 Michele Morgan attachment added glamour.txt https://bugs.launchpad.net/evergreen/+bug/1704396/+attachment/4916980/+files/glamour.txt
2017-07-19 13:59:13 Christine Morgan bug added subscriber Christine Morgan
2017-08-23 21:30:31 Mike Rylander attachment added opensrf-socket-print-error-logging.patch https://bugs.launchpad.net/evergreen/+bug/1704396/+attachment/4937715/+files/opensrf-socket-print-error-logging.patch
2017-08-24 13:47:33 Robert J Jackson bug added subscriber Robert J Jackson
2017-08-24 20:21:11 Kathy Lussier summary Slowness for metecord and one-hit searches in 2.12 Slowness for metacord and one-hit searches in 2.12
2017-08-24 20:21:19 Kathy Lussier summary Slowness for metacord and one-hit searches in 2.12 Slowness for metarecord and one-hit searches in 2.12
2017-08-29 23:29:53 Jeff Davis attachment added lp1704396-logging.diff https://bugs.launchpad.net/evergreen/+bug/1704396/+attachment/4941085/+files/lp1704396-logging.diff
2017-09-13 19:24:30 Michele Morgan attachment added Missing format info in metarecord search https://bugs.launchpad.net/evergreen/+bug/1704396/+attachment/4949670/+files/metarecord_missing_info.png
2017-10-09 13:42:22 Chris Sharp evergreen: status New Confirmed
2017-10-19 21:54:02 Jeff Davis attachment added lp1704396-more-logging.diff https://bugs.launchpad.net/evergreen/+bug/1704396/+attachment/4977057/+files/lp1704396-more-logging.diff
2017-11-08 18:42:03 Jason Stephenson evergreen: milestone 3.next
2017-11-08 18:42:21 Jason Stephenson nominated for series evergreen/3.0
2017-11-08 18:42:21 Jason Stephenson bug task added evergreen/3.0
2017-11-08 18:42:21 Jason Stephenson nominated for series evergreen/2.12
2017-11-08 18:42:21 Jason Stephenson bug task added evergreen/2.12
2017-11-08 18:42:33 Jason Stephenson evergreen/2.12: status New Confirmed
2017-11-08 18:42:39 Jason Stephenson evergreen/3.0: status New Opinion
2017-11-08 18:42:45 Jason Stephenson evergreen/3.0: status Opinion Confirmed
2017-11-08 18:42:50 Jason Stephenson evergreen/2.12: importance Undecided High
2017-11-08 18:42:56 Jason Stephenson evergreen/3.0: importance Undecided High
2017-11-08 18:43:07 Jason Stephenson evergreen/2.12: milestone 3.0.2
2017-11-08 18:43:13 Jason Stephenson evergreen/3.0: milestone 2.12.8
2017-11-08 18:49:35 Jason Stephenson evergreen: status Confirmed Fix Committed
2017-11-08 18:49:40 Jason Stephenson evergreen/2.12: status Confirmed Fix Committed
2017-11-08 18:49:46 Jason Stephenson evergreen/3.0: status Confirmed Fix Committed
2017-11-09 14:31:35 Galen Charlton evergreen: milestone 3.next
2017-11-09 14:31:41 Galen Charlton evergreen: milestone 3.0.2
2017-11-09 14:31:51 Galen Charlton bug task deleted evergreen/3.0
2017-11-09 14:31:56 Galen Charlton evergreen/2.12: milestone 3.0.2 2.12.8
2017-12-01 18:26:08 Evergreen Bug Maintenance evergreen: status Fix Committed Fix Released
2017-12-01 18:26:10 Evergreen Bug Maintenance evergreen/2.12: status Fix Committed Fix Released
2019-03-07 19:37:33 Elaine Hardy tags metarecords search
2019-03-07 19:52:03 Robert J Jackson removed subscriber Robert J Jackson