Evergreen

Holds not displaying in web client Holds Shelf

Bug #1827250 reported by John Amundson on 2019-05-01

This bug affects 7 people

	Status	Importance	Assigned to	Milestone
Evergreen	Fix Released	Medium	Unassigned	Evergreen 3.3.5
3.2	Fix Released	Medium	Unassigned	Evergreen 3.2.10
3.4	Fix Released	Medium	Unassigned	Evergreen 3.4.1

Bug Description

Observed in EG 3.2.4

Since moving to Evergreen 3.2.4, we have had some instances where holds that should be appearing on a library's holds shelf (Circulation>Holds Shelf), are not displaying.

We can't determine what makes these holds special. No issues with the attached patron records, barcodes and call numbers not deleted, bib record not deleted, hold on the correct shelf, etc. There is also nothing consistent between the holds that are affected.

The issue is not with cache or a specific workstation. No matter where the holds shelf is pulled up, the same hold(s) fail to display.

The holds also behave correctly in other ways, (i.e. still displays as Ready for Pickup while on the patron's record, shows attached to the patron from Item Status, are fulfilled with checked out, etc).

This makes the problem hard to troubleshoot.

I'll note that we still have access to a XUL client component, and the holds that fail to display in the web client still display correctly in the XUL client.

We had no reports of this while on 3.0, but I cannot confirm one way or the other if older versions of the web client are also affected.

Tags:

Jason Stephenson (jstephenson) on 2019-05-01

Changed in evergreen:
status:	New → Confirmed
milestone:	none → 3.3.1

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2019-05-01:

This appears to be caused by the last captured hold check in the new open-ils.storage.action.live_holds.wide_hash method. This method was introduced in Evergreen 3.2, so 3.1 should not be affected by this bug.

This code check attempts to find the most recently captured hold for the copy, but it fails if there are uncaptured holds that have the same current hold copy as the one being checked in. My experimentation on PostgreSQL 9.5 shows that these holds with a NULL capture_time sort before the holds with a capture_time in the below query:

SELECT id
FROM action.hold_request recheck
WHERE recheck.current_copy = cp.id
ORDER BY capture_time DESC
LIMIT 1

Thus returning a different hold from the one being put on the shelf. This id mismatch prevents the hold data from being returned to the client. This causes the hold not to show up in the holds shelf list.

The query above needs to have "AND capture_time IS NOT NULL" added to the WHERE clause. Branch forthcoming.

Jason Stephenson (jstephenson) on 2019-05-01

Changed in evergreen:
assignee:	nobody → Jason Stephenson (jstephenson)

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2019-05-01:

My fix branch is here:

working/user/dyrcona/lp1827250_last_captured_hold_fix

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1827250_last_captured_hold_fix

Unfortunately, there is not a really good way to test this. It will most likely require production data as Concerto is likely to small and clean to turn the issue up.

If you have a hold shelf for a library that you know is missing holds in the list, then you could look at the list both before and after this patch and see that the missing holds turn up.

If pressed, I'l see if I can come up with a query to turn up the missing holds so you can know what to look for. The problem with that is the query used by the storage function to retrieve the hold shelf list is very complicated and replicating it in plain SQL would be no small feat.

tags:

added: pullrequest

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2019-05-01:

Comment #1 is a case of, I should have edited my comment in a text editor instead of directly into the web form.

This isn't about putting holds on the shelf, but rather viewing the holds. I don't know why I was typing that about putting holds on the shelf. Suffice it to say, that the check for the last captured hold is preventing certain holds from show up on the holds shelf list in the 3.2+ web client and my patch appears to correct this in our testing.

Everything I said in comment #2 still stands. I'll blame the mess in comment #1 on my eagerness to explain the problem and thus rushing through typing up the comment.

Revision history for this message

Mike Rylander (mrylander) wrote on 2019-05-01:

Great catch, Jason.

It may be worth testing an alternate of

ORDER BY capture_time DESC NULLS LAST

rather than adding the IS NOT NULL test with regard to performance. Reason being, the plans may be different in the details, and use different indexes.

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2019-05-02:

Results of EXPLAIN ANALYZE on both queries Edit (2.0 KiB, text/plain)

Thanks for the suggestion, Mike. I tried both variants with explain analyze, and the nulls last version is overall faster. (See the attached document with the results for a single copy.)

I made a new branch using nulls last on the order by and posted it to the working repo. It still resolves the issue for our example holds shelf where there are multiple copies missing. Since it about 1 ms faster than the is not null version, we should use this one:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dyrcona/lp1827250_last_captured_hold_fix_nulls_last

Jason Stephenson (jstephenson) on 2019-05-02

Changed in evergreen:
assignee:	Jason Stephenson (jstephenson) → nobody

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2019-05-21:

BTW, CW MARS is using the branch from comment #5 in production since the 14th of May 2019 with no adverse consequences. It is working for us as intended.

Evergreen Bug Maintenance (bugmaster) on 2019-05-22

Changed in evergreen:
milestone:	3.3.1 → 3.3.2

Evergreen Bug Maintenance (bugmaster) on 2019-07-03

Changed in evergreen:
milestone:	3.3.2 → 3.3.3

Michele Morgan (mmorgan) on 2019-07-24

Changed in evergreen:
assignee:	nobody → Michele Morgan (mmorgan)

Revision history for this message

Michele Morgan (mmorgan) wrote on 2019-07-24:

We've gotten reports of this on our 3.2.4 system. I'll test the patch, but here's a question:

Why do we retain current_copy in a cancelled hold?

The two examples I saw in our production system were situations where a cancelled hold existed with the same current_copy as the item that was on the holds shelf for a current ready for pickup hold.

Our production system has 414,000 cancelled holds with current_copy set. About 367,000 of these do NOT have a NULL capture_time.

Revision history for this message

John Amundson (jamundson) wrote on 2019-07-24:

Michele,

I think one of the reasons retaining the current_copy is important is the clear holds shelf function. The holds themselves may be cancelled, but staff still need to know which items are attached so that they can remove them from their holds shelf.

It is also beneficial for general troubleshooting. Cancelled holds are accessible from within a patron's record. It may be useful to see if and which item the hold had captured.

Evergreen Bug Maintenance (bugmaster) on 2019-08-22

Changed in evergreen:
milestone:	3.3.3 → 3.3.4

Revision history for this message

Michele Morgan (mmorgan) wrote on 2019-09-13:

John, good points about retaining the current_copy for cancelled holds for the purposes of clearing the holds shelf and troubleshooting. I can see the value there, but it seems like retaining current_copy in every hold makes it more cumbersome to get the pertinent data out of action.hold_request.

A quick query in our production database shows hundreds of holds with the same current_copy, fulfilled or cancelled long ago. I'm not sure we need to keep that data back to the beginning of holds.

All that said, I tested both patches with production data and am adding my signoff to the nulls last patch as it seems to perform just a bit better. Here's my signoff branch:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/mmorgan/lp1827250_last_captured_hold_fix_nulls_last_signoff

tags:	added: signedoff
Changed in evergreen:
assignee:	Michele Morgan (mmorgan) → nobody

Evergreen Bug Maintenance (bugmaster) on 2019-09-19

Changed in evergreen:
milestone:	3.3.4 → 3.3.5

Revision history for this message

Galen Charlton (gmc) wrote on 2019-10-03:

#10

Pushed all the way back to rel_3_2. Thanks, Jason and Michele!

Changed in evergreen:
importance:	Undecided → Medium
status:	Confirmed → Fix Committed

Evergreen Bug Maintenance (bugmaster) on 2019-10-28

Changed in evergreen:
status:	Fix Committed → Fix Released

Revision history for this message

Josh Stompro (u-launchpad-stompro-org) wrote on 2020-02-11:

#11

Hello, I'm finding that the "NULLS LAST" query takes 65 seconds to return 372 rows, while the "AND RECHECK.capture_time IS NOT null" takes 1.3 seconds to return 372 rows.

We are running PG 9.4 yet, so that version is probably missing an optimization that makes the nulls first faster.

The explain analyze seems to be saying that the call number join is what is blowing it up, evaluating 250 million rows for some reason.

We will use the "AND RECHECK.capture_time IS NOT null" until we upgrade to a newer PG version.
Josh

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2020-02-11:

#12

Josh, see bug 1855329.