Web Client: Transits Don't Always Clear

Bug #1787274 reported by John Amundson
This bug affects 9 people
Affects | Status | Importance | Assigned to | Milestone
Evergreen | Fix Released | Critical | Unassigned |
3.0 | Fix Released | Critical | Unassigned |
3.1 | Fix Released | Critical | Unassigned |

Bug Description

Evergreen 3.0.8

Since we moved to the web client, we have had way more reports of items showing an active transit record not actually being in transit. We have not been able to determine why this is.
There are a couple of things that could be contributing, like bug 1738688 and bug 1780315, but I don't see these contributing enough to account for the numbers we are seeing.

One working theory I have is that keeping check-in modifiers on (like Retarget Local Holds, Retarget All Statuses, and Clear Holds Shelf) is slowing down check-in and leading to transits not being calculated correctly. I have not been able to duplicate it, but in the reports that staff remember, at least one of these modifiers is usually turned on.

Below are a few comparison numbers based on transits created in the previous 2 months*.
The current numbers come from our production server (we have been up on the web client since 5/28/18).
The old numbers come from a training server containing data from January-ish 2018 (*transits created Nov 2017 and newer). Because of this, the old numbers can be taken as more of an estimate, but since active transits will continue to stay active until something happens to the item, it should still be a good estimate.

Number of items that have multiple active transits:
Current: 131 | Old: 15 (773.33% increase)

Number of items that have an active transit but not a status of In Transit:
Current: 806 | Old: 170 (374.12% increase)

Number of items that have an active transit and specifically a status of Available or Recently Returned:
Current: 484 | Old: 59 (720.34% increase)
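
For anyone checking the arithmetic, the percentages above are plain relative increases of the current count over the old count. A minimal sketch follows; the function name is made up for illustration and is not part of Evergreen.

```python
def percent_increase(current, old):
    """Relative increase of `current` over `old`, expressed as a percentage."""
    return (current - old) / old * 100


# Counts copied from the three comparisons in the bug description.
comparisons = {
    "multiple active transits": (131, 15),
    "active transit, status not In Transit": (806, 170),
    "active transit, Available or Recently Returned": (484, 59),
}

for label, (current, old) in comparisons.items():
    print(f"{label}: {percent_increase(current, old):.2f}% increase")
```

Rounded to two decimal places, this reproduces the 773.33%, 374.12%, and 720.34% figures quoted above.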

I'll update this bug if future testing provides any insights.

John Amundson (jamundson) wrote :

I'll note that when referring to active transits, I'm referring to transits that have a NULL receive time and a NULL cancel time.
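
As a rough illustration of that definition, the sketch below applies it to a handful of in-memory records. The field names target_copy, dest_recv_time, and cancel_time mirror the columns mentioned elsewhere in this bug; the Transit class, the helper functions, and the sample data are assumptions made up for the example, not Evergreen code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transit:
    # Hypothetical stand-in for one transit row.
    target_copy: int
    dest_recv_time: Optional[str] = None
    cancel_time: Optional[str] = None


def is_active(transit):
    """Active means neither received nor canceled."""
    return transit.dest_recv_time is None and transit.cancel_time is None


def copies_with_multiple_active(transits):
    """Return the copy ids that have more than one active transit."""
    counts = Counter(t.target_copy for t in transits if is_active(t))
    return sorted(copy for copy, n in counts.items() if n > 1)


sample = [
    Transit(target_copy=1),                               # active
    Transit(target_copy=1),                               # active duplicate
    Transit(target_copy=2, dest_recv_time="2018-08-01"),  # received
    Transit(target_copy=3, cancel_time="2018-08-02"),     # canceled
    Transit(target_copy=3),                               # active
]
```

With this sample, copies_with_multiple_active(sample) returns [1]: only copy 1 has two transits that are both unreceived and uncanceled.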

Jason Stephenson (jstephenson) wrote :

In addition to what John reports, I have found many duplicate transits in our database, with about 8x as many occurring in the quarter since we upgraded to 3.0 and most libraries started using the web staff client. http://irc.evergreen-ils.org/evergreen/2018-09-07#i_376180

Jason Boyer also reported duplicate transits in IRC back in July:
http://irc.evergreen-ils.org/evergreen/2018-07-20

I believe that these duplicate transits are a cause of the "stuck" transits that John reports in the bug description. I believe some of these may be owing to multiple scans of the same item at checkin, but I won't rule out other possibilities, since the duplicate transits sometimes occur several minutes apart.

Here are the queries that I used in our database to more or less confirm that the situation has gotten worse with the web staff client:

https://pastebin.com/AqZeeeh2

It is my opinion that this bug is a blocker for adoption of the web staff client.

tags: added: webstaffblocker
Changed in evergreen:
status: New → Confirmed
importance: Undecided → Critical
Jason Stephenson (jstephenson) wrote :

Here's an example of 3 transits created within 28 thousandths of a second for the same copy to the same destination. It is my opinion that this is too fast to be a case of multiple barcode scans unless something is going on with the automatic carriage returns added by most scanners.

Jason Stephenson (jstephenson) wrote :

Because Bill Erickson asked, here's another with the dest_recv_time.

Jason Stephenson (jstephenson) wrote :

Here's a sample of the log messages for the copy from the previous spreadsheets. These include only the messages where the copy's barcode appears. It looks like 3 separate checkin messages were sent to the websocket translator, given the 3 ACT entries with different session numbers.

Terran McCanna (tmccanna) wrote :

I don't have any logs to share, but chiming in to say that we've also seen this problem and have not been able to pinpoint what is causing it.

Bill Erickson (berick) wrote :

Huzzah, I just recreated this with the concerto data set. Had to paste/enter barcodes into the checkin screen really quickly.

I'll post a patch shortly that does 2 things:

1. Tracks in-flight barcodes for checkin so if a barcode is away talking to the API, quickly scanning the same barcode again will fail.

2. Adds a unique constraint on action.transit_copy so that no 2 transits for a single copy can have null receive and cancel times.
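
The in-flight tracking in item 1 can be sketched roughly as follows. This is not the patch itself (the real change is in the browser client's JavaScript); it is a small Python model of the idea, and the class name, function names, and barcode are all made up for illustration.

```python
import asyncio


class InFlightCheckins:
    """Allows at most one outstanding checkin call per barcode."""

    def __init__(self):
        self._in_flight = set()

    async def checkin(self, barcode, api_call):
        if barcode in self._in_flight:
            # Same barcode scanned again before the first call returned.
            return None
        self._in_flight.add(barcode)
        try:
            return await api_call(barcode)
        finally:
            # Release the barcode whether the call succeeded or failed.
            self._in_flight.discard(barcode)


async def fake_checkin_api(barcode):
    # Hypothetical stand-in for the round trip to the checkin API.
    await asyncio.sleep(0.01)
    return f"checked in {barcode}"


async def demo():
    guard = InFlightCheckins()
    # Two scans of the same barcode issued back to back.
    return await asyncio.gather(
        guard.checkin("BARCODE-1", fake_checkin_api),
        guard.checkin("BARCODE-1", fake_checkin_api),
    )
```

Running asyncio.run(demo()) yields ["checked in BARCODE-1", None]: the second scan is rejected because the first call is still waiting on the API. Once the first call finishes, the barcode is released and can be checked in again.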

Changed in evergreen:
assignee: nobody → Bill Erickson (berick)
Bill Erickson (berick) wrote :

Patches:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1787274-checkin-dupes

1. Adds a check to the browser checkin UI to ensure a given copy may only have one in-flight checkin API call at a time.

2. Adds unique constraint triggers to action.transit_copy / hold_transit_copy / reservation_transit_copy. Because of the inheritance relationship between these tables, a simple unique index would not suffice.

3. Adds an index on target_copy where dest_recv_time and cancel_time are null, i.e. "open transit for copy". This may not be necessary, but since the check constraint makes this exact query, as well as parts of the application, it seemed like a win.

4. PGTAP tests.
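
To make items 2 and 3 concrete, here is a rough model using Python's built-in sqlite3 module. It is only a sketch of the invariant, not the patch: SQLite has no table inheritance, so a single transit_copy table stands in for all three tables, and the trigger and index names are made up. The column names target_copy, dest_recv_time, and cancel_time are taken from the description above.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE transit_copy (
            id INTEGER PRIMARY KEY,
            target_copy INTEGER NOT NULL,
            dest_recv_time TEXT,
            cancel_time TEXT
        );

        CREATE INDEX open_transit_idx ON transit_copy (target_copy)
            WHERE dest_recv_time IS NULL AND cancel_time IS NULL;

        CREATE TRIGGER one_open_transit BEFORE INSERT ON transit_copy
        WHEN NEW.dest_recv_time IS NULL AND NEW.cancel_time IS NULL
        BEGIN
            SELECT RAISE(ABORT, 'copy already has an open transit')
            WHERE EXISTS (
                SELECT 1 FROM transit_copy
                WHERE target_copy = NEW.target_copy
                  AND dest_recv_time IS NULL
                  AND cancel_time IS NULL
            );
        END;
    """)
    return db


def open_transit(db, copy_id):
    """Try to create an open transit; return False if the trigger rejects it."""
    try:
        db.execute("INSERT INTO transit_copy (target_copy) VALUES (?)", (copy_id,))
        return True
    except sqlite3.DatabaseError:
        return False
```

With a fresh database, open_transit(db, 1) succeeds and an immediate second open_transit(db, 1) is rejected. Once the first transit has its dest_recv_time set, a new transit for copy 1 is accepted again. This model only guards inserts; it does not cover updates that would reopen a closed transit.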

tags: added: pullrequest
Changed in evergreen:
milestone: none → 3.2-rc
assignee: Bill Erickson (berick) → nobody
Bill Erickson (berick) wrote :

Note to testers: I found it very difficult to create a duplicate transit in practice. I had to enter a barcode in the checkin UI, hit Enter, then immediately control-v paste the same barcode into the input and tap Enter again. I could get it to work about 1 in 10 tries.

For the purposes of testing the checkin UI changes, if nothing breaks in the checkin UI, I consider it a successful test. If you are able to reproduce the problem, an error message will appear in the console log indicating the copy is already in-flight.

Kathy Lussier (klussier) wrote :

Thanks Bill! I've recently been seeing a delay with my checkins that trigger a holds or transit popup, so I was able to easily replicate this issue. With this code, I verified that multiple transits were not created when quickly checking in an item twice and confirmed that I was now seeing the in-flight message in the browser console. I also verified that the pgtap test passes.

Picked to 3.2, 3.1 and 3.0.

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released