Feature Request: Make Vandelay Asynchronous/Stateless

Bug #1514085 reported by Chris Sharp
This bug affects 5 people
Affects: Evergreen
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

My idea is that we abandon the current streaming/interactive Vandelay (MARC Batch Import/Export) interface in favor of one that accepts the file upload and then reports the file's status to the user via a static HTML page (perhaps one that refreshes itself at a set interval), or even via an email notification telling the user that the file load succeeded or failed. This would solve several problems for the user:

1) If file upload is asynchronous, the user no longer needs to "babysit" the UI while the file loads.

2) Asynchronous upload removes the expectation that bib loads should be "fast" (which, in my experience, they usually aren't).

3) Even when the user "babysits" the UI, it can time out without giving the user a clear indication that it has stopped working.

There are probably more advantages that I'm not thinking of at the moment. I can't think of disadvantages right now.

This idea came to me while pondering approaches to bug 985295, but this would solve many frustrations I've had using vandelay to import large bib files in the past.

Affects all EG/OpenSRF versions on all DB and OS platforms.

Bill Erickson (berick) wrote :

I like the idea. I'd prefer it be handled via the API, instead of a static page, similar to clear holds shelf. For example:

* Client uploads file.

* Import API creates an entry in memcache to represent the import.

* Import API immediately returns the cache key to the client.

* Client jumps directly to the queue interface.

* As the import proceeds, the API updates the memcache'd object at regular intervals to store the state of the import (e.g. 135 of 200 records processed).

* Using the cache key, the queue interface regularly polls for status of the import and displays status information, similar to the existing progress bars/counters.

* Assuming the client stores the cache key, staff could navigate away from the interface and return later to check status.
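The steps above can be sketched roughly as follows. This is only an illustration in Python, not Evergreen code: a plain dict stands in for memcache, a background thread stands in for the import process, and the names `start_import`, `import_status`, `poll_until_done`, and the `vandelay.import.` key prefix are all made up for the example.

```python
import threading
import time
import uuid

# Plain dict standing in for memcache (assumption for this sketch only).
CACHE = {}
CACHE_LOCK = threading.Lock()


def start_import(records, update_every=10):
    """Create the cache entry, start the import in the background,
    and return the cache key to the caller right away."""
    key = "vandelay.import." + uuid.uuid4().hex  # hypothetical key format
    total = len(records)
    with CACHE_LOCK:
        CACHE[key] = {"total": total, "processed": 0, "state": "active"}

    def worker():
        for count, _record in enumerate(records, start=1):
            # Real record processing would happen here.
            if count % update_every == 0 or count == total:
                # Store progress at regular intervals, not on every record.
                with CACHE_LOCK:
                    CACHE[key]["processed"] = count
        with CACHE_LOCK:
            CACHE[key]["state"] = "complete"

    threading.Thread(target=worker, daemon=True).start()
    return key


def import_status(key):
    """Return a copy of the cached state, or None if the key is unknown."""
    with CACHE_LOCK:
        entry = CACHE.get(key)
        return dict(entry) if entry else None


def poll_until_done(key, interval=0.01, timeout=5.0):
    """What the queue interface would do: poll the status by cache key."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = import_status(key)
        if status is None or status["state"] != "active":
            return status
        time.sleep(interval)
    raise TimeoutError("import still active after %.1fs" % timeout)
```

Because the caller only holds the key, it can stop polling and pick up again later, as long as the cache entry has not been evicted.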

We could go a step further and store import info in the database instead of the cache. With that, the data would persist and there would be a reportable history of imports. *shrug*.

===

+1 to email support as well. An action/trigger hook should suffice for that, regardless of how the above is handled.

Jason Stephenson (jstephenson) wrote :

It would be nice if you could install a completion callback that could play a ding sound and cause the tab's title area to flash until the user activates the tab (provided the tab is still open), but that's probably too much to ask for a web application.

Changed in evergreen:
importance: Undecided → Wishlist
Mike Rylander (mrylander) wrote :

After we get rid of xulrunner (and, thus, can use websockets everywhere) that will be reasonable to do, actually.

Related, streaming responses will "just work" in a websockets world. That means continuous updating of processing state can be realized, as opposed to the state message throttling we have right now that eventually slows to a trickle (looking like a timeout) to avoid multipart/x-mixed-replace bugs in xulrunner.

Jason Stephenson (jstephenson) wrote :

I thought it might, at least the callback/progress part.

All joking aside, I think many other parts of the client could benefit from asynchronous processing, such as bucket operations. If that's already there/planned for webclient, then forgive/ignore me. :) I've not looked into the webclient deeply enough at this point.

All in all I'm +1 on this feature suggestion, though I'm not at the point of being able to implement it myself.

Bill Erickson (berick) wrote :

Just to clarify a few things...

All websockets code, and thus all new browser client UIs, use asynchronous network communication. And as Mike said, websockets solve the flakiness issues we have today with streaming translator / multipart messages. If Vandelay were ported to the new browser code, it would allow uploads to run to completion without appearing to stop halfway through.

However, using websockets does not mean you can start a process, close the browser tab, then return to the UI and check the status. That's a layer of a-synchronicity at a higher level than the network. I suppose the question is whether we would need this extra layer if the network layer behaved as it was supposed to.

Bill Erickson (berick) wrote :

Looking back at this, here's my current proposal:

1. Port the Vandelay UI to Angular (not AngularJS) -- I'll open a separate bug for this. This will be a post-3.2 thing. This gives us reliable real-time updates when staff do want to keep the tab open and watch it import.

2. Track import state in the database. I think this data could be useful for staff and it will be more reliable than memcache. This gives us the second layer of a-synchronicity, where staff can kick off an import, close the tab, then revisit later. It would be trivial for the UI to show active imports for the current workstation/user.

3. Add support for post-import email notifications.
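To make item 2 concrete, here is a minimal sketch of database-backed tracking, written in Python against an in-memory SQLite database so it runs on its own. Everything in it is hypothetical: the `import_tracker` table, its columns, and the helper functions are invented for illustration and are not the schema or API used by Evergreen.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical tracker table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE import_tracker (
            id          INTEGER PRIMARY KEY,
            usr         INTEGER NOT NULL,
            workstation INTEGER NOT NULL,
            total       INTEGER NOT NULL,
            processed   INTEGER NOT NULL DEFAULT 0,
            state       TEXT    NOT NULL DEFAULT 'active'
        )
        """
    )
    return db


def create_tracker(db, usr, workstation, total):
    """Insert a row when the import is kicked off; return its id."""
    cur = db.execute(
        "INSERT INTO import_tracker (usr, workstation, total) VALUES (?, ?, ?)",
        (usr, workstation, total),
    )
    db.commit()
    return cur.lastrowid


def record_progress(db, tracker_id, processed):
    """Update the row as the import proceeds; mark it complete at the end."""
    db.execute(
        """
        UPDATE import_tracker
           SET processed = ?,
               state = CASE WHEN ? >= total THEN 'complete' ELSE 'active' END
         WHERE id = ?
        """,
        (processed, processed, tracker_id),
    )
    db.commit()


def active_imports(db, usr):
    """What a UI could query to list the user's imports still in flight."""
    rows = db.execute(
        "SELECT id, processed, total FROM import_tracker"
        " WHERE usr = ? AND state = 'active' ORDER BY id",
        (usr,),
    )
    return rows.fetchall()
```

Since the rows persist after the browser tab is closed, the same table would double as an import history once completed rows are kept around.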

Bill Erickson (berick) wrote :

Opened bug #1779158 for UI porting.

Bill Erickson (berick)
Changed in evergreen:
assignee: nobody → Bill Erickson (berick)
status: New → In Progress
milestone: none → 3.2-beta
Bill Erickson (berick) wrote :

I've done a bit more work on this. Tweaked the tracking table some and added support for creating session tracking rows via the API. The (Dojo) UI now tells the process spool and import APIs to exit early, then proceeds to polling the new tracker objects. There are a few more tie-ins I need to make, but the progress is promising.

My plan for the Dojo UI is only to improve progress reporting for imports. This alone should be a big benefit to the browser client, where pre-Websockets APIs cannot support streaming. At least now, staff can see the progress and it should be a lot more reliable.

Once migrated to Angular, I hope to add support for viewing active session trackers so staff can fire-and-forget (close the tab, etc.) then come back and check progress of active imports.

Bill Erickson (berick) wrote :

A secondary benefit of polling occurred to me. Since we're making regular API calls throughout the enqueue and import process, the auth token will no longer expire on long-running (short auth-expire) imports.
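A toy model of that effect, in Python. It assumes the auth token works as an idle timeout that is pushed forward by each API call, which is how the paragraph above reads; the `Session` class, `survives_import`, and the numbers used are invented for illustration and do not come from Evergreen.

```python
class Session:
    """Hypothetical auth session with an idle timeout, in seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.expires_at = timeout  # session created at t = 0

    def touch(self, now):
        """Simulate an API call at time `now`.

        Returns False if the session had already expired; otherwise
        pushes the expiry forward and returns True.
        """
        if now >= self.expires_at:
            return False
        self.expires_at = now + self.timeout
        return True


def survives_import(duration, timeout, poll_interval=None):
    """Is the session still valid when an import of `duration` seconds ends?

    With poll_interval=None the client makes no calls until the end,
    as when it only waits on a streamed response.
    """
    session = Session(timeout)
    if poll_interval:
        t = poll_interval
        while t < duration:
            if not session.touch(t):
                return False
            t += poll_interval
    return session.touch(duration)
```

For an hour-long import with a 420-second timeout, `survives_import(3600, 420)` is False, while `survives_import(3600, 420, poll_interval=5)` is True. Polling only helps when the interval is shorter than the timeout.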

I have now imported thousands of records without issue in the browser client Vandelay UI. The largest data set is the 5k sample record file. In all cases, progress is reported as expected in the UI.

This code does not include email notifications; I will open a separate LP for that.

==

Code is squashed and rebased to master. Since there's no new functionality in this branch, I have not included release notes, but I certainly can.

From the commit:

Adds a new DB table vandelay.session_tracker for monitoring progress on Vandelay enqueue and import sessions.

Enqueue and import APIs get a new option to exit early, returning the newly created tracker object, so the caller can monitor the tracker instead of listening to streamed responses, which are not supported in browser client Dojo interfaces.

Teach the existing Dojo Vandelay UI to exit early on enqueue & import and to poll for tracker data in lieu of waiting for streamed progress data.

On user merge / purge, trackers are migrated to the destination user.

==

Code pushed to:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1514085-vandelay-state-tracking

tags: added: pullrequest
Changed in evergreen:
status: In Progress → Confirmed
assignee: Bill Erickson (berick) → nobody
Bill Erickson (berick) wrote :

Opened bug #1780458 for email notices.

Jane Sandberg (sandbergja) wrote :

Thanks for all your work on this, Bill! I think it would be helpful to have a short release note, especially so that sysadmins know what the new vandelay.session_tracker table is all about.

tags: added: needsreleasenote
Bill Erickson (berick)
Changed in evergreen:
assignee: nobody → Bill Erickson (berick)
Bill Erickson (berick) wrote :

Thanks, Jane. Branch rebased to master, with additional release notes commit.

tags: removed: needsreleasenote
Kathy Lussier (klussier)
Changed in evergreen:
assignee: Bill Erickson (berick) → nobody
Kathy Lussier (klussier) wrote :

Hi Bill,

I tried a few Vandelay imports with this branch installed.

We typically configure Vandelay to immediately import the records, merging on no match or best match, rather than performing the import as a second step after the records have been queued. If I imported a smaller batch of records and exited the tab after the records have been queued and the import started, everything worked as expected. However, if I exited the tab while the system was still queuing the records, the system never proceeded to the next step of importing the records. I had to return to the queues interface to start the import. Is that expected?

I also encountered an issue whenever I tried importing a large batch of records (5000+). I tried four different batches of records, including MARC records downloaded from Project Gutenberg. In every case, the enqueue process stopped at 1306. The logs indicated that there was a problem with a bad MARC record. It's possible that there are bad MARC records in each batch, but it seems strange that it always stopped at 1306.

Kathy Lussier (klussier) wrote :

Adding a link that shows the log output when the import stops at 1306. https://pastebin.com/MW4zK1Af

Bill Erickson (berick) wrote :

Hi Kathy, yes, you have to keep the page open until the queuing is complete and the import has started. That part has not changed.

Did you have the same issue with the "music_5k.mrc" file by any chance? If not, could you share one of your test files?

In the meantime, I'll rebase to master and do some more tests.

Bill Erickson (berick) wrote :

Also, I just had to raise the max body size setting in NGINX to allow the music_5k.mrc file to upload. It's possible this is affecting you as well. Changes may be required at the proxy or Apache level:

What worked for me in /etc/nginx/sites-enabled/osrf-ws-http-proxy

server {
   ...
   client_max_body_size 10m;
}

Bill Erickson (berick) wrote :

Just noting after rebasing to master, my music_5k enqueue + import completed successfully.

Kathy Lussier (klussier) wrote :

Thanks Bill. I don't know if I'm going to get back to this branch today. I will try, but if somebody else has the tuits, feel free to grab it!

Galen Charlton (gmc)
Changed in evergreen:
assignee: nobody → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

Pushed to master on the strength of my testing, Kathy's testing up to the apparent max client request issue, and Mike's eyeballing. Thanks, Bill, Kathy, and Mike!

Changed in evergreen:
status: Confirmed → Fix Committed
Kathy Lussier (klussier)
Changed in evergreen:
assignee: Galen Charlton (gmc) → nobody
Changed in evergreen:
status: Fix Committed → Fix Released