Acquisitions EDI Fetch Should Have Option to Delete Remote Files

Bug #2060699 reported by Jason Stephenson
This bug affects 2 people
Affects: Evergreen
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Evergreen Versions: 3.11, 3.12, main
OpenSRF Version: N/A
PostgreSQL version: N/A

When fetching EDI files for acquisitions, the remote files are left on the server for the vendor to clean up. Several vendors do not remove fetched files, so they linger on the server for years. This means that the fetcher must handle an ever-increasing backlog of previously fetched files on each run. (C/W MARS has one account that currently reports over 1,700 files that get skipped in each fetcher run.) Given how EDI accounts are set up, multiple accounts could be looking at the same files, further increasing the amount of work to be done (cf. bug 1836908).

The EDI fetch process should at least allow the option to delete files from the remote server when they are retrieved. This could work in a number of ways:

1. Delete all remote files determined to be duplicates by the fetcher.
2. Delete all remote files after they are picked up.
3. A combination of 1 & 2.

The above could be made the default behavior of the fetcher, or it could be controlled by one or more actor org_unit settings or command-line switches to the edi_fetcher.pl program.
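For the sake of discussion, here is a minimal sketch of option 2 gated by a command-line switch. This is not the real edi_fetcher.pl: the --delete-fetched switch, host, directory, and credentials are invented for illustration, and core Net::FTP stands in for whatever transport a given account actually uses.

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Net::FTP;

my $delete_fetched = 0;    # off by default; deletion is strictly opt-in
GetOptions('delete-fetched' => \$delete_fetched)
    or die "usage: $0 [--delete-fetched]\n";

my $ftp = Net::FTP->new('edi.vendor.example', Timeout => 60)
    or die "connect failed: $@";
$ftp->login('eg_user', 'secret') or die 'login failed: ' . $ftp->message;
$ftp->cwd('/outgoing')           or die 'cwd failed: ' . $ftp->message;

for my $file ($ftp->ls) {
    next unless $ftp->get($file);    # leave the file alone if the download fails
    if ($delete_fetched) {
        $ftp->delete($file)
            or warn "could not delete $file: " . $ftp->message;
    }
}
$ftp->quit;

Without the switch, the run behaves as it does today; files are removed only when the operator explicitly opts in.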

I'd like some feedback on these options before beginning work on the code.

Tags: acq acq-edi
Changed in evergreen:
status: New → Confirmed
Tiffany Little (tslittle) wrote:

This would be a great improvement. Right now I'm just manually going into the vendor FTP folders periodically and deleting things older than X weeks/months.

Personally, I lean toward this being the default behavior. Not only would I like to avoid adding more org unit settings, but I also just think this is logical behavior for the fetcher.

I'm leaning toward option #1, deleting files that are determined to be duplicates. IIRC, the fetcher accesses the folder, reads each filename, and determines that it's a duplicate, so it skips it. So maybe instead of skipping, it could delete the file instead?

I also wonder if there's benefit to adding a time component: checking whether a file is a duplicate, plus whether it's older than X. Sometimes when someone says they're missing invoices, I'll access the FTP folders to see when the last files were placed there and whether the missing invoices are present. Then I know whether something went wrong in Evergreen (it didn't create the invoice) or whether the vendor just hasn't put the files on the server yet.

If that sounds like a good idea to anyone else, my vote would be for deleting things older than one month. That's plenty of time, IMO, and it would still drastically cut down on the number of files that need to be skipped as duplicates, which is the original intent of this bug, I think.
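A minimal sketch of that duplicate-plus-age check, with the same invented connection details as above; already_fetched() is a placeholder for however the fetcher actually recognizes files it has already retrieved.

#!/usr/bin/perl
use strict;
use warnings;
use Net::FTP;

my $MAX_AGE_DAYS = 30;    # the one-month window suggested above
my $cutoff = time() - $MAX_AGE_DAYS * 24 * 60 * 60;

my $ftp = Net::FTP->new('edi.vendor.example') or die "connect failed: $@";
$ftp->login('eg_user', 'secret') or die 'login failed: ' . $ftp->message;
$ftp->cwd('/outgoing')           or die 'cwd failed: ' . $ftp->message;

for my $file ($ftp->ls) {
    next unless already_fetched($file);    # not a duplicate: leave it alone
    my $mtime = $ftp->mdtm($file);         # remote modification time
    next unless defined $mtime && $mtime < $cutoff;    # too recent: keep it
    $ftp->delete($file) or warn "could not delete $file: " . $ftp->message;
}
$ftp->quit;

# Placeholder: the real fetcher would consult its own record of
# previously retrieved files.
sub already_fetched {
    my ($name) = @_;
    return 0;
}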

Galen Charlton (gmc) wrote:

I have reservations about deleting messages unconditionally: a scenario where something goes wrong and a file is deleted prematurely due to a bug could be awkward to recover from.

I think that the deletion behavior could be controlled by a flag on the EDI account record, rather than via a library setting. I agree with Tiffany that adding a time component would be useful.
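A tiny sketch of how that per-account control might look; the delete_fetched flag and keep_days field are invented here, and nothing like them exists on the EDI account record today.

#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for an EDI account row as the fetcher might load it.
my $account = {
    host           => 'edi.vendor.example',
    delete_fetched => 1,     # opt in vendor by vendor
    keep_days      => 30,    # time component, per the comments above
};

if ($account->{delete_fetched}) {
    my $cutoff = time() - $account->{keep_days} * 86400;
    print "files on $account->{host} already fetched and older than ",
        scalar localtime($cutoff), " would be deleted\n";
}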
