marc_export with --items is too damned slow and other things
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Fix Released | Wishlist | Unassigned |
Bug Description
EG Version: master as of 20130721
OpenSRF Version: master as of 20130721
PostgreSQL Version: 9.1.9
The marc_export script that comes in the support_scripts directory is just too slow to export anything of any size. I have been asked to dump our full catalog with holdings information on a regular basis. I have my own server set up that can communicate with our production environment to do these sorts of things. You might consider it a utility server, but it isn't really.
Anyway, I started this command two days ago:
10074 pts/1 S+ 0:00 sh -c /openils/
It is still running and so far, has produced no output:
-rw-rw-r-- 1 jason jason 0 Sep 9 14:09 topsfield.mrc
I have written other export programs for Backstage, etc., that can export records in a matter of minutes to hours instead of days.
http://
I thought I'd use marc_export on this one, but decided that it needs a rewrite, which is why I am filing this bug.
Along the way, I intend to also address the following bugs:
https:/
https:/
https:/
https:/
In the cases of the two that have branches already, I will merge those branches into the code.
However, one thought that I have had is to just use the Perl DBI for this. My experience shows that extracting records like this is much faster when done through the DBI layer and not through JSON query calls in CStore. Such a switch might render the above branches obsolete.
Changed in evergreen:
milestone: none → 2.next
importance: Undecided → Wishlist
status: New → Confirmed
Changed in evergreen:
milestone: 2.6.0-alpha1 → none
Changed in evergreen:
milestone: none → 2.6.0-beta1
Changed in evergreen:
status: Fix Committed → Fix Released
Much of what I said above is BS.
First, I figured out why marc_export produced no output: it was waiting for a list of bib IDs on STDIN. However, when I ran it with the --all and --items options, it ran for over 48 hours on my development VM before I stopped it, and by then it had output only about a quarter of our bibs. It seems the current program is indeed too slow.
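The silent wait described above is what any filter does when it reads record IDs from standard input and nothing is piped in. The following is a minimal sketch of that behaviour, written in Python rather than the script's Perl, and it is not how marc_export is actually implemented. The helper name `read_bib_ids` and the wording of the hint are assumptions made for illustration. It shows one way to make the wait visible: print a hint when standard input is a terminal.

```python
import sys


def read_bib_ids(stream=None, warn=None):
    """Read one bib ID per line from stream and return them as integers.

    Hypothetical helper for illustration only; not part of marc_export.
    """
    stream = sys.stdin if stream is None else stream
    warn = sys.stderr if warn is None else warn
    # An interactive terminal means nothing was piped in, so say what we
    # are waiting for instead of appearing to hang.
    if stream.isatty():
        print("Waiting for bib IDs on standard input (end with EOF)...", file=warn)
    ids = []
    for line in stream:
        try:
            ids.append(int(line.strip()))
        except ValueError:
            # Skip blank lines and anything that is not an integer.
            continue
    return ids
```

Called with no arguments it reads from standard input, so piping a file of IDs into a script built around it behaves as before, while an interactive run at least explains why nothing is happening.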
Second, I am completely reimplementing MARC export in Evergreen, so while I will address the other listed bugs, I will not be merging anyone else's code or even referencing it.
I started working on something during the hackaway, and with some Fieldmapper modifications, I actually got it to work today. However, I'm unsatisfied with my present implementation and will start it over.
This time, I'll add a collection of Utility modules for FastExport. Looks like I'm going to put them under OpenILS::Utils. The new marc_export script will use these modules.
The reason for using modules is to make the code reusable in situations other than just a simple command line export script. For instance, I might replace some of the export code I've written in my other custom programs with these modules.
Also, the functionality could be more easily expanded with modules. For instance, modules could be added to compress output and/or upload the files to another server or directory somewhere. These are common tasks done after exporting MARC records. There is no reason that these cannot also be automated.
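As a rough illustration of the kind of post-export step mentioned above, here is a minimal sketch that gzips a finished export file. It is written in Python for brevity, not as one of the proposed OpenILS::Utils modules, and the function name `compress_export` is hypothetical.

```python
import gzip
import shutil


def compress_export(path):
    """Gzip the file at path to path + '.gz' and return the new file name.

    Hypothetical post-export step; the original file is left in place.
    """
    gz_path = path + ".gz"
    # Stream the data in chunks so a large export never has to fit in memory.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

An upload step could follow the same pattern: a small function that takes the file name returned here and hands it to whatever transfer mechanism the site uses.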