marc_export utility allows the creation of invalid (too large) MARC records
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Triaged | Undecided | Unassigned |
Bug Description
When doing a bibliographic record export with holdings, the marc_export utility allows the creation of records that are too large for the USMARC specification's limit of 99,999 bytes. The MARC Perl libraries emit warnings at runtime that this is happening, but the records are created anyway.

A side effect is that the leader, which has only 5 character positions for record length, also contains invalid data: when the actual length exceeds 99,999, only its first five characters are written. Example error output from MARC::Lint:

```
Invalid record length in record 299: Leader says 10732 bytes but it's actually 107321
```
As you can see, the actual length of 107321 is truncated to "10732". This causes any MARC processing utility to choke outright, and identifying the offending record is... difficult.
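For what it's worth, the mismatch is easy to detect directly. Here is a minimal sketch (not marc_export code) of the comparison MARC::Lint is effectively making, assuming $raw holds one raw USMARC record:

```perl
# Compare the leader's declared length (positions 00-04) with the
# actual byte length of the raw record.
my $declared = substr($raw, 0, 5);                 # e.g. "10732"
my $actual   = do { use bytes; length($raw) };     # e.g. 107321
warn "Leader says $declared bytes but record is actually $actual bytes\n"
    if $declared != $actual;
```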
My suggestion is that the marc_export script be altered so that:
1) there is more useful debugging information available (the biblio.record.id of the currently processed record would suffice). Perhaps a "--debug" option could be added to the script?
additionally, or instead:
2) there is some mechanism for checking the size (length) of a record; if it exceeds the MARC length limit, the script does not include it in the export file, but logs the record ID and any errors to an exceptions file (a rough sketch of this follows below).
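A rough sketch of what (2) might look like inside the export loop. This is hypothetical code, not the current marc_export source: $record is assumed to be a MARC::Record object, $bib_id the biblio.record.id, and $out / $exceptions the export and exceptions filehandles:

```perl
use constant MARC_MAX_BYTES => 99_999;    # USMARC record length limit

my $raw = $record->as_usmarc();           # serialize the record
my $len = do { use bytes; length($raw) }; # the leader counts octets
if ($len > MARC_MAX_BYTES) {
    # skip the oversize record; log its ID for follow-up
    print $exceptions "record $bib_id skipped: $len bytes exceeds USMARC limit\n";
}
else {
    print $out $raw;
}
```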
Evergreen 2.5.1+
OpenSRF 2.2
PostgreSQL 9.3
Ubuntu 12.04/14.04
I'm tempted to set this to "Won't fix," and add a comment along the lines of "MARC is a broken format. Don't use it."
However, I think this is more of an issue with MARC::Record and friends, since MARC::Record sets the size in the leader. I also think you should check what version of MARC::Record you have installed. I recall seeing code in a recent version that should handle oversize records by setting the size to 99999.
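For reference, checking the installed version is a one-liner:

```
perl -MMARC::Record -e 'print "$MARC::Record::VERSION\n"'
```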
FWIW, I've only ever seen oversized records when exporting holdings, usually for the whole consortium. Most of our vendors have workarounds for this by ignoring the size field and reading to the next record separator. Really, any decent software should ignore the size field since it is wholly unnecessary.
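To illustrate that workaround, here's a minimal sketch (assuming a file named export.mrc) that reads record-by-record on the MARC end-of-record terminator (0x1D) instead of trusting the leader:

```perl
# Read a MARC transmission file on the end-of-record terminator,
# ignoring the (possibly bogus) length field in the leader.
open my $fh, '<:raw', 'export.mrc' or die "export.mrc: $!";
local $/ = "\x1D";        # MARC end-of-record character
while (my $raw = <$fh>) {
    chomp $raw;           # strip the terminator itself
    next unless length $raw;
    # ... hand $raw to MARC::Record->new_from_usmarc($raw), etc.
}
close $fh;
```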
You could try exporting records with holdings in separate batches for each member library. That should only take you and your vendor until Doomsday to output and to parse.
And it's practically 2015. Can we have a decent bibliographic record format already?