marc_export creating MARC data that yaz-marcdump dislikes
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Evergreen | Confirmed | Medium | Unassigned | |
Bug Description
Observed with:
Evergreen 2.11.3
Debian Jessie:
YAZ version: 4.2.30 98864b44c654645
Perl 5.20.2
Perl MARC modules installed from Debian Jessie packages:
libmarc-
libmarc-record-perl 2.0.6-1
libmarc-xml-perl 1.0.3-1
Tested with some healthy-appearing records in a migration system; I have not yet attempted to reproduce this with the concerto bibs.
Per marc_export's --help output, using marc_export without passing --format or --encoding should default to USMARC encoded as MARC8:
--format or -f Output format (USMARC, UNIMARC, XML, BRE, ARE) [USMARC]
--encoding or -e Output encoding (UTF-8, ISO-8859-?, MARC8) [MARC8]
# export bib ids 123 and 456
echo -e "123\n456" | marc_export > test.mrc
I would expect "yaz-marcdump test.mrc" to be able to output the two records without issue, other than possible display-time encoding quirks due to my terminal not supporting MARC8.
Also tried with:
yaz-marcdump -n -p test.mrc
yaz-marcdump -n test.mrc
yaz-marcdump -f MARC8 -t UTF8 test.mrc
The following are some examples of warnings generated by the above yaz-marcdump commands:
Separator but not at end of field length=48
Bad indicator data. Skipping 1 bytes
Separator but not at end of field length=65
Separator but not at end of field length=29
Separator but not at end of field length=62
Bad indicator data. Skipping 2 bytes
No separator at end of field length=6
Bad indicator data. Skipping 2 bytes
The warnings and errors suggest a problem with the directory in the records: the field lengths and starting offsets recorded there may be incorrect, perhaps because a multi-byte character becomes a single-byte character during the encoding change from UTF-8 in the database to MARC8 and the directory is not adjusted to match.
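To make the suspected mechanism concrete, here is a minimal sketch of how a directory goes stale when field lengths are measured before the bytes are transcoded. It is only an illustration of the hypothesis above, not a claim about what marc_export actually does. Python has no MARC-8 codec, so Latin-1 is used as a stand-in single-byte encoding, and the field contents and helper names are made up.

```python
# Illustration only. Latin-1 stands in for a single-byte target encoding
# because Python ships no MARC-8 codec; the field contents are made up.
FT = b"\x1e"  # ISO 2709 field terminator


def build_directory(lengths):
    """Return (length, start) pairs the way a MARC directory lays them out."""
    entries, start = [], 0
    for length in lengths:
        entries.append((length, start))
        start += length
    return entries


def terminators_found(data, directory):
    """For each entry, is there a field terminator where the entry says the field ends?"""
    return [data[start + length - 1:start + length] == FT
            for length, start in directory]


fields = [
    "  \x1faplain title",
    "  \x1faCaf\u00e9 society",   # contains é
    "  \x1fa\u00a9 2016",         # contains ©
    "  \x1faplain note",
]

as_utf8 = [f.encode("utf-8") + FT for f in fields]
as_single_byte = [f.encode("latin-1") + FT for f in fields]
data = b"".join(as_single_byte)  # what ends up in the output file

# Lengths measured on the UTF-8 bytes (stale) vs. on the bytes actually written.
stale = build_directory([len(f) for f in as_utf8])
fresh = build_directory([len(f) for f in as_single_byte])

stale_ok = terminators_found(data, stale)
fresh_ok = terminators_found(data, fresh)
print("stale directory:", stale_ok)
print("fresh directory:", fresh_ok)
```

With the stale directory, the first field still lines up, and every field from the first non-ASCII character onward does not. That is the same shape as the "Separator but not at end of field" and "No separator at end of field" warnings.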
When specifying --encoding UTF-8, the resulting MARC output does not have the above errors. As a workaround, you should be able to output UTF-8 records from marc_export and then convert them to MARC8 with yaz-marcdump or other tools.
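The workaround can be scripted roughly as follows. This is a sketch under several assumptions: both marc_export and yaz-marcdump are on the PATH of a configured Evergreen server, the yaz-marcdump options -f, -t, -o and -l behave as described in the YAZ documentation, and leader position 9 should be reset to a blank (decimal 32) for MARC8 output. The output file names and function names are hypothetical.

```python
import subprocess


def export_utf8_cmd():
    """marc_export invocation that writes USMARC encoded as UTF-8 to stdout."""
    return ["marc_export", "--encoding", "UTF-8"]


def to_marc8_cmd(src):
    """yaz-marcdump invocation that re-encodes src from UTF-8 to MARC-8.

    -o marc keeps the output in ISO 2709 form; -l 9=32 is assumed to set
    leader byte 9 to a blank, which marks the record as MARC-8.
    """
    return ["yaz-marcdump", "-f", "UTF-8", "-t", "MARC-8",
            "-o", "marc", "-l", "9=32", src]


def run_workaround(bib_ids, utf8_path="test-utf8.mrc", marc8_path="test-marc8.mrc"):
    """Export the given bib ids as UTF-8, then convert the file to MARC-8."""
    id_input = "".join(f"{bib_id}\n" for bib_id in bib_ids).encode("ascii")
    with open(utf8_path, "wb") as out:
        subprocess.run(export_utf8_cmd(), input=id_input, stdout=out, check=True)
    with open(marc8_path, "wb") as out:
        subprocess.run(to_marc8_cmd(utf8_path), stdout=out, check=True)


# Example (only meaningful where both tools are installed):
# run_workaround([123, 456])
```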
It is quite possible that this is not a bug in marc_export or even Evergreen, but an issue upstream. Looking for reported (and possibly fixed!) bugs there may be a good next step.
Changed in evergreen:
status: New → Confirmed
importance: Undecided → Medium
tags: added: marc
tags: added: cat-marc removed: marc
A total of 17 error / warning messages are generated by yaz-marcdump when attempting to parse the two records in the above scenario, and the errors appear to begin AFTER the first occurrence of a character such as é or ©.
In some tests with a single record, I was unable to get yaz-marcdump to emit anything at all other than an initial "<!-- Record 1 offset 0 (0x0) -->" when using -p.
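When yaz-marcdump gives little or no output, the directory can be checked independently of any MARC library. The sketch below reads the raw ISO 2709 structure (24-byte leader, 12-byte directory entries, field terminator 0x1E, record terminator 0x1D) and reports every directory entry that does not end on a field terminator. The function names are hypothetical and it has not been run against the records from this report.

```python
FT, RT = 0x1E, 0x1D  # ISO 2709 field and record terminators


def split_records(data):
    """Split a file's bytes into records, keeping each record terminator."""
    return [chunk + b"\x1d" for chunk in data.split(b"\x1d") if chunk]


def check_record(rec):
    """Return a list of structural problems found in one ISO 2709 record."""
    if len(rec) < 25:
        return ["record is shorter than a leader"]
    try:
        declared_len = int(rec[0:5])   # leader 00-04: record length
        base = int(rec[12:17])         # leader 12-16: base address of data
    except ValueError:
        return ["leader length or base address is not numeric"]

    problems = []
    if declared_len != len(rec):
        problems.append(f"leader says {declared_len} bytes, record has {len(rec)}")
    if rec[-1] != RT:
        problems.append("record does not end with a record terminator")
    if not 25 <= base <= len(rec):
        return problems + [f"base address {base} is out of range"]
    if rec[base - 1] != FT:
        problems.append("directory is not terminated just before the base address")

    directory = rec[24:base - 1]
    if len(directory) % 12:
        problems.append("directory length is not a multiple of 12")

    for i in range(0, len(directory) - len(directory) % 12, 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii", "replace")
        try:
            length, start = int(entry[3:7]), int(entry[7:12])
        except ValueError:
            problems.append(f"{tag}: directory entry is not numeric")
            continue
        end = base + start + length  # one past the field's last byte
        if end > len(rec) or rec[end - 1] != FT:
            problems.append(f"{tag}: no field terminator at offset {end - 1}")
    return problems


# Example use against the exported file:
# with open("test.mrc", "rb") as fh:
#     for n, rec in enumerate(split_records(fh.read()), 1):
#         print(n, check_record(rec))
```

The first tag reported for a record shows where the directory and the data start to disagree, which can then be compared with the position of the first é or © in that record.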