marc_export -i gives incorrect record length in the leader when call numbers include UTF8 characters

Bug #1584891 reported by Jane Sandberg
Affects     Status         Importance   Assigned to   Milestone
Evergreen   Fix Released   Undecided    Unassigned
2.10        Fix Released   Undecided    Unassigned
2.11        Fix Released   Undecided    Unassigned

Bug Description

Some of our call numbers include UTF8 characters. When I attempt to export those records using marc_export, I get a totally correct MARC file, with the correct record length in the LDR field. However, when I add the -i flag (to include items and volumes), it returns a MARC record with an invalid length in the LDR field. I have attached an example of an invalid record generated with the -i flag.

I am using EG 2.9.1.

Jane Sandberg (sandbergja) wrote :

Here is the original record as exported without the -i flag.

Mike Rylander (mrylander) wrote :

Jane,

The length value in bad_record.mrc is 02736, which matches the output of `wc -c bad_record.mrc`. The LoC documentation[1] specifies that lengths are counted in octets, not characters, so I don't think there's a bug here. Am I misunderstanding what you mean, or is there some third-party tool that's having problems with the record?

[1] https://www.loc.gov/marc/specifications/specrecstruc.html#define ... specifically "length" and "logical record length".
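
To illustrate the octet-versus-character distinction, here is a minimal Python sketch of the same check that `wc -c` performs against the leader. It is not part of marc_export (which is a Perl script); the function names and the sample call number are hypothetical and exist only for this example.

```python
def declared_length(raw: bytes) -> int:
    """Return the logical record length stored in leader positions 00-04."""
    return int(raw[:5].decode("ascii"))


def leader_length_is_consistent(raw: bytes) -> bool:
    """True when the declared length equals the record's size in octets."""
    return declared_length(raw) == len(raw)


# A made-up call number containing one non-ASCII character.
call_number = "QA76 Ñ"
char_count = len(call_number)                   # counts characters
octet_count = len(call_number.encode("utf-8"))  # counts octets, as MARC requires

if __name__ == "__main__":
    print(char_count, octet_count)
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes of a single record to `leader_length_is_consistent`.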

Jane Sandberg (sandbergja) wrote :

Sorry, it looks like I accidentally corrected bad_record.mrc! Here is the real bad_record.mrc, the original, which reports 02734 as the length.

Jane Sandberg (sandbergja) wrote :

And the incorrect length is reported when I run the record through MarcEdit or PyMARC.

Dan Scott (denials) wrote :

I'm seeing this problem on 2.10.10 with copy locations that contain UTF8 characters. I think it's because when we generate the 852 fields in marc_export we need to utf8-encode the values of the individual subfields that we've received from the database.
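
A rough Python sketch of why unencoded subfield values would produce a short length: if a field's size is measured in characters rather than in UTF-8 octets, every multi-byte character is undercounted. This is only an illustration of the suspected cause, not the marc_export code (which is Perl); the helper names and the 852 subfield values below are invented for the example.

```python
SUBFIELD_DELIMITER = "\x1f"
FIELD_TERMINATOR = "\x1e"


def build_field(indicators, subfields):
    """Join (code, value) pairs into one variable-field string."""
    body = "".join(
        f"{SUBFIELD_DELIMITER}{code}{value}" for code, value in subfields
    )
    return indicators + body + FIELD_TERMINATOR


def length_in_characters(field):
    """Size as seen when the values are treated as character strings."""
    return len(field)


def length_in_octets(field):
    """Size after UTF-8 encoding, which is what the leader must count."""
    return len(field.encode("utf-8"))


# Hypothetical 852 content: a copy location and call number with non-ASCII text.
field = build_field("4 ", [("b", "BR1"), ("c", "Española"), ("j", "PQ6555 .Ñ8")])

# Each two-byte character adds one octet that a character count misses.
shortfall = length_in_octets(field) - length_in_characters(field)

if __name__ == "__main__":
    print(length_in_characters(field), length_in_octets(field), shortfall)
```

Encoding each value to UTF-8 before the lengths are computed makes the two counts agree, which is the direction the comment above suggests.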

Changed in evergreen:
status: New → Confirmed
Dan Scott (denials) wrote :
Changed in evergreen:
milestone: none → 2.12-beta
tags: added: pullrequest
Changed in evergreen:
milestone: 2.12-beta → 2.12-rc
Chris Sharp (chrissharp123) wrote :

I can confirm that the fix works using Jane's sample record with a Unicode character added to the call number. Signoff here:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/csharp/lp1584891_marc_export_unicode_items

tags: added: signedoff
Dan Scott (denials) wrote :

Thanks Chris! I've pushed the fixes to the master, 2.11, and 2.10 branches.

Changed in evergreen:
status: Confirmed → Fix Committed
Jane Sandberg (sandbergja) wrote :

Thanks so much, Dan and Chris!

Changed in evergreen:
status: Fix Committed → Fix Released