MARC8 records with diacritics are exported with incorrect record length

Bug #1940702 reported by Jason Stephenson
This bug affects 3 people
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Evergreen version: 3.5.3
O/S: Ubuntu Bionic (18.04)
Pg Version: 9.6

I suspect this bug is actually in MARC::Charset and/or MARC::Record. When one exports records with diacritics in MARC8 encoding using the Evergreen marc_export program, the leader has an incorrect record length. The length seems to be increased by 1 for each diacritic character.

I'm attaching two versions of the same record, exported from the CW MARS database, that illustrate this with a single diacritic. The marc8.mrc file contains the record exported in MARC8 and has a length of 1183 in the leader when the actual length is 1182. The UTF-8 file (utf8.mrc) has the correct length in the leader.
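
For anyone who wants to check an export for this mismatch, here is a minimal sketch in Python. It is not part of Evergreen or marc_export. It relies only on the MARC 21 record structure: leader positions 00-04 hold the record length as five ASCII digits, and each record ends with the record terminator byte 0x1D. The function names check_leader_lengths and check_file are made up for this illustration.

```python
RECORD_TERMINATOR = b"\x1d"  # MARC 21 end-of-record byte


def check_leader_lengths(data: bytes):
    """Return (index, declared_length, actual_length) for each record.

    declared_length comes from leader positions 00-04; it is None when
    those five bytes are not ASCII digits. actual_length counts every
    byte of the record, including the record terminator.
    """
    results = []
    chunks = data.split(RECORD_TERMINATOR)
    # Anything after the final terminator is not a complete record.
    for index, chunk in enumerate(chunks[:-1]):
        record = chunk + RECORD_TERMINATOR
        try:
            declared = int(record[0:5].decode("ascii"))
        except ValueError:
            declared = None
        results.append((index, declared, len(record)))
    return results


def check_file(path: str):
    """Hypothetical helper: return only the mismatching records in a file."""
    with open(path, "rb") as handle:
        rows = check_leader_lengths(handle.read())
    return [row for row in rows if row[1] != row[2]]
```

Running check_file on marc8.mrc should list the record with a declared length of 1183 against an actual length of 1182, and running it on utf8.mrc should return an empty list, if the files are as described above.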

Tags: cat-marc
Jason Stephenson (jstephenson) wrote :

Here's the UTF-8 version of the record with the correct header length. I've seen this with other records. This is just one of the shortest that demonstrates the problem.

summary: - MRC8 records with diacritics are exported with incorrect record length
+ MARC8 records with diacritics are exported with incorrect record length
Elaine Hardy (ehardy)
tags: added: cat-marc
removed: marc
Josh Stompro (u-launchpad-stompro-org) wrote :

Bug 1671845 describes other issues with the marc_export MARC8 format output.

Jason Stephenson (jstephenson) wrote :

The issues are more likely to be in MARC::Record and/or the MARC8 Charset code. I wonder, too, if some of this is caused by bad records. I haven't looked closely at the test records I shared here, but I have seen records with fields containing characters in different encodings, likely caused by copy and paste from other applications.

Linda Jansova (skolkova-s) wrote :

Perhaps the comments in the Koha code at https://git.koha-community.org/Koha-community/Koha/src/branch/master/C4/Charset.pm (starting on line 618) might help shed some light on the issue, although they focus on exporting data to UTF-8, not to MARC-8.
