Duplicate 035 in recently added/updated MARC

Bug #1075989 reported by Jason Stephenson
This bug affects 3 people
Affects: Evergreen
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

OpenSRF: 2.1.1/master
Evergreen: Master as of 20121106 && 20121028
O/S: Ubuntu Precise (12.04.1)
PostgreSQL: 9.1 && 9.2

Recently found the following two records in biblio.record_entry:

LDR 00641cam a2200205Ka 4500
001 1373387
003 MAnMC
005 20121001111358.0
008 030818s191u ie 000 1 gle d
035 . ‡a(OCoLC)52859713
035 . ‡a(OCoLC)52859713
040 . ‡aIEKBA ‡cIEKBA ‡dOCLCQ ‡dERN ‡dUtOrBLW
082 0 4. ‡a891.6213
100 1 . ‡aÓ Laoghaire, Peadar.
245 1 0. ‡aEisirt : ‡bLeagan Caighdeanaithe / ‡can t-Aṫair Peadar Ua Laoġaire.
260 . ‡aBaile Áṫa Cliaṫ : ‡bLongman, Brún agus Nuallán, ‡c191?
300 . ‡a88 p. ; ‡c17 cm.
994 . ‡aZ0 ‡bMRQ
948 . ‡hHELD BY MRQ - 3 OTHER HOLDINGS
901 . ‡a1373387 ‡bOCoLC ‡c1373387 ‡tbiblio

LDR 01165cam a2200289 a 4500
001 1373117
003 MAnMC
005 20121001111325.0
008 040913s2002 ie a j b 001 1 gle
015 . ‡aGBA467981 ‡2bnb
016 7 . ‡a012996559 ‡2Uk
020 . ‡a090609609X
020 . ‡a9780906096093
035 . ‡a(OCoLC)56651115
035 . ‡a(OCoLC)56651115
040 . ‡aUKM ‡cUKM ‡dOCLCQ ‡dUtOrBLW
245 0 0. ‡aRabhlaí rabhlaí : ‡brogha rannta traidisiúnta do aos óg / ‡cRoibeard Ó Cathasaigh a chuir in eagar i gcomhairle le baill Choiste Stiúrtha Thogra Béaloidis i mBunscoileanna Chorca Dhuibhne ; Deirdre Lyons Doyle a mhaisigh.
260 . ‡aLuimneach : ‡bAonad Forbartha Curaclaim ; ‡aBaile an Fheirtéaraigh : ‡bOidhreacht Chorca Dhuibhne, ‡c2002.
300 . ‡aviii, 62 p. : ‡bcol. ill. ; ‡c21 cm.
504 . ‡aIncludes bibliographical references and index.
650 0. ‡aNursery rhymes, Irish. ‡0(MVLC)478086.
650 0. ‡aChildren's songs, Irish. ‡0(MVLC)1334673.
700 1 . ‡aÓ Cathasaigh, Roibeard.
700 1 . ‡aDoyle, Deirdre Lyons.
994 . ‡aZ0 ‡bMRQ
948 . ‡hHELD BY MRQ - 2 OTHER HOLDINGS
901 . ‡a1373117 ‡bOCoLC ‡c1373117 ‡tbiblio

Note that both records have doubled 035 entries. I'm not sure what workflow leads to this, but there is definitely some way that the check for duplicate 035 in a single MARC record is not being triggered or is failing.
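For anyone who wants to look for other affected records, here is a minimal Python sketch of a duplicate-035 check. It assumes a record has already been parsed into a plain list of (tag, subfields) pairs. That structure and the function name `duplicate_035_values` are illustrative assumptions, not Evergreen's internal representation or API.

```python
from collections import Counter

# Assumption: a record is a list of (tag, subfields) pairs, and subfields
# is a list of (code, value) pairs. This is a stand-in structure for
# illustration only, not how Evergreen stores MARC.


def duplicate_035_values(record):
    """Return the 035 $a values that occur more than once in the record."""
    values = [
        value
        for tag, subfields in record
        if tag == "035"
        for code, value in subfields
        if code == "a"
    ]
    counts = Counter(values)
    return [value for value, count in counts.items() if count > 1]


# The 035 and 040 fields of the first record quoted above.
eisirt = [
    ("035", [("a", "(OCoLC)52859713")]),
    ("035", [("a", "(OCoLC)52859713")]),
    ("040", [("a", "IEKBA"), ("c", "IEKBA"), ("d", "OCLCQ")]),
]

print(duplicate_035_values(eisirt))  # prints ['(OCoLC)52859713']
```

The check only compares exact string values, so it would not flag 035 fields that differ in whitespace or in other subfields.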

Tags: z3950 overlay
Revision history for this message
Dan Scott (denials) wrote :

Not "definitely" - there's no check to see if incoming records have duplicate 035s, so if the record already had duped 035s, they would be retained when imported. (I just confirmed this by importing a record and manually duplicating the 035).

So perhaps you want to check the incoming records, if you can track the actual source down.

Jason Stephenson (jstephenson) wrote :

Dan,

Checking the incoming record for the first one, it has 1 035.

Overlaying the record more than once seems to put the extra 035s in the record. I need to do some more experimentation, but this is what I did with the Eisirt record:

1. Delete the duplicate 035 in the local record.
2. Search for the title in the local catalog and in OCLC via Z39.50.
3. Find the OCLC record with the matching ocm number.
4. Overlay the local record with the OCLC record.

Record in local database now has only 1 035.

Repeat the above steps 2 through 4.

Record in the local database now has duplicate 035 entries.

Repeat steps 2 through 4.

Record only has the two 035 entries. A third is not added.

As I say, it needs some more investigation, but something seems to be going on with Z39.50 overlay and the 035.

I'll keep digging.

Jason Stephenson (jstephenson) wrote :

Addendum:

If I delete both 035 fields in the local record and then repeat steps 2 through 4, I get two duplicate 035 fields in the local record.

The record from OCLC has just 1 035.

Dan Scott (denials) wrote :

It would help a lot if you could attach the example source record from OCLC so that we can reproduce the problem easily and add one or more tests.

There's certainly enough tortured logic around the OCLC case that it could use some good tests.

Jason Stephenson (jstephenson) wrote :

The MARC of the OCLC record as it appears in View MARC from the Z39.50 interface:

LDR 00547cam a2200181Ka 4500
001 ocm52859713
003 OCoLC
005 20121107131616.0
008 030818s191u ie 000 1 gle d
040 . ‡aIEKBA ‡cIEKBA ‡dOCLCQ ‡dERN
035 . ‡a(OCoLC)52859713
082 4. ‡a891.6213
100 1 . ‡aO Laoghaire, Peadar.
245 1 0. ‡aEisirt : ‡bLeagan Caighdeanaithe / ‡can t-Aṫair Peadar Ua Laoġaire.
260 . ‡aBaile Áṫa Cliaṫ : ‡bLongman, Brún agus Nuallán, ‡c191?
300 . ‡a88 p. ; ‡c17 cm.
994 . ‡aZ0 ‡bMRQ
948 . ‡hHELD BY MRQ - 3 OTHER HOLDINGS

Changed in evergreen:
status: New → Triaged
tags: added: overlay z39.50
Jason Stephenson (jstephenson) wrote :

Put simply, if you use Z39.50 to overlay an existing record that already has an 035, and the incoming record has an identical 035, your record will end up with two identical 035 fields after the overlay.
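As a rough illustration of the behaviour one would expect instead, here is a small Python sketch that merges the 035 $a values of the existing and incoming records and keeps each value only once. The function name `merge_035` and the plain lists of strings are assumptions made for this example. It is not how Evergreen's overlay is implemented.

```python
def merge_035(existing_values, incoming_values):
    """Combine 035 $a values from both records, dropping exact repeats.

    Order is preserved: values from the existing record come first,
    followed by any incoming values not already present.
    """
    seen = set()
    merged = []
    for value in list(existing_values) + list(incoming_values):
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


# Existing local record and incoming OCLC record carry the same 035.
existing = ["(OCoLC)52859713"]
incoming = ["(OCoLC)52859713"]

print(merge_035(existing, incoming))  # prints ['(OCoLC)52859713']
```

With this kind of merge, repeating the overlay any number of times would leave a single 035 for that control number.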

I have confirmed that this duplication still happens on my production system today. We are pretty much on 2.5.0 there; we upgraded from master on 20131208.

I am having fun (still) with searching the local catalog via z39.50 on our upgraded (2.6.0) test and development systems, so I have not been able to test this behavior there. I will open a new bug regarding local catalog search and z39.50 when I can find a coherent way to describe it.

Galen Charlton (gmc)
tags: added: z3950
removed: z39.50
Josh Stompro (u-launchpad-stompro-org) wrote :

EG 3.3

I just recently noticed something slightly similar. I'm not sure if it is quite the same thing though.

In a small number of records we are seeing OCLC 035 entries that are not quite duplicates but are functionally equivalent. One of the 035 entries always has a trailing space.

I'm wondering if the cat.maintain_control_numbers setting/feature is moving a 001 with a trailing space over to a 035 with a trailing space? Or not matching an existing 035a with a trailing space?

The pattern I was seeing was one singular 035a with a trailing space, along with another 035a with z subfields without a trailing space.

I wonder if it would make sense for the process that moves 001 -> 035 to exclude trailing spaces? Or for the matching process that looks for existing 035 entries to try and ignore trailing spaces?

I think this is the DB function that is doing the moving.
https://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/sql/Pg/002.functions.config.sql;hb=448e2a4b0d6f7e3abfc291258e9c192bfd035c2c#l546
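
The idea of ignoring trailing spaces when matching can be sketched in a few lines of Python. This is only an illustration of the comparison, with hypothetical helper names (`normalize_035`, `has_equivalent_035`) and a made-up control number. The linked database function is written in SQL and may work quite differently.

```python
def normalize_035(value):
    """Strip trailing whitespace so 'X ' and 'X' compare as equal."""
    return value.rstrip()


def has_equivalent_035(existing_values, candidate):
    """Return True if an existing 035 $a matches, ignoring trailing spaces."""
    target = normalize_035(candidate)
    return any(normalize_035(value) == target for value in existing_values)


# Hypothetical data: the existing 035 $a carries a trailing space.
existing = ["(OCoLC)12345678 "]

print(has_equivalent_035(existing, "(OCoLC)12345678"))  # prints True
print(has_equivalent_035(existing, "(OCoLC)87654321"))  # prints False
```

Applying the same normalization when the 001 value is copied into a new 035 would keep the trailing space from being stored in the first place.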

Josh
