Record Match Set normalization of 020

Bug #1909611 reported by Jennifer Weston
This bug affects 6 people
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

When creating a Record Match Set, an option to apply normalization rules to the subfields used as match points would identify more matches and reduce the number of duplicate records imported because of missed matches.

This is true of all MARC tags and subfields; however, addressing just the 020 would help tremendously. The 020$a is a major culprit behind many missed matches because vendor records (and older existing records) often contain descriptive text after the ISBN rather than just the ISBN.

Example:
020\\$a9781534428812 (hardcover) will not match 020\\$a9781534428812

Having an option to normalize the 020$a so that text after the ISBN is ignored would allow more automatic matches in scenarios like this one.
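The normalization requested above could look roughly like the following. This is only a sketch in Python to illustrate the idea, not Evergreen code; the function name `normalize_020a` and the regular expression are assumptions made for this example.

```python
import re

# Leading run of ISBN characters: digits, optional hyphens, and an
# optional final "X" check character (ISBN-10). Hypothetical pattern.
_ISBN_PREFIX = re.compile(r"^\s*([0-9][0-9\-]*[0-9Xx])")


def normalize_020a(value: str) -> str:
    """Return only the ISBN at the start of an 020$a value.

    Trailing descriptive text such as "(hardcover)" is dropped and
    hyphens are removed. If no ISBN-like prefix is found, the value is
    returned with surrounding whitespace stripped.
    """
    match = _ISBN_PREFIX.match(value)
    if not match:
        return value.strip()
    return match.group(1).replace("-", "").upper()


# With this normalization the two values from the example compare equal.
incoming = normalize_020a("9781534428812 (hardcover)")
existing = normalize_020a("9781534428812")
print(incoming == existing)
```

Applying the same function to both the incoming and the existing value before comparison is what makes the match succeed; normalizing only one side would not help when both records carry qualifiers.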

tags: added: angular cataloging vandelay
Rogan Hamby (rogan-hamby) wrote :

Seems like we have three possible avenues here:

1) Set up some special logic specifically for the 020 to normalize them. This seems like a no-brainer to me if we don't do #3.
2) Same as above but with a YAOUS (yet another org unit setting) to say yay or nay because we need even more org units.
3) Some infrastructure to save options in the db per library and per field per match set for regex.

Thoughts?
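To make option 3 more concrete, here is a rough Python sketch of what per-library, per-match-set, per-field regex rules might look like as data. Everything in it (the `NormalizationRule` class, its field names, and `apply_rules`) is hypothetical and does not reflect Evergreen's actual schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizationRule:
    """One stored regex rule (hypothetical shape, not a real table)."""
    org_unit: int      # library the rule belongs to
    match_set: int     # Record Match Set the rule applies to
    tag: str           # MARC tag, e.g. "020"
    subfield: str      # subfield code, e.g. "a"
    pattern: str       # regex to search for
    replacement: str   # text to substitute for each match


def apply_rules(rules, org_unit, match_set, tag, subfield, value):
    """Apply every rule configured for this library/match set/field."""
    for rule in rules:
        key = (rule.org_unit, rule.match_set, rule.tag, rule.subfield)
        if key == (org_unit, match_set, tag, subfield):
            value = re.sub(rule.pattern, rule.replacement, value)
    return value.strip()


# Example rule: drop a trailing parenthetical qualifier from 020$a.
rules = [NormalizationRule(1, 10, "020", "a", r"\s*\(.*\)\s*$", "")]
print(apply_rules(rules, 1, 10, "020", "a", "9781534428812 (hardcover)"))
```

The trade-off against option 1 is flexibility versus setup cost: stored rules cover any tag and subfield, but each library has to write and maintain its own patterns.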

Clare Irwin (cirwin) wrote :

As a cataloguer, I struggle with this issue every time I import records to our system. I would love to be able to set up a match point record that ignores any information after the ISBN. Thank you.

Mike Rylander (mrylander) wrote :

I think adding a normalization configuration step available to match points defined by tag+subfield would be about the same amount of effort as creating special logic for one MARC tag. There are several different incarnations of that pattern for record attributes, search data, etc.

There are other options, such as adding multi-value support to the record-attribute based match point logic and leveraging the normalization that can already be done through that mechanism. That (record-attribute based) match point logic currently assumes one value per definition, but we know records in the wild do include multiple ISBNs.
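As an illustration of why multi-value support matters, the sketch below (Python, with hypothetical function names, not the existing match point logic) treats each record's 020$a values as a set and reports a match when any normalized ISBN is shared.

```python
import re


def normalized_isbns(values):
    """Collect the normalized ISBNs found at the start of each value."""
    result = set()
    for value in values:
        match = re.match(r"\s*([0-9][0-9\-]*[0-9Xx])", value)
        if match:
            result.add(match.group(1).replace("-", "").upper())
    return result


def records_match(incoming_020a, existing_020a):
    """True when the two records share at least one normalized ISBN."""
    return bool(normalized_isbns(incoming_020a) & normalized_isbns(existing_020a))


# An incoming record with two ISBNs matches an existing record that
# carries only one of them.
incoming = ["9781534428812 (hardcover)", "9781534428829 (ebook)"]
existing = ["9781534428812"]
print(records_match(incoming, existing))
```

A single-value comparison would only see one ISBN per record, so whether these two records matched would depend on which value happened to be picked.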

We could also expand match points to fields extracted for search. We already normalize ISBN in the identifier search class.
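This comment does not spell out what the identifier search class normalization does. Purely as an example of one kind of ISBN normalization that helps matching, the Python sketch below converts an ISBN-10 to its ISBN-13 form so both can be compared as the same identifier. The function name is an assumption for this example and is not taken from Evergreen.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a hyphen-free ISBN-10 to its ISBN-13 equivalent.

    The ISBN-10 check character is discarded, "978" is prefixed, and a
    new check digit is computed with alternating weights of 1 and 3.
    """
    core = "978" + isbn10[:9]
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    check = (10 - total % 10) % 10
    return core + str(check)


# The ISBN-10 form of the ISBN used in the bug description.
print(isbn10_to_isbn13("153442881X"))
```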

Elaine Hardy (ehardy)
tags: added: cat-importexport
removed: cataloging vandelay