Evergreen

Record Match Set normalization of 020

Bug #1909611 reported by Jennifer Weston on 2020-12-29

This bug affects 6 people

Affects     Status   Importance   Assigned to   Milestone
Evergreen   New      Undecided    Unassigned

Bug Description

When creating a Record Match Set, an option to apply normalization rules to the subfields used as match points would identify more matches and could reduce the number of duplicate records imported because a match was missed.

This applies to all MARC tags and subfields; however, addressing just the 020 would help tremendously. The 020$a is a major culprit behind many missed matches because vendor records (and older existing records) contain descriptive text rather than just the ISBN.

Example:
020\\$a9781534428812 (hardcover) will not match 020\\$a9781534428812

Having an option to normalize the 020$a so that text after the ISBN is ignored would allow more automatic matches in scenarios like this one.
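
To illustrate the kind of normalization being requested, here is a minimal Python sketch. It is not Evergreen code: the function name `normalize_020a` and the regular expression are assumptions made for this example. It keeps only the leading ISBN of an 020$a value, drops hyphens, and leaves the value unchanged if no plausible ISBN is found.

```python
import re

# Hypothetical pattern (not from Evergreen): a leading run of digits and
# hyphens that ends in a digit or an X check character.
_LEADING_ISBN = re.compile(r"^\s*([0-9][0-9\-]{8,}[0-9Xx])")


def normalize_020a(value: str) -> str:
    """Return the leading ISBN of an 020$a value, ignoring trailing text.

    Falls back to the stripped input when no 10- or 13-character ISBN
    can be recognized, so non-ISBN values are never silently altered.
    """
    match = _LEADING_ISBN.match(value)
    if match is None:
        return value.strip()
    isbn = match.group(1).replace("-", "").upper()
    if len(isbn) in (10, 13):
        return isbn
    return value.strip()


# The two values from the example above now compare equal.
same = normalize_020a("9781534428812 (hardcover)") == normalize_020a("9781534428812")
```

This sketch does not validate check digits or convert between ISBN-10 and ISBN-13; whether a match point should do either is a separate decision.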

Jennifer Weston (jweston) on 2020-12-29

tags:

added: angular cataloging vandelay

Rogan Hamby (rogan-hamby) wrote on 2020-12-30:

Seems like we have three possible avenues here:

1) Set up some special logic specifically for the 020 to normalize them. This seems like a no-brainer to me if we don't do #3.
2) Same as above but with a YAOUS to say yay or nay because we need even more org units.
3) Some infrastructure to save regex options in the db per library, per field, and per match set.

Thoughts?

Clare Irwin (cirwin) wrote on 2021-01-12:

As a cataloguer, I struggle with this issue every time I import records to our system. I would love to be able to set up a match point record that ignores any information after the ISBN. Thank you.

Mike Rylander (mrylander) wrote on 2021-01-12:

I think adding a normalization configuration step available to match points defined by tag+subfield would be about the same amount of effort as creating special logic for one MARC tag. There are several different incarnations of that pattern for record attributes, search data, etc.

There are other options, such as adding multi-value support to the record-attribute-based match point logic and leveraging the normalization that can already be done through that mechanism. That (record-attribute-based) match point logic currently assumes one value per definition, but we know records in the wild do include multiple ISBNs.
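
As a rough illustration of what multi-value matching means here, the following Python sketch treats each record's 020$a values as a set of normalized ISBNs and reports a match when the sets overlap. It is only a sketch under stated assumptions: `isbn_key`, `isbn_keys`, and `records_match` are hypothetical names, and none of this reflects how Evergreen's match point logic is actually implemented.

```python
import re


def isbn_key(value):
    """Hypothetical normalizer: leading ISBN without hyphens, or None."""
    match = re.match(r"\s*([0-9][0-9\-]{8,}[0-9Xx])", value)
    if match is None:
        return None
    key = match.group(1).replace("-", "").upper()
    return key if len(key) in (10, 13) else None


def isbn_keys(values):
    """Collect the normalized ISBNs from all 020$a values of one record."""
    return {key for key in map(isbn_key, values) if key is not None}


def records_match(incoming_020a, existing_020a):
    """True when the two records share at least one normalized ISBN."""
    return bool(isbn_keys(incoming_020a) & isbn_keys(existing_020a))


# An incoming record with two ISBNs matches an existing record that
# carries only the second one.
matched = records_match(
    ["9781534428812 (hardcover)", "9781534428829 (ebook)"],
    ["9781534428829"],
)
```

Values that do not normalize to an ISBN are dropped from the set, so two records never match merely because both contain the same non-ISBN text.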

We could also expand match points to fields extracted for search. We already normalize ISBN in the identifier search class.

Elaine Hardy (ehardy) on 2021-10-15

tags:

added: cat-importexport
removed: cataloging vandelay
