Record Match Set normalization of 020
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | New | Undecided | Unassigned |
Bug Description
When creating a Record Match Set, an option to apply normalization rules to the subfields used as match points would identify more matches and reduce the number of duplicate records imported because a match was missed.
This is true of all MARC tags and subfields; however, addressing just the 020 would help tremendously. The 020$a is a major culprit behind many missed matches because vendor records (and older existing records) contain descriptive text in addition to the ISBN.
Example:
020\\$a97815344
An option to normalize the 020$a so that text after the ISBN is ignored would allow more automatic matches in scenarios like this one.
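As a rough illustration of the kind of normalization being requested, here is a minimal Python sketch. It is not Evergreen code: the function name `normalize_020a` and the rule it applies (keep a leading run of digits, hyphens, and X if it is 10 or 13 characters long once hyphens are removed) are assumptions made for the example, and the sample ISBN is a generic illustrative value rather than one taken from this report.

```python
import re

# Hypothetical helper for illustration only; not part of Evergreen.
# Matches a leading run of digits, hyphens, and X/x (the ISBN-10 check character).
LEADING_ISBN = re.compile(r"\s*([0-9Xx-]+)")


def normalize_020a(value: str) -> str:
    """Return just the ISBN from an 020$a value, dropping trailing text.

    If the leading token does not look like a 10- or 13-character ISBN,
    the value is returned unchanged apart from surrounding whitespace.
    """
    match = LEADING_ISBN.match(value)
    if match is None:
        return value.strip()
    candidate = match.group(1).replace("-", "").upper()
    if len(candidate) in (10, 13):
        return candidate
    return value.strip()


if __name__ == "__main__":
    print(normalize_020a("9783161484100 (hardcover : alk. paper)"))
    print(normalize_020a("0-306-40615-2 (pbk.)"))
```

With a rule like this, an incoming record and an existing record that differ only in the qualifying text after the ISBN would produce the same match point value.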
tags: added: angular cataloging vandelay
tags: added: cat-importexport; removed: cataloging vandelay
Seems like we have three possible avenues here:
1) Set up some special logic specifically for the 020 to normalize it. This seems like a no-brainer to me if we don't do #3.
2) Same as above but with a YAOUS to say yay or nay, because we need even more org unit settings.
3) Some infrastructure to save options in the db per library and per field per match set for regex.
Thoughts?
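To make option 3 a little more concrete, here is a minimal Python sketch of what per-library, per-match-set, per-field regex options could look like once loaded from the database. Everything in it is hypothetical: the key layout `(org_unit_id, match_set_id, tag, subfield)`, the `RULES` dict standing in for a table, and the `apply_rule` function are assumptions for discussion, not existing Evergreen schema or API.

```python
import re
from typing import Dict, Tuple

# Hypothetical key: (org_unit_id, match_set_id, MARC tag, subfield code).
RuleKey = Tuple[int, int, str, str]

# Stand-in for rows that would be stored in the database (assumed layout).
RULES: Dict[RuleKey, str] = {
    (1, 10, "020", "a"): r"^\s*([0-9Xx-]+)",
}


def apply_rule(rules: Dict[RuleKey, str], org_unit_id: int, match_set_id: int,
               tag: str, subfield: str, value: str) -> str:
    """Apply the configured regex, if any, to a subfield value.

    Returns the first capture group when the pattern defines one,
    otherwise the whole match. Values with no rule or no match are
    returned unchanged.
    """
    pattern = rules.get((org_unit_id, match_set_id, tag, subfield))
    if pattern is None:
        return value
    match = re.search(pattern, value)
    if match is None:
        return value
    return match.group(1) if match.groups() else match.group(0)


if __name__ == "__main__":
    print(apply_rule(RULES, 1, 10, "020", "a", "9783161484100 (pbk.)"))
```

The trade-off compared with options 1 and 2 is flexibility versus surface area: any field could get a rule, but each library would have to write and maintain its own patterns.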