Comment 13 for bug 1350831

Mike Rylander (mrylander) wrote:

Thanks for asking, Jeff.

Sure. The first case I guessed might be an example, a title browse for "big bang", was indeed one: there are different titles under "Big bang" and "The big bang", unsurprisingly. Then I tried a title browse for "girl" and found "A Girl", "The Girl", and "Girl", all different titles by different authors (and in different formats). Similarly, a title browse for "stone" turned up genuinely different titles (and formats) under "stone" and "the stone".

That's just the easy stuff I could guess at and confirm in a couple of minutes on a test server. I can probably come up with some cross-language ones as well, which would be even more problematic for patrons, IMO.

Anyway, the point of all this is that I believe we should fix the issue at the correct layer, which I see as configurably defining the meaning of "sameness" by normalizing "real value" strings in predetermined and predictable ways (read: normalizers). I strongly disagree that the offered patch changes things at that layer.

For the room generally: is the issue perhaps that nobody wants to tackle creating an appropriate ISBD-trimming normalizer (or verifying that we already have one)? I really don't want to be re-implementing browse in a year or two because "it's broken and doesn't respect the data as cataloged".
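
To make the normalizer idea concrete, here is a rough sketch in Python, purely for illustration. It is not an existing normalizer: the name `trim_isbd` and the exact set of punctuation it strips are assumptions. It removes only trailing ISBD separators and leaves leading articles alone, so "The big bang" and "Big bang" remain distinct values.

```python
import re

# Assumption: the trailing ISBD separators worth trimming are a
# space-preceded "/", ":", ";" or "=", plus a bare trailing comma.
# Trailing periods are deliberately left alone (abbreviations, initials).
_ISBD_TRAILING = re.compile(r"(?:\s+[/:;=]|\s*,)\s*$")


def trim_isbd(value: str) -> str:
    """Hypothetical normalizer: strip trailing ISBD punctuation only."""
    value = value.strip()
    prev = None
    # Repeat until stable so stacked separators ("stone ; /") are all removed.
    while prev != value:
        prev = value
        value = _ISBD_TRAILING.sub("", value).rstrip()
    return value


if __name__ == "__main__":
    for raw in ["The big bang /", "Big bang :", "A Girl", "stone ; /"]:
        print(repr(raw), "->", repr(trim_isbd(raw)))
```

The point of the sketch is that "sameness" gets decided by a small, predictable string transformation that can be configured per field, rather than by changing how browse itself groups entries.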