call number browse (shelflist) disjointed for LC normalized call numbers

Bug #737819 reported by Dan Wells
This bug affects 1 person
Affects: Evergreen
Status: Fix Released
Importance: High
Assigned to: Dan Wells

Bug Description

Testing of the new normalized call number browse revealed frequent failure with disjointed results. While the best possible fix is already outlined well in Bug #690829, the attached patch (against rel_2_0) attempts to address a majority of the typical cases in at least a "we are migrating this sucker tomorrow" kind of way :-)

The current failure results from the fact that selecting on the label but sorting on the label_sortkey allows incorrect results into the 'after' set. For example, a browse of LC call number "Z7026 .K3" gives a middle row of:

Z6941 .M23 ------- Z8 .F8 H57 1989 ------- Z8 .G72 K56 2006

The actual call number browsed is not even on the page, and the "Z8 ..." numbers get into the set because their label is in fact 'greater' than "Z7026", but then get moved to the front of the selection because the label_sortkey sort value is much lower than "Z7026" (it is "Z0008 ...").
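
To make the failure concrete, here is a rough Python sketch of the mismatch. It is not the attached patch and not Evergreen's code: toy_sortkey is a made-up stand-in for the real normalizer (it only zero-pads the class number, which is enough to turn "Z8 ..." into "Z0008 ..."), the function names are hypothetical, and "Z7035 .A1" is an invented label added so the 'after' set has a second legitimate entry.

```python
import re

def toy_sortkey(label):
    # Hypothetical normalizer: zero-pad the leading class number to 4 digits,
    # so "Z8 .F8 H57 1989" becomes "Z0008 .F8 H57 1989".
    m = re.match(r"([A-Z]+)(\d+)(.*)", label)
    if not m:
        return label
    letters, digits, rest = m.groups()
    return f"{letters}{int(digits):04d}{rest}"

LABELS = [
    "Z6941 .M23",
    "Z7026 .K3",
    "Z7035 .A1",          # invented label, for illustration only
    "Z8 .F8 H57 1989",
    "Z8 .G72 K56 2006",
]

def browse_after_buggy(labels, pivot, limit):
    # Select on the raw label, but sort on the sort key.
    selected = [l for l in labels if l >= pivot]
    return sorted(selected, key=toy_sortkey)[:limit]

def browse_after_consistent(labels, pivot, limit):
    # Select and sort on the same value: the sort key.
    pivot_key = toy_sortkey(pivot)
    selected = [l for l in labels if toy_sortkey(l) >= pivot_key]
    return sorted(selected, key=toy_sortkey)[:limit]

print(browse_after_buggy(LABELS, "Z7026 .K3", 2))
print(browse_after_consistent(LABELS, "Z7026 .K3", 2))
```

In the first function the "Z8 ..." labels pass the filter (as plain strings they compare greater than "Z7026") and then sort to the front on their keys, pushing the browsed call number off the page. Comparing and sorting on the same key avoids that, provided the key is not blank.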

Though further testing (particularly of non-LC data) would be wise, I think we should commit this patch (or something very similar) to rel_2_0 to address the immediate known issues while the more comprehensive solution is targeted at 2_1. This patch also assumes that the label_sortkey contains a meaningful value, so any edge cases which result in a blank sortkey will need to be addressed separately.

Dan Wells (dbw2) wrote :
Dan Wells (dbw2)
description: updated
Dan Wells (dbw2) wrote :

I am concerned about possible regression on this issue:

http://svn.open-ils.org/trac/ILS/changeset/19560

but at this point I am unable to reproduce it with the data I have. Test cases, if known, would be much appreciated.

Mike Rylander (mrylander) wrote :

r19560 was committed to address an issue with Dewey-ish CN labels which, I suspect, will manifest with LCCNs as well. However, that was only half of the solution. See also http://svn.open-ils.org/trac/ILS/changeset/19768 for the other part.

The end result is correct sorting of call numbers using the generic normalizer.

The primary change in the CN normalizer is that we normalize to spaces instead of underscores, which corrects the sort order for normalized labels when cast to BYTEA.
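
A small Python sketch of why the separator can matter under a byte-wise comparison. This is only an illustration: the strings below are the raw labels quoted later in this thread, not real label_sortkey values (the actual normalizer output will look different), and bytewise_sorted is a hypothetical helper standing in for ordering on a BYTEA cast. The only fact relied on is ASCII order: space (0x20) sorts before "." (0x2E), while underscore (0x5F) sorts after it and after the digits.

```python
def bytewise_sorted(keys):
    # Compare keys as raw bytes, roughly what sorting on a bytea value does.
    return sorted(keys, key=lambda k: k.encode("ascii"))

with_spaces = ["E174.5 .A45 1983", "E174 H88 2005 EB", "E174.5 .B52"]
# Same keys, but with underscore as the separator instead of space.
with_underscores = [k.replace(" ", "_") for k in with_spaces]

print(bytewise_sorted(with_spaces))
print(bytewise_sorted(with_underscores))
```

With spaces, "E174 H88 ..." sorts ahead of the "E174.5 ..." entries; with underscores it drops behind them, because "_" compares greater than ".".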

Dan, could you set your LCCNs to use the generic class, which will trigger regeneration of the label_sortkey, and test that this indeed does not (at least) make things worse for your dataset?

Dan Wells (dbw2) wrote :

I went ahead and regenerated the sortkeys using 'generic' as suggested, and sorting looks fine for the identified case:

E174 H88 2005 EB ------ E174.5 .A45 1983 ------ E174.5 .B52

(center of search for 'E174.5')

To be clear, this is with the attached patch applied to 2.0.4, which effectively replaces r19560 but keeps r19768.

Mike Rylander (mrylander) wrote :

Dan, sorry for the delay in looking back at this.

The more I stare at this, the more I like the way you're identifying the pivot, so I'm in favor of committing this. I'm unsure if cn_startwith needs similar treatment -- given a complicated enough normalizer, I think it may.

Thanks Dan!

Dan Wells (dbw2) wrote :

Thank you, Mike, for the feedback. Committed with one small change (oils_text_as_bytea() on the initial lookup as well, which was an oversight).

Dan

Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
status: New → Fix Committed
milestone: none → 2.0.5
Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released