Label in call number browse needs to be normalized for maximum correctness
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Confirmed | Wishlist | Unassigned |
Bug Description
* Evergreen version: trunk and 2.0-beta5
* PostgreSQL version: 8.4
I moved this problem discussion over from https:/
Mike Rylander wrote, in a description of the use of asset.call_
"""
<snip>
The only way to make this infrastructure useful is to construct a query like so:
evergreen=# EXPLAIN SELECT "acn".create_date, "acn".creator, "acn".deleted, "acn".edit_date, "acn".editor, "acn".id, "acn".label, "acn".owning_lib, "acn".record, "acn".label_
-------
-------
Limit (cost=7.35..20.58 rows=9 width=92)
-> Index Scan using asset_call_
Filter: ((NOT deleted) AND ((regexp_
ext)), '\\\\'::text, '\\\\\\\\'::text, 'g'::text)
(3 rows)
But, there's a problem with that -- we have to know the normalizer type to use for the user-supplied call number value so that we know how we should convert it. One option would be to use the value configured for the search OU to normalize the input. We could offer the choice to users (or, at least, to staff) for correctness, but a choice needs to be made by someone, and a single normalizer applied.
We will also need to adjust the sortkey index ( "asset_
This may be a longer-term project than we can handle before 2.0. Therefore, I suggest an alternate short-term solution: go back to label for sorting (though, with the as_bytea work in place) which is how 1.6 works, and (while not allowing different normalization forms) is known to work well enough for institutionally uniform configurations.
"""
To which Dan Scott responded:
"""
* The WHERE clause compares the raw label against the raw user-supplied call number - which means that a legitimate range of call numbers might be skipped for a given normalization. This is actually no different than how Evergreen works (including incorrect results) in previous versions. And it's important to underscore that the current use of acn.label in the WHERE clause is not the reason why the sequential scan occurs.
I disagree with the concluding assertion that "going back to label for sorting (though, with the as_bytea work in place) which is how 1.6 works, and (while not allowing different normalization forms) is known to work well enough for institutionally uniform configurations". It does not work well for libraries that use Library of Congress call numbers; it has been the source of many complaints in Conifer, and was the reason that I worked on the call number normalization in the first place. It was not simply an academic (ha ha) exercise; sorting on normalized call numbers is required to tackle actual user-visible problems.
<snip>
For absolute correctness of call number searching and browsing (which probably should be a completely separate bug, but I'll address it here for now as a start and we can move to a separate bug if necessary), we need to know how to normalize the incoming call number. Let's consider the current user-visible entry points to call number browsing:
* Clicking on the "Shelf Browser" or "Browse Call Numbers" in the detailed item view, or clicking on the call number in the unapi htmlholdings-full format. From these points, we have access to the source acn, and therefore we have access to the source acn.label_class column, which can then be fed into the call number browsing method. I think it is a reasonable assumption that, when a person invokes the shelf browser, they expect to see other call numbers in the vicinity of this call number, and therefore we can use the source item's call number class.
So, if we give O:A:SuperCat:
* In the "Advanced Search", the call number search is currently a simplistic text field - one of the options in "Quick Search". In this case, as you suggest, we could break out the call number search into its own UI element and add user-selectable options for the normalization (defaulting to the normalization in the OU setting for the current search scope, of course).
The advanced search option currently opens cn_browse.xml, which pulls in cn_browse.js to invoke getCallnumber(), which currently just grabs the text string (the PARAM_CN (cn) GET param). We can add a PARAM_CNCLASS to this to provide the classification to the method call.
Once we get this far, we could actually teach the call number browsing methods to use label_sortkey in the WHERE clause and to normalize the incoming call number. I think we can write a two-argument and three-argument SQL function that takes the incoming call number text string, wraps it in oils_text_
"""
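To make the PARAM_CNCLASS idea above a little more concrete, here is a minimal sketch of how the browse page might read both GET parameters and fall back to a default classification. It is not taken from cn_browse.js: the function name getCallNumberRequest, the parameter value "cnclass", the defaultClass argument (standing in for whatever the OU setting resolves to), and the assumption that the class is a numeric id are all hypothetical.

```javascript
// Sketch only: names other than PARAM_CN ("cn") are assumptions, not Evergreen code.
const PARAM_CN = "cn";            // existing GET param mentioned in the discussion
const PARAM_CNCLASS = "cnclass";  // hypothetical name for the proposed new param

// Parse a query string and return the call number text plus the
// classification to normalize it with. defaultClass stands in for the
// value configured for the current search scope (assumed to be numeric).
function getCallNumberRequest(queryString, defaultClass) {
  const params = new URLSearchParams(queryString);

  // Trim surrounding whitespace; a missing param becomes an empty label.
  const label = (params.get(PARAM_CN) || "").trim();

  // Accept the class only if it is a plain non-negative integer;
  // anything else (missing, empty, non-numeric) falls back to the default.
  const rawClass = params.get(PARAM_CNCLASS);
  const labelClass =
    rawClass !== null && /^\d+$/.test(rawClass) ? Number(rawClass) : defaultClass;

  return { label, labelClass };
}
```

The returned pair could then be passed along to the call number browse method, so that the server side knows which normalizer to apply to the incoming label before comparing it against label_sortkey.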