Bug #1150939 “Dewey call number normalizer does not recognize 3 ...” : Bugs : Evergreen

Dan Scott (denials) wrote on 2013-03-07:

#1

What happens if you put the prefix for the call number into the call number's "Prefix" column, instead of prepending it directly to the call number label? I think the Dewey normalizer is expecting to normalize Dewey call numbers; having strings dumped in front of the Dewey call number makes it not be a Dewey call number.

Jason Etheridge (phasefx) wrote on 2013-03-07:

#2

collab/phasefx/lp1150939_dewey @ working/Evergreen.git
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/phasefx/lp1150939_dewey

I tried tweaking the procedure some and ended up with this:

evergreen2=# select asset.label_normalizer_dewey2('357.61 MAG');
label_normalizer_dewey2
-------------------------
357_610000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('357 MAG');
label_normalizer_dewey2
-------------------------
357_000000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('E 357.61 MAG');
label_normalizer_dewey2
---------------------------
E_357_610000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('E 357 MAG');
label_normalizer_dewey2
---------------------------
E_357_000000000000000_MAG
(1 row)

and for an odd-ball:

evergreen2=# select asset.label_normalizer_dewey2('YR DVD 978 B d');
label_normalizer_dewey2
--------------------------------
YR_DVD_978_000000000000000_B_D
(1 row)

compared to

evergreen2=# select asset.label_normalizer_dewey('YR DVD 978 B d');
label_normalizer_dewey
--------------------------------
YR_000000000000000_DVD_978_B_D
(1 row)
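
The difference between the two outputs above can be sketched in Python. This is not the stored procedure from the branch; it is a minimal illustration inferred from the sample output, assuming the label is upper-cased, split on periods and whitespace, and padded with 15 zeros. The function name and the `pad_first_digit_group` switch are hypothetical.

```python
import re

PAD_WIDTH = 15  # assumption: width of the zero padding seen in the sample output


def normalize_dewey_sketch(label: str, pad_first_digit_group: bool = True) -> str:
    """Illustrative sort key builder; not the Evergreen stored procedure."""
    # Upper-case, then split on periods and runs of whitespace.
    tokens = [t for t in re.split(r"[.\s]+", label.strip().upper()) if t]
    digit_positions = [i for i, t in enumerate(tokens) if t.isdigit()]

    if len(digit_positions) >= 2:
        # A decimal part exists: right-pad the second digit group with zeros.
        i = digit_positions[1]
        tokens[i] = tokens[i][:PAD_WIDTH].ljust(PAD_WIDTH, "0")
    elif len(digit_positions) == 1:
        # No decimal part: append a block of zeros after one token.
        # True  -> after the digit group itself (the dewey2 output above)
        # False -> always after the first token (the older output above)
        target = digit_positions[0] if pad_first_digit_group else 0
        tokens[target] += "_" + "0" * PAD_WIDTH

    # Join with underscores and drop anything that is not alphanumeric or "_".
    return re.sub(r"\W", "", "_".join(tokens))


print(normalize_dewey_sketch("YR DVD 978 B d"))
print(normalize_dewey_sketch("YR DVD 978 B d", pad_first_digit_group=False))
```

With the switch off, the zeros land after `YR`, which is why a three-digit number behind a prefix sorts apart from its decimal neighbours.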

Jeremiah Miller (jeremym-t) wrote on 2013-03-07:

#3

>What happens if you put the prefix for the call number into the call number's "Prefix" column, instead of prepending it directly to the call number label?

Tried it... the prefix goes out of the equation entirely, and the sortkey is based solely on what is left. So the call number sorting intermingles everything together in one big soup, going strictly by the dewey numeric. Somewhat counter to the point of prefixes to begin with.

>I think the Dewey normalizer is expecting to normalize Dewey call numbers; having strings dumped in front of the Dewey call number makes it not be a Dewey call number.

Agreed, that is what it is doing. And that is true. But I've yet to see a public library stick to pure unadulterated Dewey. The idea of "prefixes" being a separate field is also a bit new, so existing collections are full of Dewey call numbers with strings dumped in front of them. Converting those collections to using separate prefixes appears non-trivial.

Most ILS seem to normalize and sort deweys just fine, whether prefixes are involved or not. They handle the non-ideal just as well as the ideal.

Jeremiah Miller (jeremym-t) wrote on 2013-03-07:

#4

>I tried tweaking the procedure some and ended up with this <snip>

Would like to try that out on our test server, see how it does with a bigger/real sample of callnums.

Dan Scott (denials) wrote on 2013-03-07:

#5

Why are we expecting the Dewey call number normalization algorithm to normalize call numbers that are not Dewey call numbers?

Why don't we just make use of the Prefix functionality that has been there for a number of releases?

Dan Scott (denials) wrote on 2013-03-07:

#6

You haven't provided any context for where this sorting is occurring (or not occurring, as the case may be). IIRC, there are efforts in the pull holds list to teach it to include prefixes in sorting.

If you want sorting by prefix + callnumber in some specific contexts, then we should fix that problem or problems... otherwise we have useless functionality in the form of the "Prefix" and "Suffix" and should remove it.

Jason Etheridge (phasefx) wrote on 2013-03-07: Re: [Bug 1150939] Re: Dewey call number normalizer does not recognize 3 digit dewey numbers that follow a prefix

#7

> Why don't we just make use of the Prefix functionality that has been
> there for a number of releases?

Dan, you mean sort on prefix and suffix in addition to label_sortkey?
I'd be down for that. I wonder if there's a reason why we're not
doing so already, or if there are libraries out there that don't want
that, such that we'd need to make it optional (yuck).

But I also think the normalizer should do the best it can with real
world data. We could just make a new one called Almost Dewey and have
folks choose that if desired. :-)

Could start validating labels against classification schemes too.
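
A minimal Python sketch of what "sort on prefix and suffix in addition to label_sortkey" could look like. The `CallNumber` class and `shelf_sort` function are hypothetical stand-ins for illustration, not Evergreen code, and the choice to compare prefixes case-insensitively is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallNumber:
    """Hypothetical stand-in for a call number row with a separate prefix."""
    prefix: str
    label_sortkey: str
    suffix: str = ""


def shelf_sort(call_numbers):
    # Group by prefix first, then by the normalized label, then by suffix.
    return sorted(
        call_numbers,
        key=lambda cn: (cn.prefix.upper(), cn.label_sortkey, cn.suffix.upper()),
    )


items = [
    CallNumber("YA", "381_000000000000000_A"),
    CallNumber("", "382_000000000000000_A"),
    CallNumber("YA", "357_610000000000000_MAG"),
]
for cn in shelf_sort(items):
    print(cn.prefix, cn.label_sortkey)
```

Sorting on `label_sortkey` alone would interleave the prefixed and unprefixed items by Dewey number; putting the prefix first in the key keeps each prefixed collection together.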

Kathy Lussier (klussier) wrote on 2013-03-07:

#8

I would support including prefix and suffix in the call number sort.

I personally don't see why a library wouldn't want the prefix to be part of the call number sort, so I don't think it needs to be optional, but I may be missing some kind of use case out there.

Jason Etheridge (phasefx) wrote on 2013-03-07:

#9

Incidentally, my branch is an actual bug fix to the stored procedure as written. It's easy to see the logic flaw there. So regardless of everything else, I think I'm closer to that code's intention.

Ben Shum (bshum) wrote on 2013-03-07:

#10

While prefix/suffix have been available features for many releases now, we still haven't been able to achieve any consensus on how to implement them for the libraries in our consortium in any meaningful way. In the interim, people have continued to create all sorts of interesting prefix/suffix call number entries directly in the label field, ignoring the prefix/suffix fields.

For a real-life example, though, I can verify that we had a situation where shelf browse in TPAC would show things in an order like:

YA 381 A
YA 382 A
YA 384 A
YA 381.x A
YA 382.x A
YA 384.x A

This was because the 000 padding came after the first group instead of after the number group, as indicated. I tested phasefx's fix from the working branch, and it put the padding after the group of digits rather than always after the first variable. Subsequent shelf browsing in the range after updating the volumes seemed to show proper sorting for the values again.
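
The ordering above falls out of plain string comparison on the sort keys. In the Python sketch below, the keys are written out by hand from the padding behaviour described in this thread rather than produced by Evergreen, and the ".5" decimals are made-up stand-ins for the "x" in the example.

```python
# Keys when the zero block is appended after the first token ("YA").
buggy_keys = {
    "YA 381 A": "YA_000000000000000_381_A",
    "YA 381.5 A": "YA_381_500000000000000_A",
    "YA 382 A": "YA_000000000000000_382_A",
    "YA 382.5 A": "YA_382_500000000000000_A",
    "YA 384 A": "YA_000000000000000_384_A",
    "YA 384.5 A": "YA_384_500000000000000_A",
}

# Keys when the zero block is appended after the digit group instead.
fixed_keys = {
    "YA 381 A": "YA_381_000000000000000_A",
    "YA 381.5 A": "YA_381_500000000000000_A",
    "YA 382 A": "YA_382_000000000000000_A",
    "YA 382.5 A": "YA_382_500000000000000_A",
    "YA 384 A": "YA_384_000000000000000_A",
    "YA 384.5 A": "YA_384_500000000000000_A",
}


def shelf_order(keys):
    """Return the labels ordered by their sort key, as a shelf browse would."""
    return sorted(keys, key=keys.get)


print(shelf_order(buggy_keys))
print(shelf_order(fixed_keys))
```

Because `YA_0...` compares lower than `YA_3...`, every label without a decimal sorts ahead of every label with one in the first listing, while the second listing interleaves them numerically.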

Ben Shum (bshum) wrote on 2013-03-07:

#11

Assigning targets for now, but I would consider this to be a bug and not just a question of prefix/suffix use (or not).

Changed in evergreen:
milestone:	none → 2.4.0-beta
status:	New → Confirmed

Jeremiah Miller (jeremym-t) wrote on 2013-03-08:

#12

>Why are we expecting the Dewey call number normalization algorithm to normalize call numbers that are not Dewey call numbers?

The preponderance of real world "Dewey call numbers" that are not strictly proper Dewey call numbers. Many coming from other ILS that do not have an equivalent to the prefix/suffix features. They will be encountered, handling them gracefully seems preferable to not doing so. Which it does already, with the exception of three digit numbers without decimals.

>Why don't we just make use of the Prefix functionality that has been there for a number of releases?

Agreed that would be the ideal. Agreed that would pretty much solve the issue. But moving toward making use of it is non-trivial for us, and likely others.

>You haven't provided any context for where this sorting is occurring (or not occurring, as the case may be).

My particular use case was straight database queries, selecting and sorting call number ranges using both callnum_label and callnum_label_sortkey. (Easily worked around at that level. And if we were using prefixes, would just add them to the SQL stew.) We've also experienced odd results when sorting report output by callnumber_label_sortkeys. (Another case where use of separate prefixes would solve the problem.)

But as far as I can tell, it is ubiquitous... anywhere and everywhere one can find (or create) a list of items and sort them by call numbers, dewey numbers with non-separate prefixes can sort incorrectly (depending on what mix of call numbers is in the list). Item status list view, pull holds list, copy buckets, OPAC call number search (shelf browse), report output, etc.

>If you want sorting by prefix + callnumber in some specific contexts, then we should fix that problem or problems

Agreed. While I can't think of a context where one wouldn't want that, I don't know of any where one can't.
I do know of a couple places the prefix isn't displayed... item status alternate view, copy editor.

>otherwise we have useless functionality in the form of the "Prefix" and "Suffix" and should remove it.

Disagree, I think the functionality appears quite useful and sensible, and even preferred. But taking advantage of it requires planning and work that can take time to execute properly. In the meantime, see no reason the Dewey normalizer should/could not be robust and capable of handling "bad Dewey" callnum input just as well as proper callnums. No reason it shouldn't return properly sortable results regardless of prefix use.

Dan Scott (denials) wrote on 2013-03-08:

#13

"But taking advantage of it requires planning and work that can take time to execute properly."

So does teaching a normalizer to accept garbage input. We're killing ourselves and making Evergreen a worse product by trying to bend over backwards to support every conceivable usage scenario and preference. Every time we do that, we open up more holes for bugs. And we've done this over and over for years, in our efforts to please some slim segment of users for each small option.

So I disagree with classifying this as a bug; I believe it's invalid. Garbage in, garbage out is a very basic principle. Let's say we make this "robust" for your use case. If someone else comes along and stuffs in a prefix of "YA 674.1" in front of call number "392.23", are we supposed to normalize the first 3-digit sequence or the second one? What if they complain about how their other ILS handles it the way they expected?

I would support looking into making prefixes affect sort order, although even there I have concerns. The original idea of call numbers is to co-locate like items, so that all of the books on taking care of your aquarium go together; in that case the prefix should be ignored.

If you want to limit search results to a particular collection (e.g. Young Adult), then we should make it easier to use a shelving location. Or perhaps sort results by clicking column headers (we've done work in other areas for this purpose). This is why I was asking for context: actual use cases should drive changes, not hand-waving around not having time to fix bad data or do things properly.

Changed in evergreen:
status:	Confirmed → Incomplete

Jason Etheridge (phasefx) wrote on 2013-03-08:

#14

If you guys would like, I can make a new normalizer and classification
scheme called "Garbage". Be easy enough to switch call numbers over
to it.

Jason Etheridge (phasefx) wrote on 2013-03-08:

#15

That said, I am a big fan of promoting shelving locations.

Jeremiah Miller (jeremym-t) wrote on 2013-03-08:

#16

>some slim segment of users for each small option.

More folks with prefixes in our Deweys, talking about similar issues:
http://list.evergreen-ils.org/pipermail/evergreen-catalogers/2012-May/000023.html

Our issue was exactly the same... migrated as generic, while new stuff went in as dewey due to settings.  Settled on the same solution... flip everything to Dewey.  But some of ours are only 3-digit numbers, which fail to sort properly.

>Garbage in, garbage out is a very basic principle.

Of course.  We're simply differing on what constitutes garbage, and what doesn't.  I think an otherwise valid Dewey number with a simple alpha prefix is common enough practice to be considered "not completely garbage".

>Let's say we make this "robust" for your use case. If someone else comes along and stuffs in a prefix of "YA 674.1" in front of call number "392.23", are we supposed to normalize the first 3-digit sequence or the second one? What if they complain about how their other ILS handles it the way they expected?

Unlike a simple alpha prefix, an entirely theoretical scenario.  Not common intentional practice.

That said, the first sequence.  Which is likely how the other ILS did it, if it allowed the entry in the first place.  I have seen similar (and worse) done by mistake.

>I would support looking into making prefixes affect sort order, although even there I have concerns. The original idea of call numbers is to co-locate like items, so that all of the books on taking care of your aquarium go together; in that case the prefix should be ignored.

Many separate collections might have materials on aquarium care.  None of which a given library might want to be co-located.  In our case some of those are.  Others aren't even on the same floor.  I leave that debate for the librarians, I'm not qualified for it.  But I can plainly see different institutions come to different conclusions on such matters.

>If you want to limit search results to a particular collection (e.g. Young Adult), then we should make it easier to use a shelving location.

Agreed... coincidentally "making better use of shelving locations" was the goal.  Working on splitting a few large, monolithic locations into a reasonable set of shelving locations.  How to do it?  Grab the items needing assigned to a new location by call number ranges.  Because the "location" is right there already, in the form of that prefix.  Once that is done, then maybe we could talk about getting rid of the prefixes (or switching to the separate prefix feature).

>Or perhaps sort results by clicking column headers (we've done work in other areas for this purpose). This is why I was asking for context: actual use cases should drive changes, not hand-waving around not having time to fix bad data or do things properly.

Checked hands, stationary other than fingertips making typing motions. :)

Actual use cases:  Anywhere call numbers are sorted, the described numbers can and do appear out of order.  Multiple complaints from staff about it, in multiple scenarios.  Figured out the cause.  (Mixed Dewey/General classifications.)  Fixed that, problem still there but with different pattern, different cause.  Dewey normalization does work fine, *even with prefixes*, if the number is at least four digits with a decimal.  It fails on a three digit no-decimal number.

If Dewey call number prefix = bad data, a lot of folks have bad data.  The three previous ILS's housing that data all sorted it consistently and as expected. None of them stored the prefix separately from the rest of the call number.

As for doing things properly, does "doing it properly" mean that upon migration to Evergreen, one should use the call number prefix feature, stripping any embedded prefix from call numbers, and so on?  And that if one doesn't, they are "doing it wrong"?  What of those that migrated before that feature existed?

Amused that the conversation about classifying a problem can take longer than coming up with code to fix it. :)

Jason - I'd happily use a classification scheme called "garbage" if that is what it takes to get the job done.  I'm obviously a fan of shelving locations also.  It has taken over a decade to talk them into splitting these locations... but it's getting done all the same.

Dan Scott (denials) wrote on 2013-03-08:

#17

So there's already a "Garbage" normalizer. It's called "Generic", and it produces the following sort order for your initial data:

Looks an awful lot like what you asked for in the beginning; just set the label_class to 1 (e.g. UPDATE asset.call_number SET label_class = 1 WHERE label_class = 2 AND label !~ '^\d').

Alternately, it would be pretty straightforward to migrate prefixes out of the labels by populating asset.call_number_prefix, then setting asset.call_number.prefix to point to the prefix and stripping the prefix out of the label.

Dan Scott (denials) wrote on 2013-03-08:

#18

And here's an example of how to migrate messed-up labels to use per-callnumber prefixes instead:

BEGIN;

-- Populate the call number prefixes: everything before the first
-- three-digit run, with surrounding whitespace trimmed.
-- (trim() has to wrap regexp_replace(); trim('\1') alone is a no-op.)
INSERT INTO asset.call_number_prefix (owning_lib, label)
(
  SELECT DISTINCT owning_lib, trim(regexp_replace(label, '^([^\d]+)(\d{3}.*)$', '\1')) AS prefix
  FROM asset.call_number
  WHERE label_class = 2
    AND label ~ '^([^\d]+)\d{3}'
)
EXCEPT
(
  SELECT owning_lib, label
  FROM asset.call_number_prefix
);

-- Point at the actual prefixes and fix the labels.
-- Match on owning_lib as well, since the prefixes were created per library.
UPDATE asset.call_number
  SET prefix = acnp.id, label = regexp_replace(asset.call_number.label, '^([^\d]+)(\d{3}.*)$', '\2')
  FROM asset.call_number_prefix acnp
  WHERE trim(regexp_replace(asset.call_number.label, '^([^\d]+)(\d{3}.*)$', '\1')) = acnp.label
    AND acnp.owning_lib = asset.call_number.owning_lib
    AND label_class = 2
    AND asset.call_number.label ~ '^([^\d]+)\d{3}'
;

COMMIT;

That doesn't take care of the "DVD ABYSS" etc completely non-Dewey call number labels, but then you can address those with something like what I posted before. It would be easy to modify this approach to create & use shelving locations and their built-in prefix support, instead.
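
To see which labels the regular expression in the SQL above would touch, the same pattern can be tried in Python. This is only a sketch: `split_prefix` is a hypothetical helper, and it assumes Python's `re` module and PostgreSQL agree on this particular pattern.

```python
import re

# Same pattern as in the SQL: a run of non-digits, then at least three digits.
PREFIX_THEN_DEWEY = re.compile(r"^([^\d]+)(\d{3}.*)$")


def split_prefix(label: str):
    """Return (prefix, remaining_label), or None when the pattern does not match."""
    match = PREFIX_THEN_DEWEY.match(label)
    if match is None:
        return None
    # Strip the whitespace that separated the prefix from the number.
    return match.group(1).strip(), match.group(2)


for label in ["E 357.61 MAG", "YR DVD 978 B d", "357 MAG", "DVD ABYSS", "DVD 30"]:
    print(repr(label), "->", split_prefix(label))
```

Labels with no leading text, no digits at all, or fewer than three digits after the text are left alone, which is why "DVD ABYSS" style labels need separate handling.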

And note that when I said "co-locate", I did not mean physically, I meant by classification, so that you could browse all of the "Aquarium" books in the shelf browser -- whether they have a prefix of "NEW" or "Ref." or "YA" or "BIG PRINT" or whatever. See http://www.dlib.org/dlib/october04/dushay/10dushay.html for a classic article on the subject.

Jason Etheridge (phasefx) wrote on 2013-03-08:

#19

The generic one also seems to work better for something like DVD 3,
which I hope doesn't exist out in the wild.

Jeremiah Miller (jeremym-t) wrote on 2013-03-08:

#20

>So there's already a "Garbage" normalizer. It's called "Generic", and it produces the following sort order for you initial data:

Interesting.  Yes, it appears that it does.  That was actually my first inquiry before switching records to Dewey.  Basically:

"The librarians tell me we are a Dewey library, yet all of our migrated callnumbers are set to Generic.  While at the same time, the settings for new records are producing Dewey records.  Is there a reason behind the old records being Generic?  If so, is there a reason new records are not set the same way?  If not, why aren't the old ones the same as the settings for the new ones?"

Appears the answer to that question should have been:

"Yes, they are Generic because you use prefixes with your Dewey numbers, and in Evergreen that is the proper classification for that type of numbering.  Dewey is for pure Dewey, and will not sort your collection properly unless your prefixes are removed/migrated to a separate field.  Change the new records to Generic, and the settings for new records to match.  An alternative is to migrate your prefixes to a separate table, which is then linked to the record."

That wasn't the answer I got, so I headed down the wrong road.  And judging from other conversations on the topic, we may not be the only ones using Dewey that might better be served by Generic.

Given that... is there something special to gain by using Dewey over Generic, that prompts folks to choose it instead?  Or is it simple misunderstanding of what classification should be used for which circumstances?

> That doesn't take care of the "DVD ABYSS" etc completely non-Dewey call number labels, but then you can address those with something like what I posted before. It would be easy to modify this approach to create & use shelving locations and their built-in prefix support, instead.

Thank you for the code snippet, much appreciated.  Will likely use when migrating/cleaning up prefixes comes to the fore of the plate.  Though the "built-in prefix support" for shelving locations sounds very attractive, and doable once this separation is complete.  Taking prefixes as found in callnums would surely give us garbage/typo based iterations of many of them.  Would also take the opportunity to examine prefix usage, and perhaps eliminate many of them if possible.  That being the part that actually takes the bulk of time & planning.

>And note that when I said "co-locate", I did not mean physically, I meant by classification, so that you could browse all of the "Aquarium" books in the shelf browser -- whether they have a prefix of "NEW" or "Ref." or "YA" or "BIG PRINT" or whatever.

Noted, and understood.  I definitely see the benefits and attraction of that.  I also can foresee the wailing and gnashing of teeth it might produce in my librarians so accustomed to the "other way"!  Might take me another 10 years to get them to swallow that.  Perhaps a user toggle in the OPAC would be a fantastic way to provide both.  Except now I foresee programmers gnashing their teeth too. ;)

>See http://www.dlib.org/dlib/october04/dushay/10dushay.html for a classic article on the subject.

Excellent, thank you, will read.

Now, bug or not?

As reported, agreed.  Not a bug.  User training & data issue (utilizing wrong classification scheme for data).  I should also double check what documentation there is for that... don't remember getting or finding much detail on it, despite some looking and asking of questions.

However, Jason reports finding what appears to be a logic flaw in the code.  Has written a patch that corrects it.  A coincidental byproduct is that it ends up sorting the user's call numbers as desired, despite using wrong scheme.  Not sure how I'd classify that.  Also not qualified to evaluate efficiently, so bowing out of that discussion.  (Jason, still fine to use it on our test system if needing data to try it on.)

>The generic one also seems to work better for something like DVD 3, which I hope doesn't exist out in the wild.

I long ago passed the point at which I'd ever be surprised at what data I might find in the wild.  :)  And that looks very, very close to some call numbers for 30 Rock (DVD 30), and similar titles beginning with numerics (2001, 2010, 2012) *gasp*

Jason Etheridge (phasefx) wrote on 2013-03-09:

#21

> However, Jason reports finding what appears to be a logic flaw in the
> code. Has written a patch that corrects it. A coincidental byproduct
> is that it ends up sorting the user's call numbers as desired, despite
> using wrong scheme. Not sure how I'd classify that. Also not qualified
> to evaluate efficiently, so bowing out of that discussion. (Jason,
> still fine to use it on our test system if needing data to try it on.)

Thanks for the offer.

For what it's worth, I'm getting this fixed in Koha; I just need to
write a test for it:
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9770

If the consensus is that the Dewey classification should be very
strict in what it allows (and there have only been a handful of folks
weighing in on this ticket, so who knows), then the real bug is that
we're not strict here. Should non-Dewey call numbers fall to the end
of the list if using the Dewey classification? Maybe have a fallback
to Generic as a sub-sort?
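
One way to picture the "fall to the end of the list" idea is a two-part sort key, sketched here in Python. Everything in it is an assumption made for illustration: the `STRICT_DEWEY` pattern is one guess at what "strict" could mean, and the upper-cased label is only a stand-in for whatever the Generic normalizer would produce.

```python
import re

# Assumption: "strict" means three digits, an optional decimal part,
# then whitespace or the end of the label.
STRICT_DEWEY = re.compile(r"^\d{3}(\.\d+)?(\s|$)")


def sort_key_with_fallback(label: str):
    text = label.strip().upper()
    # 0 sorts ahead of 1, so labels that pass the strict check come first;
    # everything else is grouped at the end and sub-sorted on the plain text.
    group = 0 if STRICT_DEWEY.match(text) else 1
    return (group, text)


labels = ["YR DVD 978 B d", "357.61 MAG", "DVD ABYSS", "E 357 MAG", "357 MAG"]
print(sorted(labels, key=sort_key_with_fallback))
```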

Dan Wells (dbw2) wrote on 2013-03-10:

#22

We don't use Dewey, so I have no personal stake in this. On the other hand, I have at times felt like I was part of a slim segment of users when pushing for a certain behavior of a feature, and in this particular case, I feel confident that this is a bug fix which will make Evergreen better.

As has been shown here, pure Dewey numbers sort fine without any normalization at all. That's part of the beauty of the system, I think. With that in mind, it makes sense that the main reason for a Dewey normalizer is to reasonably deal with the common but unofficial parts of Dewey call numbers.

Turning then to the code in question, it seems clear from the separation of alpha data and the author's code comments that the padding was meant for the first digit group, but that the code fails to check if the first group is alpha or digits. Jason's branch corrects this oversight, and I agree that this was the author's original intent.

That said, I also agree with Dan that a library would be better served by putting true prefixes in the prefix field. Doing so makes the data easier to sort, easier to analyze, and easier to maintain. Also, if any feature is generally not adopted, it won't get the development attention it deserves, and we are all forever stuck with "good enough" and never achieve the best possible.

Ben Shum (bshum) on 2013-03-17

Changed in evergreen:
milestone:	2.4.0-beta → 2.4.0-rc

Jeremiah Miller (jeremym-t) wrote on 2013-03-20:

#23

On the current status of "incomplete", meaning I (as the reporter) need to give more info...

Is there more info that I could provide? Or should the status change to something else?

Ben Shum (bshum) wrote on 2013-04-22:

#24

Discussed this bug briefly during the conference. Planning to reassess the contents and consider for merging a little later.

Changed in evergreen:
assignee:	nobody → Ben Shum (bshum)
status:	Incomplete → Confirmed

Ben Shum (bshum) on 2013-04-27

Changed in evergreen:
milestone:	2.4.0-rc → none

Ben Shum (bshum) wrote on 2013-08-20:

#25

Picked to master, rel_2_4, and rel_2_3 as bug fix for repairing the logic used in Dewey sort.

Ben Shum (bshum) wrote on 2013-08-20:

#26

Marking as "won't fix" for rel_2_2 given that we're outside that maintenance window.

Changed in evergreen:
milestone:	none → 2.5.0-alpha2
status:	Confirmed → Fix Committed
assignee:	Ben Shum (bshum) → nobody

Ben Shum (bshum) on 2013-11-11

Changed in evergreen:
status:	Fix Committed → Fix Released

	Status	Importance	Assigned to	Milestone
Evergreen	Fix Released	Undecided	Unassigned	Evergreen 2.5.0-alpha2
2.2	Won't Fix	Undecided	Unassigned
2.3	Fix Released	Undecided	Unassigned	Evergreen 2.3.10
2.4	Fix Released	Undecided	Unassigned	Evergreen 2.4.2

Evergreen

Dewey call number normalizer does not recognize 3 digit dewey numbers that follow a prefix
