Dewey call number normalizer does not recognize 3 digit dewey numbers that follow a prefix

Bug #1150939 reported by Jeremiah Miller
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Evergreen | Fix Released | Undecided | Unassigned |
2.2 | Won't Fix | Undecided | Unassigned |
2.3 | Fix Released | Undecided | Unassigned |
2.4 | Fix Released | Undecided | Unassigned |

Bug Description

Version 2.3

With these call numbers:
357.61 MAG
357 MAG
E 357.61 MAG
E 357 MAG

The dewey call number normalizer gives the following keys:
357_610000000000000_MAG
357_000000000000000_MAG
E_357_610000000000000_MAG
E_000000000000000_357_MAG

Note the treatment of the 3 digit whole number changing drastically when it follows a prefix.

The result of this behavior, combined with the result of another dewey normalizer bug (#1131895), can produce a sort order like this:

call number | sortkey
DVD 2012 | DVD_000000000000000_2012
DVD 211 RELIG | DVD_000000000000000_211_RELIG
DVD 419 AMERI | DVD_000000000000000_419_AMERI
DVD 570 LIFE | DVD_000000000000000_570_LIFE
DVD 635 REBEC | DVD_000000000000000_635_REBEC
DVD 775 POINT | DVD_000000000000000_775_POINT
DVD 956 BUDRU | DVD_000000000000000_956_BUDRU
DVD BREAK 1 | DVD_000000000000000_BREAK_1
DVD BREAK 2 | DVD_000000000000000_BREAK_2
DVD GLEE 1 | DVD_000000000000000_GLEE_1
DVD OFFIC 1 | DVD_000000000000000_OFFIC_1
DVD OFFIC 2 | DVD_000000000000000_OFFIC_2
DVD 001.942 LIES | DVD_001_942000000000000_LIES
DVD 229.913 WALKI | DVD_229_913000000000000_WALKI
DVD 305.235 BILLY | DVD_305_235000000000000_BILLY
DVD 30 2 | DVD_30_200000000000000
DVD 30 3 | DVD_30_300000000000000
DVD 355.8 RADIO | DVD_355_800000000000000_RADIO
DVD 419.7 AMERI | DVD_419_700000000000000_AMERI
DVD 612.63 LIFES | DVD_612_630000000000000_LIFES
DVD 613.7192 PILAT | DVD_613_719200000000000_PILAT
DVD 704.086 CATS | DVD_704_086000000000000_CATS
DVD 808.51 LEARN | DVD_808_510000000000000_LEARN
DVD 973.931 SEVEN | DVD_973_931000000000000_SEVEN
DVD ABYSS | DVD_ABYSS
DVD BATMA | DVD_BATMA
DVD EXORC | DVD_EXORC
DVD FARGO | DVD_FARGO
DVD GODFA | DVD_GODFA
DVD IRON | DVD_IRON
DVD NERO 2 PT.3 | DVD_NERO_2_PT_300000000000000
DVD NERO 2 PT.4 | DVD_NERO_2_PT_400000000000000
DVD PROME | DVD_PROME

Whereas we would expect to see something more like this:

DVD 001.942 LIES | DVD_001_942000000000000_LIES
DVD 211 RELIG | DVD_211_000000000000000_RELIG
DVD 229.913 WALKI | DVD_229_913000000000000_WALKI
DVD 305.235 BILLY | DVD_305_235000000000000_BILLY
DVD 355.8 RADIO | DVD_355_800000000000000_RADIO
DVD 419 AMERI | DVD_419_000000000000000_AMERI
DVD 419.7 AMERI | DVD_419_700000000000000_AMERI
DVD 570 LIFE | DVD_570_000000000000000_LIFE
DVD 612.63 LIFES | DVD_612_630000000000000_LIFES
DVD 613.7192 PILAT | DVD_613_719200000000000_PILAT
DVD 635 REBEC | DVD_635_000000000000000_REBEC
DVD 704.086 CATS | DVD_704_086000000000000_CATS
DVD 775 POINT | DVD_775_000000000000000_POINT
DVD 808.51 LEARN | DVD_808_510000000000000_LEARN
DVD 956 BUDRU | DVD_956_000000000000000_BUDRU
DVD 973.931 SEVEN | DVD_973_931000000000000_SEVEN
DVD ABYSS | DVD_ABYSS
DVD BATMA | DVD_BATMA
DVD BREAK 1 | DVD_BREAK_1
DVD BREAK 2 | DVD_BREAK_2
DVD EXORC | DVD_EXORC
DVD FARGO | DVD_FARGO
DVD GLEE 1 | DVD_GLEE_1
DVD GODFA | DVD_GODFA
DVD IRON | DVD_IRON
DVD NERO 2 PT.3 | DVD_NERO_2_PT_3
DVD NERO 2 PT.4 | DVD_NERO_2_PT_4
DVD OFFIC 1 | DVD_OFFIC_1
DVD OFFIC 2 | DVD_OFFIC_2
DVD PROME | DVD_PROME
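
The gap between the two listings comes down to where the zero padding lands, since sortkeys are compared as plain strings. A minimal Python sketch, using four sortkey pairs copied from the listings above. Python's default string comparison stands in for the database collation here, which is an assumption, and `shelf_order` is a made-up helper name:

```python
# Sortkeys copied from the "observed" and "expected" listings in this report.
observed = {
    "DVD 001.942 LIES": "DVD_001_942000000000000_LIES",
    "DVD 211 RELIG": "DVD_000000000000000_211_RELIG",
    "DVD 419 AMERI": "DVD_000000000000000_419_AMERI",
    "DVD 419.7 AMERI": "DVD_419_700000000000000_AMERI",
}
expected = {
    "DVD 001.942 LIES": "DVD_001_942000000000000_LIES",
    "DVD 211 RELIG": "DVD_211_000000000000000_RELIG",
    "DVD 419 AMERI": "DVD_419_000000000000000_AMERI",
    "DVD 419.7 AMERI": "DVD_419_700000000000000_AMERI",
}


def shelf_order(sortkeys):
    """Return the labels ordered by sortkey, compared as plain strings."""
    return sorted(sortkeys, key=sortkeys.get)


# Padding placed after the prefix pushes every whole-number label to the front.
print(shelf_order(observed))
# Padding placed after the digit group keeps 419 next to 419.7.
print(shelf_order(expected))
```

With the padding after the prefix, all of the whole-number labels share the leading `DVD_000000000000000_` and so sort ahead of everything with a decimal part.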

Pulled these out separately, as I'm not really sure what we could expect from tricky ones like these (movie title 2012, television title 30 Rock seasons 2 & 3):
DVD 2012
DVD 30 2
DVD 30 3

Tags: sorting
Dan Scott (denials) wrote :

What happens if you put the prefix for the call number into the call number's "Prefix" column, instead of prepending it directly to the call number label? I think the Dewey normalizer is expecting to normalize Dewey call numbers; having strings dumped in front of the Dewey call number makes it not be a Dewey call number.

Jason Etheridge (phasefx) wrote :

collab/phasefx/lp1150939_dewey @ working/Evergreen.git
http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/phasefx/lp1150939_dewey

I tried tweaking the procedure some and ended up with this:

evergreen2=# select asset.label_normalizer_dewey2('357.61 MAG');
 label_normalizer_dewey2
-------------------------
 357_610000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('357 MAG');
 label_normalizer_dewey2
-------------------------
 357_000000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('E 357.61 MAG');
  label_normalizer_dewey2
---------------------------
 E_357_610000000000000_MAG
(1 row)

evergreen2=# select asset.label_normalizer_dewey2('E 357 MAG');
  label_normalizer_dewey2
---------------------------
 E_357_000000000000000_MAG
(1 row)

and for an odd-ball:

evergreen2=# select asset.label_normalizer_dewey2('YR DVD 978 B d');
    label_normalizer_dewey2
--------------------------------
 YR_DVD_978_000000000000000_B_D
(1 row)

compared to

evergreen2=# select asset.label_normalizer_dewey('YR DVD 978 B d');
     label_normalizer_dewey
--------------------------------
 YR_000000000000000_DVD_978_B_D
(1 row)

Jeremiah Miller (jeremym-t) wrote :

>What happens if you put the prefix for the call number into the call number's "Prefix" column, instead of prepending it directly to the call number label?

Tried it... the prefix goes out of the equation entirely, and the sortkey is based solely on what is left. So the call number sorting intermingles everything together in one big soup, going strictly by the dewey numeric. Somewhat counter to the point of prefixes to begin with.

>I think the Dewey normalizer is expecting to normalize Dewey call numbers; having strings dumped in front of the Dewey call number makes it not be a Dewey call number.

Agreed, that is what it is doing. And that is true. But I've yet to see a public library stick to pure unadulterated Dewey. The idea of "prefixes" being a separate field is also a bit new, so existing collections are full of Dewey call numbers with strings dumped in front of them. Converting those collections to using separate prefixes appears non-trivial.

Most ILS seem to normalize and sort deweys just fine, whether prefixes are involved or not. They handle the non-ideal just as well as the ideal.

Jeremiah Miller (jeremym-t) wrote :

>I tried tweaking the procedure some and ended up with this <snip>

Would like to try that out on our test server, see how it does with a bigger/real sample of callnums.

Dan Scott (denials) wrote :

Why are we expecting the Dewey call number normalization algorithm to normalize call numbers that are not Dewey call numbers?

Why don't we just make use of the Prefix functionality that has been there for a number of releases?

Dan Scott (denials) wrote :

You haven't provided any context for where this sorting is occurring (or not occurring, as the case may be). IIRC, there are efforts in the pull holds list to teach it to include prefixes in sorting.

If you want sorting by prefix + callnumber in some specific contexts, then we should fix that problem or problems... otherwise we have useless functionality in the form of the "Prefix" and "Suffix" and should remove it.

Jason Etheridge (phasefx) wrote : Re: [Bug 1150939] Re: Dewey call number normalizer does not recognize 3 digit dewey numbers that follow a prefix

> Why don't we just make use of the Prefix functionality that has been
> there for a number of releases?

Dan, you mean sort on prefix and suffix in addition to label_sortkey?
I'd be down for that. I wonder if there's a reason why we're not
doing so already, or if there are libraries out there that don't want
that, such that we'd need to make it optional (yuck).

But I also think the normalizer should do the best it can with real
world data. We could just make a new one called Almost Dewey and have
folks choose that if desired. :-)

Could start validating labels against classification schemes too.

Kathy Lussier (klussier) wrote :

I would support including prefix and suffix in the call number sort.

I personally don't see why a library wouldn't want the prefix to be part of the call number sort, so I don't think it needs to be optional, but I may be missing some kind of use case out there.
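
To make the two options in this part of the thread concrete, here is a rough Python sketch of sorting on prefix and suffix in addition to the label sortkey, next to sorting on the label sortkey alone. The `CallNumber` structure, both function names, and the sample volumes are made up for illustration; only the idea of combining prefix, `label_sortkey`, and suffix comes from the discussion:

```python
from typing import NamedTuple


class CallNumber(NamedTuple):
    # Hypothetical record shape: prefix and suffix held apart from the label.
    prefix: str
    label_sortkey: str
    suffix: str


def sort_with_affixes(call_numbers):
    """Order by prefix first, then label sortkey, then suffix."""
    return sorted(call_numbers, key=lambda cn: (cn.prefix, cn.label_sortkey, cn.suffix))


def sort_by_label_only(call_numbers):
    """Order by label sortkey alone, so prefixes do not affect the order."""
    return sorted(call_numbers, key=lambda cn: cn.label_sortkey)


# Made-up sample volumes.
volumes = [
    CallNumber("YA", "639_340000000000000_FISH", ""),
    CallNumber("", "357_610000000000000_MAG", ""),
    CallNumber("YA", "357_000000000000000_MAG", ""),
    CallNumber("", "639_340000000000000_AQUA", ""),
]

for cn in sort_with_affixes(volumes):
    print("with affixes:", cn)
for cn in sort_by_label_only(volumes):
    print("label only:  ", cn)
```

The first ordering keeps each prefixed collection together; the second interleaves the collections by classification number.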

Jason Etheridge (phasefx) wrote :

Incidentally, my branch is an actual bug fix to stored procedure as written. It's easy to see the logic flaw there. So regardless of everything else, I think I'm closer to that code's intention.

Ben Shum (bshum) wrote :

While prefix/suffix have been available features for many releases now, we still haven't been able to achieve any consensus on how to implement them for the libraries in our consortium in any meaningful way. In the interim, people have continued to create all sorts of interesting prefix/suffix call number entries directly as part of the label field, ignoring prefix/suffix.

For a real life example though, I can verify that we had a situation where shelf browse in TPAC would show things in an order like:

YA 381 A
YA 382 A
YA 384 A
YA 381.x A
YA 382.x A
YA 384.x A

This was because of the 000's padding being after the first group instead of the number group as indicated. I tested phasefx's fix from the working branch and it worked to put the padding after the group of digits rather than just always after the first variable. Subsequent shelf browsing in the range after updating the volumes seemed to show proper sorting for the values again.

Ben Shum (bshum) wrote :

Assigning targets for now, but I would consider this to be a bug and not just a question of prefix/suffix use (or not).

Changed in evergreen:
milestone: none → 2.4.0-beta
status: New → Confirmed
Jeremiah Miller (jeremym-t) wrote :

>Why are we expecting the Dewey call number normalization algorithm to normalize call numbers that are not Dewey call numbers?

Because of the preponderance of real world "Dewey call numbers" that are not strictly proper Dewey call numbers, many of them coming from other ILSes that do not have an equivalent to the prefix/suffix features. They will be encountered, so handling them gracefully seems preferable to not doing so. The normalizer already does that, with the exception of three digit numbers without decimals.

>Why don't we just make use of the Prefix functionality that has been there for a number of releases?

Agreed that would be the ideal. Agreed that would pretty much solve the issue. But moving toward making use of it is non-trivial for us, and likely others.

>You haven't provided any context for where this sorting is occurring (or not occurring, as the case may be).

My particular use case was straight database queries, selecting and sorting call number ranges using both callnum_label and callnum_label_sortkey. (Easily worked around at that level. And if we were using prefixes, would just add them to the SQL stew.) We've also experienced odd results when sorting report output by callnumber_label_sortkeys. (Another case where use of separate prefixes would solve the problem.)

But as far as I can tell, it is ubiquitous... anywhere and everywhere one can find (or create) a list of items and sort them by call numbers, dewey numbers with non-separate prefixes can sort incorrectly (depending on what mix of call numbers is in the list). Item status list view, pull holds list, copy buckets, OPAC call number search (shelf browse), report output, etc.

>If you want sorting by prefix + callnumber in some specific contexts, then we should fix that problem or problems

Agreed. While I can't think of a context where one wouldn't want that, I don't know of any where one can't.
I do know of a couple places the prefix isn't displayed... item status alternate view, copy editor.

>otherwise we have useless functionality in the form of the "Prefix" and "Suffix" and should remove it.

Disagree, I think the functionality appears quite useful and sensible, and even preferred. But taking advantage of it requires planning and work that can take time to execute properly. In the meantime, see no reason the Dewey normalizer should/could not be robust and capable of handling "bad Dewey" callnum input just as well as proper callnums. No reason it shouldn't return properly sortable results regardless of prefix use.

Dan Scott (denials) wrote :

"But taking advantage of it requires planning and work that can take time to execute properly."

So does teaching a normalizer to accept garbage input. We're killing ourselves and making Evergreen a worse product by trying to bend over backwards to support every conceivable usage scenario and preference. Every time we do that, we open up more holes for bugs. And we've done this over and over for years, in our efforts to please some slim segment of users for each small option.

So I disagree with classifying this as a bug; I believe it's invalid. Garbage in, garbage out is a very basic principle. Let's say we make this "robust" for your use case. If someone else comes along and stuffs in a prefix of "YA 674.1" in front of call number "392.23", are we supposed to normalize the first 3-digit sequence or the second one? What if they complain about how their other ILS handles it the way they expected?

I would support looking into making prefixes affect sort order, although even there I have concerns. The original idea of call numbers is to co-locate like items, so that all of the books on taking care of your aquarium go together; in that case the prefix should be ignored.

If you want to limit search results to a particular collection (e.g. Young Adult), then we should make it easier to use a shelving location. Or perhaps sort results by clicking column headers (we've done work in other areas for this purpose). This is why I was asking for context: actual use cases should drive changes, not hand-waving around not having time to fix bad data or do things properly.

Changed in evergreen:
status: Confirmed → Incomplete
Jason Etheridge (phasefx) wrote :

If you guys would like, I can make a new normalizer and classification
scheme called "Garbage". Be easy enough to switch call numbers over
to it.

Jason Etheridge (phasefx) wrote :

That said, I am a big fan of promoting shelving locations.

Jeremiah Miller (jeremym-t) wrote :

>some slim segment of users for each small option.

More folks with prefixes in our Deweys, talking about similar issues:
http://list.evergreen-ils.org/pipermail/evergreen-catalogers/2012-May/000023.html

Our issue was exactly the same... migrated as generic, while new stuff went in as dewey due to settings. Settled on the same solution... flip everything to Dewey. But some of ours are only 3 digit numbers, which fail to sort properly.

>Garbage in, garbage out is a very basic principle.

Of course. We're simply differing on what constitutes garbage, and what doesn't. I think an otherwise valid Dewey number with a simple alpha prefix is common enough practice to be considered "not completely garbage".

>Let's say we make this "robust" for your use case. If someone else comes along and stuffs in a prefix of "YA 674.1" in front of call number "392.23", are we supposed to normalize the first 3-digit sequence or the second one? What if they complain about how their other ILS handles it the way they expected?

Unlike a simple alpha prefix, an entirely theoretical scenario. Not common intentional practice.

That said, the first sequence. Which is likely how the other ILS did it, if it allowed the entry in the first place. I have seen similar (and worse) done by mistake.

>I would support looking into making prefixes affect sort order, although even there I have concerns. The original idea of call numbers is to co-locate like items, so that all of the books on taking care of your aquarium go together; in that case the prefix should be ignored.

Many separate collections might have materials on aquarium care. None of which a given library might want to be co-located. In our case some of those are. Others aren't even on the same floor. I leave that debate for the librarians, I'm not qualified for it. But I can plainly see different institutions come to different conclusions on such matters.

>If you want to limit search results to a particular collection (e.g. Young Adult), then we should make it easier to use a shelving location.

Agreed... coincidentally "making better use of shelving locations" was the goal. Working on splitting a few large, monolithic locations into a reasonable set of shelving locations. How to do it? Grab the items needing assigned to a new location by call number ranges. Because the "location" is right there already, in the form of that prefix. Once that is done, then maybe we could talk about getting rid of the prefixes (or switching to the separate prefix feature).

>Or perhaps sort results by clicking column headers (we've done work in other areas for this purpose). This is why I was asking for context: actual use cases should drive changes, not hand-waving around not having time to fix bad data or do things properly.

Checked hands, stationary other than fingertips making typing motions. :)

Actual use cases: Anywhere call numbers are sorted, the described numbers can and do appear out of order. Multiple complaints from staff about it, in multiple scenarios. Figured out the cause. (Mixed Dewey/General classifications.) Fixed that, problem still there but with different pattern, different cause. Dewey...


Dan Scott (denials) wrote :

So there's already a "Garbage" normalizer. It's called "Generic", and it produces the following sort order for your initial data:

SELECT label, label_sortkey FROM asset.call_number WHERE label LIKE 'DVD%' ORDER BY label_sortkey ASC;
       label | label_sortkey
--------------------+--------------------
 DVD 001.942 LIES | DVD 001.942 LIES
 DVD 211 RELIG | DVD 211 RELIG
 DVD 229.913 WALKI | DVD 229.913 WALKI
 DVD 30 2 | DVD 30 2
 DVD 30 3 | DVD 30 3
 DVD 305.235 BILLY | DVD 305.235 BILLY
 DVD 355.8 RADIO | DVD 355.8 RADIO
 DVD 419 AMERI | DVD 419 AMERI
 DVD 419.7 AMERI | DVD 419.7 AMERI
 DVD 570 LIFE | DVD 570 LIFE
 DVD 612.63 LIFES | DVD 612.63 LIFES
 DVD 613.7192 PILAT | DVD 613.7192 PILAT
 DVD 635 REBEC | DVD 635 REBEC
 DVD 704.086 CATS | DVD 704.086 CATS
 DVD 775 POINT | DVD 775 POINT
 DVD 808.51 LEARN | DVD 808.51 LEARN
 DVD 956 BUDRU | DVD 956 BUDRU
 DVD 973.931 SEVEN | DVD 973.931 SEVEN
 DVD ABYSS | DVD ABYSS
 DVD BATMA | DVD BATMA
 DVD BREAK 1 | DVD BREAK 1
 DVD BREAK 2 | DVD BREAK 2
 DVD EXORC | DVD EXORC
 DVD FARGO | DVD FARGO
 DVD GLEE 1 | DVD GLEE 1
 DVD GODFA | DVD GODFA
 DVD IRON | DVD IRON
 DVD NERO 2 PT.3 | DVD NERO 2 PT.3
 DVD NERO 2 PT.4 | DVD NERO 2 PT.4
 DVD OFFIC 1 | DVD OFFIC 1
 DVD OFFIC 2 | DVD OFFIC 2
 DVD PROME | DVD PROME

Looks an awful lot like what you asked for in the beginning; just set the label_class to 1 (e.g. UPDATE asset.call_number SET label_class = 1 WHERE label_class = 2 AND label !~ '^\d').

Alternately, it would be pretty straightforward to migrate prefixes out of the labels by populating asset.call_number_prefix, then setting asset.call_number.prefix to point to the prefix and stripping the prefix out of the label.

Dan Scott (denials) wrote :

And here's an example of how to migrate messed-up labels to use per-callnumber prefixes instead:

BEGIN;

-- Populate the call number prefixes
INSERT INTO asset.call_number_prefix (owning_lib, label)
(
  -- trim() wraps the regexp_replace() result so the stored prefix has no trailing space
  SELECT DISTINCT owning_lib, trim(regexp_replace(label, '^([^\d]+)(\d{3}.*)$', '\1')) AS prefix
  FROM asset.call_number
  WHERE label_class = 2
    AND label ~ '^([^\d]+)\d{3}'
)
EXCEPT
(
  SELECT owning_lib, label
  FROM asset.call_number_prefix
);

-- Point at the actual prefixes and fix the labels
UPDATE asset.call_number
  SET prefix = acnp.id, label = regexp_replace(asset.call_number.label, '^([^\d]+)(\d{3}.*)$', '\2')
  FROM asset.call_number_prefix acnp
  WHERE trim(regexp_replace(asset.call_number.label, '^([^\d]+)(\d{3}.*)$', '\1')) = acnp.label
    -- match within the same library so another library's identical prefix is not picked up
    AND acnp.owning_lib = asset.call_number.owning_lib
    AND asset.call_number.label_class = 2
    AND asset.call_number.label ~ '^([^\d]+)\d{3}'
;

COMMIT;

That doesn't take care of the "DVD ABYSS" etc completely non-Dewey call number labels, but then you can address those with something like what I posted before. It would be easy to modify this approach to create & use shelving locations and their built-in prefix support, instead.

And note that when I said "co-locate", I did not mean physically, I meant by classification, so that you could browse all of the "Aquarium" books in the shelf browser -- whether they have a prefix of "NEW" or "Ref." or "YA" or "BIG PRINT" or whatever. See http://www.dlib.org/dlib/october04/dushay/10dushay.html for a classic article on the subject.
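
For anyone wanting to preview which labels the migration pattern above would touch before running it, here is a small Python sketch that applies the same regular expression to a few labels from this report. It uses Python's `re` module rather than PostgreSQL's regex engine, so treat it as an approximation; `split_prefix` is a made-up helper name:

```python
import re

# Same pattern as the migration SQL: non-digit text, then at least three digits.
PREFIXED_DEWEY = re.compile(r"^([^\d]+)(\d{3}.*)$")


def split_prefix(label):
    """Return (prefix, remaining_label) if the label matches, otherwise None."""
    match = PREFIXED_DEWEY.match(label)
    if match is None:
        return None
    # Strip the whitespace that separated the prefix from the number.
    return match.group(1).strip(), match.group(2)


for label in ["DVD 001.942 LIES", "E 357 MAG", "357.61 MAG",
              "DVD ABYSS", "DVD 30 2", "DVD 2012"]:
    print(repr(label), "->", split_prefix(label))
```

Labels with no prefix, no digits, or fewer than three consecutive digits are left alone. A title such as "DVD 2012" does match, so it would be split into prefix "DVD" and label "2012"; cases like that may deserve a manual look.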

Jason Etheridge (phasefx) wrote :

The generic one also seems to work better for something like DVD 3,
which I hope doesn't exist out in the wild.

Jeremiah Miller (jeremym-t) wrote :

>So there's already a "Garbage" normalizer. It's called "Generic", and it produces the following sort order for you initial data:

Interesting. Yes, it appears that it does. That was actually my first inquiry before switching records to Dewey. Basically:

"The librarians tell me we are a Dewey library, yet all of our migrated callnumbers are set to Generic. While at the same time, the settings for new records are producing Dewey records. Is there a reason behind the old records being Generic? If so, is there a reason new records are not set the same way? If not, why aren't the old ones the same as the settings for the new ones?"

Appears the answer to that question should have been:

"Yes, they are Generic because you use prefixes with your Dewey numbers, and in Evergreen that is the proper classification for that type of numbering. Dewey is for pure Dewey, and will not sort your collection properly unless your prefixes are removed/migrated to a separate field. Change the new records to Generic, and the settings for new records to match. An alternative is to migrate your prefixes to a separate table, which is then linked to the record."

That wasn't the answer I got, so I headed down the wrong road. And judging from other conversations on the topic, we may not be the only ones using Dewey that might better be served by Generic.

Given that... is there something special to gain by using Dewey over Generic, that prompts folks to choose it instead? Or is it simple misunderstanding of what classification should be used for which circumstances?

> That doesn't take care of the "DVD ABYSS" etc completely non-Dewey call number labels, but then you can address those with something like what I posted before. It would be easy to modify this approach to create & use shelving locations and their built-in prefix support, instead.

Thank you for the code snippet, much appreciated. Will likely use when migrating/cleaning up prefixes comes to the fore of the plate. Though the "built-in prefix support" for shelving locations sounds very attractive, and doable once this separation is complete. Taking prefixes as found in callnums would surely give us garbage/typo based iterations of many of them. Would also take the opportunity to examine prefix usage, and perhaps eliminate many of them if possible. That being the part that actually takes the bulk of time & planning.

>And note that when I said "co-locate", I did not mean physically, I meant by classification, so that you could browse all of the "Aquarium" books in the shelf browser -- whether they have a prefix of "NEW" or "Ref." or "YA" or "BIG PRINT" or whatever.

Noted, and understood. I definitely see the benefits and attraction of that. I can also foresee the wailing and gnashing of teeth it might produce in my librarians so accustomed to the "other way"! Might take me another 10 years to get them to swallow that. Perhaps a user toggle in the OPAC would be a fantastic way to provide both. Except now I foresee programmers gnashing their teeth too. ;)

>See http://www.dlib.org/dlib/october04/dushay/10dushay.html for a classic article on the subject.

Excellent, thank you, will read.

N...


Jason Etheridge (phasefx) wrote :

> However, Jason reports finding what appears to be a logic flaw in the
> code. Has written a patch that corrects it. A coincidental byproduct
> is that it ends up sorting the user's call numbers as desired, despite
> using wrong scheme. Not sure how I'd classify that. Also not qualified
> to evaluate efficiently, so bowing out of that discussion. (Jason,
> still fine to use it on our test system if needing data to try it on.)

Thanks for the offer.

For what it's worth, I'm getting this fixed in Koha; I just need to
write a test for it:
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=9770

If the consensus is that the Dewey classification should be very
strict in what it allows (and there have only been a handful of folks
weighing in on this ticket, so who knows), then the real bug is that
we're not strict here. Should non-Dewey call numbers fall to the end
of the list if using the Dewey classification? Maybe have a fallback
to Generic as a sub-sort?

Dan Wells (dbw2) wrote :

We don't use Dewey, so I have no personal stake in this. On the other hand, I have at times felt like I was part of a slim segment of users when pushing for a certain behavior of a feature, and in this particular case, I feel confident that this is a bug fix which will make Evergreen better.

As has been shown here, pure Dewey numbers sort fine without any normalization at all. That's part of the beauty of the system, I think. With that in mind, it makes sense that the main reason for a Dewey normalizer is to reasonably deal with the common but unofficial parts of Dewey call numbers.

Turning then to the code in question, it seems clear from the separation of alpha data and the author's code comments that the padding was meant for the first digit group, but that the code fails to check if the first group is alpha or digits. Jason's branch corrects this oversight, and I agree that this was the author's original intent.

That said, I also agree with Dan that a library would be better served by putting true prefixes in the prefix field. Doing so makes the data easier to sort, easier to analyze, and easier to maintain. Also, if any feature is generally not adopted, it won't get the development attention it deserves, and we are all forever stuck with "good enough" and never achieve the best possible.
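
The logic flaw discussed in this thread (zero padding attached to the first token rather than to the first digit group) can be sketched in a few lines of Python. This is not the stored procedure itself, which is not quoted in this report; the tokenizing rules below are an assumption inferred from the sortkeys shown in the thread, and `dewey_sortkey` and its `fixed` flag are names made up for this sketch:

```python
import re

PAD_WIDTH = 15  # the sortkeys quoted in this thread pad to 15 characters


def dewey_sortkey(label, fixed=True):
    """Sketch of a Dewey-style sortkey; not the actual stored procedure."""
    # Upper-case, then split on runs of periods and whitespace.
    tokens = [t for t in re.split(r"[.\s]+", label.upper()) if t]
    digit_groups = [i for i, t in enumerate(tokens) if re.fullmatch(r"[0-9]+", t)]

    if len(digit_groups) >= 2:
        # The second digit group is treated as the decimal part: right-pad with zeros.
        i = digit_groups[1]
        tokens[i] = tokens[i][:PAD_WIDTH].ljust(PAD_WIDTH, "0")
    elif len(digit_groups) == 1:
        # Whole number only: append an all-zero decimal part.
        # fixed=False mimics the reported behavior (always pads token 0);
        # fixed=True pads the token that actually holds the digits.
        i = digit_groups[0] if fixed else 0
        tokens[i] += "_" + "0" * PAD_WIDTH

    # Join with underscores and drop any remaining punctuation.
    return re.sub(r"[^A-Z0-9_]", "", "_".join(tokens))


print(dewey_sortkey("E 357 MAG", fixed=False))  # E_000000000000000_357_MAG
print(dewey_sortkey("E 357 MAG", fixed=True))   # E_357_000000000000000_MAG
```

Under these assumptions the sketch reproduces the keys quoted earlier in the report for both the original and the patched behavior, including the "YR DVD 978 B d" example.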

Ben Shum (bshum)
Changed in evergreen:
milestone: 2.4.0-beta → 2.4.0-rc
Jeremiah Miller (jeremym-t) wrote :

On the current status of "incomplete", meaning I (as the reporter) need to give more info...

Is there more info that I could provide? Or should the status change to something else?

Ben Shum (bshum) wrote :

Discussed this bug briefly during the conference. Planning to reassess the contents and consider for merging a little later.

Changed in evergreen:
assignee: nobody → Ben Shum (bshum)
status: Incomplete → Confirmed
Ben Shum (bshum)
Changed in evergreen:
milestone: 2.4.0-rc → none
Ben Shum (bshum) wrote :

Picked to master, rel_2_4, and rel_2_3 as bug fix for repairing the logic used in Dewey sort.

Ben Shum (bshum) wrote :

Marking as "won't fix" for rel_2_2 given that we're outside that maintenance window.

Changed in evergreen:
milestone: none → 2.5.0-alpha2
status: Confirmed → Fix Committed
assignee: Ben Shum (bshum) → nobody
Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released