Some letters break the category splitting

Bug #1422116 reported by Jellby on 2015-02-15
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Charles Haley

Bug Description

I have authors whose names start with "A", among them "Aesop". If I change the spelling of the author-sort field for Aesop to "Æsop", the author categories are split into "A", "Æ" and "A" again. The first "A" contains names before Æsop, the second "A" contains names after Æsop. I think the "Æ" category should be either inside or outside of "A", but the "A"s should not be split.

Changing the component for this bug.

 assignee cbhaley
 status triaged

Changed in calibre:
assignee: nobody → Charles Haley (cbhaley)
status: New → Triaged
Charles Haley (cbhaley) wrote :

@kovid: I can fix this, but I am not convinced that I should. The problem arises because when sorting, ICU considers Æ to be the two characters AE, but when comparing, Æ is a single letter. This means that Æ will always sort into its proper place within a group of items beginning with A.

Example: if the authors are displayed without categorization, the correct order is Aaa, Æa, Afa. If they are displayed with first-letter categorization, then the correct order is Aaa, Afa, Æa, so that Æ is separated from the other As. I believe that get_categories should return the list in the non-categorized order, which means that if first-letter categorization is on, the tag browser must re-sort the list. Is it worth the performance penalty for a case that almost never happens?
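The split described above can be reproduced with a small sketch. It does not use ICU or calibre: `toy_sort_key` is a hypothetical stand-in that simply expands Æ to AE before comparing, and `first_letter_groups` is a hypothetical helper that groups an already sorted list by its literal first character.

```python
from itertools import groupby


def toy_sort_key(name):
    # Hypothetical stand-in for ICU's sort key: expand the ligature so that
    # "Æ" sorts as the two letters "AE". Real ICU collation is more involved.
    return name.replace("Æ", "AE").replace("æ", "ae").casefold()


def first_letter_groups(names):
    # Group consecutive items by their literal first character, as a
    # first-letter categorization over an already sorted list would.
    return [(letter, list(items))
            for letter, items in groupby(names, key=lambda n: n[0])]


authors = ["Afa", "Æa", "Aaa"]
flat = sorted(authors, key=toy_sort_key)   # non-categorized order
groups = first_letter_groups(flat)         # categories built on that order

for letter, items in groups:
    print(letter, items)
```

Because Æa sorts between Aaa and Afa, grouping the sorted list by first character yields an "A" group, an "Æ" group, and then a second "A" group, which is the behaviour reported in the bug.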

Charles Haley (cbhaley) wrote :

@kovid: another possibility might be to do the "correct" sort in db get_categories. This would entail adding a parameter to get_categories telling it whether first letter grouping is enabled. This parameter must make it all the way to db.categories.sort_categories. Line 126 would become something like

    if first_letter_sorting:
        key=lambda x:(collation_order(x.sort), sort_key(x.sort))
    else:
        key=lambda x:sort_key(x.sort or x.name)

It isn't clear to me what would have to change to do the above. I think it would be sufficient to add a keyword parameter to db.cache.get_categories then change the tag browser model to use db.new_api.get_categories().

Charles Haley (cbhaley) wrote :

Oops. The code would look something like

    if first_letter_sorting:
        key=lambda x:(collation_order(x.sort or x.name), sort_key(x.sort or x.name))
    else:
        key=lambda x:sort_key(x.sort or x.name)
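A self-contained sketch of the two-part key proposed above, under stated assumptions: `toy_sort_key` and `toy_collation_order` are hypothetical stand-ins for calibre's `sort_key` and `collation_order` (the real ones rely on ICU), and plain strings stand in for the category items and their `x.sort or x.name` value.

```python
def toy_sort_key(name):
    # Hypothetical stand-in for sort_key: "Æ" compares as the letters "AE".
    return name.replace("Æ", "AE").replace("æ", "ae").casefold()


def toy_collation_order(name):
    # Hypothetical stand-in for collation_order: rank the first character
    # as a single letter. In this toy version "Æ" ranks after "A" and
    # before "B", and never ties with "A".
    first = name[:1].upper()
    return (toy_sort_key(first), first)


def category_key(name, first_letter_sorting):
    if first_letter_sorting:
        # Order by first letter first, then by the full sort key, so all
        # items sharing a first letter stay contiguous.
        return (toy_collation_order(name), toy_sort_key(name))
    return toy_sort_key(name)


authors = ["Afa", "Æa", "Aaa", "Baa"]
grouped = sorted(authors, key=lambda n: category_key(n, True))
flat = sorted(authors, key=lambda n: category_key(n, False))

print(grouped)
print(flat)
```

With the two-part key the A entries stay together and Æa follows them, while the plain key keeps Æa between Aaa and Afa for the non-categorized view.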

Kovid Goyal (kovid) wrote :

Adding a keyword to new_api.get_categories is OK by me.

Charles Haley (cbhaley) on 2015-02-16
Changed in calibre:
status: Triaged → Fix Committed

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: Fix Committed → Fix Released