Ubuntu
software-center package

Please sort programs in an even more locale-friendly way (by using python-pyicu)

Bug #427568 reported by Michael Terry on 2009-09-10

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Ubuntu Translations	Triaged	Low	Unassigned
	software-center (Ubuntu)	Triaged	Low	Unassigned

Bug Description

Binary package hint: software-store

The listing of programs right now is sorted by name. But it seems to sort by character code (i.e. strcmp). You can see this falling down when there are accents in the program name.

I propose that it should instead sort using the unicode algorithm for sorting in a locale-sensitive way. Different languages have very different sorting rules.

I recently did this with the Ubuntu installer ubiquity, and found the Python wrapper for libicu (python-pyicu) to be useful. It can be used to generate a collation key for a given string in a given locale.

Michael Terry (mterry) wrote on 2009-09-10:

For testing purposes without switching your locale, the program name I noticed this on was Déjà Dup. It should sort between De and Df in English, but instead shows up between Dz and Ea.
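The misplacement described above can be reproduced with a minimal sketch, assuming Python 3 and a made-up list of names (only "Déjà Dup" comes from this report). The default sort compares Unicode code points, so "é" (U+00E9) lands after "z" (U+007A):

```python
# Hypothetical program names; only "Déjà Dup" is taken from the report.
names = ["Ea", "Dz", "Déjà Dup", "Df", "De"]

# sorted() on str compares Unicode code points, much like strcmp on bytes.
by_code_point = sorted(names)
print(by_code_point)
```

Here "Déjà Dup" ends up between "Dz" and "Ea", which is the behaviour reported.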

Michael Vogt (mvo) on 2009-09-11

Changed in software-store (Ubuntu):
importance:	Undecided → Low
status:	New → Confirmed

Michael Vogt (mvo) wrote on 2009-09-11:

Thanks for your bug report.

I used the stock Python sort() implementation. I have switched to strcoll now.

Andrew (and471) on 2009-09-11

Changed in software-store (Ubuntu):
status:	Confirmed → Fix Committed

Andrew (and471) on 2009-09-12

Changed in software-store (Ubuntu):
status:	Fix Committed → Fix Released

Michael Terry (mterry) wrote on 2009-09-15:

I don't think using strcoll fixed anything. Again, look at the Accessories department and note the D's. You'll see Déjà Dup at the end, when it should be earlier (in English locale -- and I want to say all locales?).

(While looking to see which version I have to re-confirm this bug, I noted that Help->About gives a different version than apt does. About says 0.2.2, apt says 0.3.2.)

strcoll isn't a full Unicode collation algorithm (UCA) [1], AFAIK. I'm not really sure what possibilities there are for LC_COLLATE or how it works. But I do know that Python doesn't have a native implementation of UCA (surprisingly).

To fix this, you'll want to use a library that does implement it, like python-pyicu.

[1] http://unicode.org/reports/tr10/

Changed in software-store (Ubuntu):
status:	Fix Released → Confirmed

Matthew Paul Thomas (mpt) wrote on 2009-09-17:

Spec updated: '“Alphabetically” means case-insensitive and accent-insensitive. For example, “Déjà Dup Backup Utility” should be sorted between “Debian Documentation Browser” and “DeskScribe”.'
<https://wiki.ubuntu.com/SoftwareStore?action=diff&rev2=184&rev1=183>

(Michael Terry, the problem with the About window was bug 428677.)
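The "case-insensitive and accent-insensitive" rule from the spec could be approximated with a sort key like the one below. This is only a sketch: `fold_key` is a hypothetical helper, not code from software-store, and it strips accents via Unicode decomposition, which is not locale-aware.

```python
import unicodedata

def fold_key(name):
    """Hypothetical sort key that ignores case and strips combining accents."""
    # NFKD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

names = ["DeskScribe", "Déjà Dup Backup Utility", "Debian Documentation Browser"]
alphabetical = sorted(names, key=fold_key)
print(alphabetical)
```

With this key, "Déjà Dup Backup Utility" sorts between the other two names, as the spec example asks.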

Michael Terry (mterry) wrote on 2009-09-17:

mpt, that's a weak definition of 'alphabetically' for most non-Latin languages. When the package name translation contains non-Latin characters, what's the sorting plan? Even among Romance languages, sorting is trickier than just stripping accents. From the Unicode report linked above:

"In French and a few other languages, however, it is the last accent difference that determines the order, as in row 2.
Normal Accent Ordering cote < coté < côte < côté
French Accent Ordering cote < côte < coté < côté"

Admittedly, that specific example is contrived, but you get my drift.

Again, my favored solution is this:
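import os
import PyICU  # newer PyICU releases are imported as "icu" instead
# Strip the encoding and modifier from LANG, e.g. "fr_FR.UTF-8@euro" -> "fr_FR"
locale = os.environ['LANG'].split('.')[0].split('@')[0]
collator = PyICU.Collator.createInstance(PyICU.Locale(locale))
names = ['DeskScribe', 'Déjà Dup', 'Debian Reference']  # example list to sort
names.sort(key=lambda x: collator.getCollationKey(x).getByteArray())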

python-pyicu is in main, on the CD.
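The difference between the two accent orderings quoted above can be illustrated without ICU. The sketch below is a deliberately simplified two-level key, not the real UCA: it compares base letters first and accents second, and it ranks accents by the code point of their combining mark, which is an assumption that happens to work for this example. All function names are hypothetical.

```python
import unicodedata

def split_levels(word):
    """Return (base letters, tuple of per-letter accent marks) for a word."""
    bases, accents = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        bases.append(decomposed[0].casefold())
        accents.append(decomposed[1:])  # combining marks, "" if unaccented
    return "".join(bases), tuple(accents)

def normal_key(word):
    """Accent differences are compared from the start of the word."""
    base, accents = split_levels(word)
    return (base, accents)

def french_key(word):
    """Accent differences are compared from the end of the word."""
    base, accents = split_levels(word)
    return (base, accents[::-1])

words = ["coté", "côte", "côté", "cote"]
normal_order = sorted(words, key=normal_key)
french_order = sorted(words, key=french_key)
print("normal:", " ".join(normal_order))
print("french:", " ".join(french_order))
```

A real collator uses weight tables tailored per locale rather than code points, which is why a library such as ICU is the safer choice.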

Michael Vogt (mvo) wrote on 2009-09-18:

Hey Michael, thanks for your update.

I am no expert on Unicode sorting myself, but I know that Python uses wscoll internally for locale.strcoll(). So wide chars should be supported and strcoll should implement sorting based on the rules of the selected locale. For me, I see e.g. Déjà Dup sorted between Debian Documentation and DeskScribe (my locale is en_US) with the current version of software-store.

Attached is a small test program that tests the example for French you gave above. It seems to be sorted correctly with locale.strcoll() (unless I miss something). I'm fine with using python-pyicu of course, I just want to understand the issue first.

Michael Vogt (mvo) wrote on 2009-09-18:

small test program (296 bytes, text/x-python)

Hey Michael, thanks for your update.

For me the test app prints:
unsorted: coté côte côté cote
normal: cote coté côte côté
strcoll: cote côte coté côté

That seems to match your example above.
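The attachment itself is not reproduced in this report. A comparable test might look like the sketch below, which is a reconstruction under assumptions rather than the attached program: it assumes Python 3, and `sort_with_locale` and the locale name "fr_FR.UTF-8" are illustrative choices. The "strcoll" line depends on which locales are installed, so no output is claimed for it.

```python
import functools
import locale

def sort_with_locale(words, locale_name):
    """Sort words with the C library's collation rules for locale_name."""
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not installed: keep whatever collation locale is active.
        pass
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))

words = ["coté", "côte", "côté", "cote"]
normal = sorted(words)  # plain code point order
collated = sort_with_locale(words, "fr_FR.UTF-8")

print("unsorted:", " ".join(words))
print("normal:", " ".join(normal))
print("strcoll:", " ".join(collated))
```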

Michael Vogt (mvo) wrote on 2009-09-18:

The function is wcscoll. I just checked the glibc source and it seems that the algorithm implemented there is ISO 14651. UCA and tr10 go beyond that, as explained in http://unicode.org/faq/collation.html#13, so it seems worthwhile to use ICU. I set this to medium priority.

Changed in software-store (Ubuntu):
importance:	Low → Medium
summary:	- Please sort programs in a locale-friendly way
+ Please sort programs in an even more locale-friendly way (by using python-pyicu)

Michael Terry (mterry) wrote on 2009-09-18:

Ah, partly my mistake then. Deja Dup now appears in the right place for me.

I had tried a software-store 0.3.2 (which had your fix I thought) and I saw Deja Dup still at the end of the D's. I must have been mistaken about either whether your fix was in or where Deja Dup was.

But yeah, I guess UCA is slightly better and that would be a neat enhancement.

Matthew Paul Thomas (mpt) wrote on 2009-09-24:

#10

Michael Terry, can you suggest human-readable language with which I should specify the sort order in the specification? Would "sorted following the rules of Unicode Technical Standard #10" be accurate and precise enough?

Michael Terry (mterry) wrote on 2009-09-24:

#11

Calling it the "Unicode Collation Algorithm" (UCA) I believe is sufficient. UTS #10 sounds stuffy. :)

Matthew Paul Thomas (mpt) wrote on 2009-09-30:

#12

Ok, specification updated again. Thanks! <https://wiki.ubuntu.com/SoftwareCenter?action=diff&rev2=214&rev1=213>

Matthew Paul Thomas (mpt) wrote on 2010-02-19:

#13

"Déjà Dup" now correctly appears between "Debian Reference" and "DeskScribe" in the "Accessories" department. We should fix this properly eventually, but I'm going to mark it as Low importance unless/until there are some specific real-world examples of how the incorrect sorting makes programs hard to find. If you have some, please update the description. Thanks.

Changed in software-center (Ubuntu):
importance:	Medium → Low

Matthew Paul Thomas (mpt) on 2011-09-29

Changed in software-center (Ubuntu):
status:	Confirmed → Triaged

David Planella (dpm) on 2011-10-19

Changed in ubuntu-translations:
status:	New → Triaged
importance:	Undecided → Low
