Comment 15 for bug 684317

In , Andrzej (ndrwrdck) wrote :

Got some feedback from the glibc bugzilla:

1. They recommend using strxfrm to convert the strings, so that a simple byte-wise comparison of the results matches strcoll ordering.

   However, strxfrm itself is pretty heavy; if we wanted "proper" sorting we could simply switch to using strcoll on all strings. So my suggestion is to use the patch swapping 'a-z' for 'A-Z'. It's maybe not the prettiest, but it does 90% of what strxfrm does at almost zero cost.
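
   For illustration, here is a minimal sketch of what a case-swapping comparison could look like. It assumes the patch boils down to swapping ASCII 'a-z' with 'A-Z' before a plain byte-wise compare; the names swap_ascii_case and compare_case_swapped are hypothetical and not taken from the patch.

```c
/* Swap the case of an ASCII letter; leave every other byte untouched. */
unsigned char swap_ascii_case(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char)(c - 'a' + 'A');
    if (c >= 'A' && c <= 'Z')
        return (unsigned char)(c - 'A' + 'a');
    return c;
}

/*
 * Hypothetical comparator (assumption, not the actual patch):
 * byte-wise comparison with ASCII case swapped, so lowercase letters
 * sort before uppercase ones, unlike plain strcmp.
 * Returns <0, 0 or >0 like strcmp.
 */
int compare_case_swapped(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;; pa++, pb++) {
        unsigned char ca = swap_ascii_case(*pa);
        unsigned char cb = swap_ascii_case(*pb);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == '\0')
            return 0; /* both strings ended at the same position */
    }
}
```

   With this, "apple" sorts before "Zebra" (plain strcmp puts "Zebra" first), and it needs no allocation or locale lookup, which is where the "almost zero cost" comes from.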

2. Weird ordering of Japanese characters and our workaround: apparently there are no Japanese language definitions in the iso14651_t1_common file, which means these characters are ignored in the first pass and handled in the second one.

   They said that the "workaround" is indeed a correct way of using strcoll as there might be other ignored characters.

   There was no indication whether Japanese definitions will be added to the iso14651_t1_common file, but the bug was not closed, so I imagine that's still on the table.
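
   This comment doesn't spell out the workaround itself, so the following is only a sketch of one common way to guard against characters that the collation table ignores: fall back to a byte-wise compare when strcoll reports a tie. The helper name collate_with_fallback is hypothetical, and the tie-break strategy is an assumption, not necessarily what our patch does.

```c
#include <string.h>

/*
 * Hypothetical helper (assumption, not the actual patch):
 * locale-aware comparison with a deterministic tie-break.
 * If strcoll considers the strings equal (e.g. because some
 * characters carry no weight in the collation table), use strcmp
 * so that distinct strings still get a stable order.
 */
int collate_with_fallback(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

   Note that strcoll only follows the user's locale after the program has called setlocale(LC_COLLATE, "") (or LC_ALL); in the default "C" locale it orders strings the same way strcmp does.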

My conclusion:
The current patches do as much as we can without sacrificing performance in the ASCII case (otherwise we could switch to strcoll completely). The other errors are mostly caused by limitations of strcoll in glibc, which may be resolved later.