sorting fails on Japanese Unicode characters

Bug #1248239 reported by Rolf Leggewie
This bug affects 1 person
Affects             Status  Importance  Assigned to  Milestone
bash (Ubuntu)       New     Undecided   Unassigned
coreutils (Ubuntu)  New     Undecided   Unassigned

Bug Description

There seems to be some oddity in how some Japanese Unicode characters are interpreted in bash and in how they are sorted.

$ ls -1 /tmp/*.txt
/tmp/⑥-test.txt
/tmp/⑤-test.txt
/tmp/④-test.txt
/tmp/①-test.txt
/tmp/③-test.txt
/tmp/②-test.txt
$ ls -1 /tmp/*.txt|sort
/tmp/⑥-test.txt
/tmp/⑤-test.txt
/tmp/④-test.txt
/tmp/①-test.txt
/tmp/③-test.txt
/tmp/②-test.txt

This happens on an up-to-date precise system.

Tags: precise
Rolf Leggewie (r0lf) wrote :

Assigning to the bash and coreutils packages for now, for confirmation and triage.

affects: ubuntu → bash (Ubuntu)
Fumihito YOSHIDA (hito) wrote :

The 'sort' command does not support "human-graspable sorting" in Unicode environments.
http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

As a workaround, I suggest running sort with "LC_ALL=C" set (or setting it in your aliases).

This is a bad specification, but it cannot be changed for historical reasons.
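
The alias idea could look roughly like the sketch below. The function name `csort` is hypothetical and not part of coreutils; a shell function is used instead of an alias so that it also works in non-interactive scripts.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): sort by raw byte values,
# regardless of the locale the user is running in.
csort() {
    LC_ALL=C sort "$@"
}

# Feed three of the circled-digit names in scrambled order.
result=$(printf '%s\n' '②-test.txt' '①-test.txt' '③-test.txt' | csort)
printf '%s\n' "$result"
```

Setting LC_ALL only on the sort invocation leaves the rest of the session in the user's normal locale.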

$ ls -1 | sort
②-test.txt
⑤-test.txt
④-test.txt
⑥-test.txt
①-test.txt
③-test.txt

$ ls -1 | LC_ALL=C sort
①-test.txt
②-test.txt
③-test.txt
④-test.txt
⑤-test.txt
⑥-test.txt
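
A likely reason the C locale gives the expected order here is that it compares raw bytes, and the UTF-8 encodings of ① to ⑥ (U+2460 to U+2465) are consecutive. The following sketch prints those bytes so this can be checked; the helper name `hex_bytes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the UTF-8 bytes of a string as one hex word.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# Show each circled digit next to its encoded bytes.
for ch in ① ② ③ ④ ⑤ ⑥; do
    printf '%s %s\n' "$ch" "$(hex_bytes "$ch")"
done
```

This only holds for characters whose code points happen to be in the desired order; byte order is not a general substitute for proper collation.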
