sort -n -t, partially ignores field boundaries
| Affects | Status | Importance | Assigned to | Milestone | |
|---|---|---|---|---|---|
| coreutils (Ubuntu) | New | Undecided | Unassigned | | |
Bug Description
ProblemType: Bug
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
Date: Tue Mar 8 18:12:44 2016
Dependencies:
gcc-4.9-base 4.9.3-0ubuntu4
libacl1 2.2.52-1
libattr1 1:2.4.47-1ubuntu1
libc6 2.19-0ubuntu6.7
libgcc1 1:4.9.3-0ubuntu4
libpcre3 1:8.31-2ubuntu2.1
libselinux1 2.2.2-1ubuntu0.1
multiarch-support 2.19-0ubuntu6.6
DistroRelease: Ubuntu 14.04
Ec2AMI: ami-fce3c696
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: coreutils 8.21-1ubuntu5.3
PackageArchitec
ProcEnviron:
TERM=screen
SHELL=/bin/bash
PATH=(custom, user)
LANG=en_US.UTF-8
XDG_RUNTIME_
ProcVersionSign
SourcePackage: coreutils
Tags: trusty ec2-images
Uname: Linux 3.13.0-74-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True
$ cat tmp3
1,11111,1
207,970,60
807,120,600
$ sort -n -k2 -t, <tmp3
207,970,60
1,11111,1
807,120,600
$ sort -k2 -t, <tmp3
1,11111,1
807,120,600
207,970,60
Numeric sort places 970 before 120. I believe this is because it interprets field 2 as having the values 11111,1 and 970,60 and 120,600, which after comma removal become 111111 and 97060 and 120600.
Non-numeric sort places 120 before 970, but because it is not a numeric sort, 11111 appears before both of them. This is not a bug, and is mentioned only for context.
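The hypothesis above can be checked with a small sketch. This is only an emulation of the suspected behaviour, not sort's actual implementation: it builds a key from everything after the first comma, strips the remaining commas, and orders the lines by that key. The variable names `data` and `emulated` are made up for illustration.

```shell
# Sample data from the report.
data='1,11111,1
207,970,60
807,120,600'

# Hypothetical emulation (an assumption, not sort's real code):
# key = text after the first comma, with all remaining commas removed.
emulated=$(printf '%s\n' "$data" |
  awk '{ key = $0; sub(/^[^,]*,/, "", key); gsub(/,/, "", key); print key "\t" $0 }' |
  LC_ALL=C sort -n -k1,1 |
  cut -f2)

printf '%s\n' "$emulated"
```

If the hypothesis holds, the emulated order should match the order that `sort -n -k2 -t,` produced in the transcript above.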
Note that this general class of problem also occurs with sort -n -k2,3 -t,
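For comparison, here is a sketch that ends the key at field 2 with `-k2,2`, so the key no longer contains a comma. This relies on the documented `-k POS1,POS2` form. Treat it as a possible workaround rather than a fix for the behaviour reported here. The variable names are illustrative.

```shell
# Sample data from the report.
data='1,11111,1
207,970,60
807,120,600'

# Restrict the sort key to field 2 only (start and end at field 2),
# so the numeric comparison sees "11111", "970" and "120".
bounded=$(printf '%s\n' "$data" | sort -n -t, -k2,2)

printf '%s\n' "$bounded"
```

With the key bounded this way, 120 should sort before 970 and 970 before 11111.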
Looking at an older implementation of sort (version 5.93 under OS X), this problem happened back then, so it has been happening for quite a long time.
The problem looks like a modularity violation in the implementation of numeric sort, so a fix will probably require re-implementing some part of that system.
Using sort -g instead of sort -n seems to work around the problem. But if this is somehow deemed not to be a bug in sort itself, it would still be a bug in the manual page for sort (which does not mention or even hint at this issue).

It looks like LC_NUMERIC=C addresses this issue, but LC_NUMERIC is not mentioned in the man page.
Possibly it would be sufficient to mention LC_NUMERIC in the man page (though even locale(5) does not seem to adequately describe this environment variable).
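A minimal sketch of the LC_NUMERIC workaround, using the data from the report. One assumption worth stating: if LC_ALL is set in the environment it takes precedence over LC_NUMERIC, so the sketch unsets it inside the subshell first. The variable names are illustrative.

```shell
# Sample data from the report.
data='1,11111,1
207,970,60
807,120,600'

# Run the original command with LC_NUMERIC=C. LC_ALL is unset inside the
# command substitution only, so the calling shell's environment is unchanged.
c_numeric=$(unset LC_ALL; printf '%s\n' "$data" | LC_NUMERIC=C sort -n -t, -k2)

printf '%s\n' "$c_numeric"
```

In the C locale the comma is not treated as part of a number, so the numeric comparison should stop at the end of field 2 and place 120 before 970.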