sort -n -t, partially ignores field boundaries

Bug #1554647 reported by Raul Miller
This bug affects 1 person
Affects             Status  Importance  Assigned to  Milestone
coreutils (Ubuntu)  New     Undecided   Unassigned

Bug Description

ProblemType: Bug
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
Date: Tue Mar 8 18:12:44 2016
Dependencies:
 gcc-4.9-base 4.9.3-0ubuntu4
 libacl1 2.2.52-1
 libattr1 1:2.4.47-1ubuntu1
 libc6 2.19-0ubuntu6.7
 libgcc1 1:4.9.3-0ubuntu4
 libpcre3 1:8.31-2ubuntu2.1
 libselinux1 2.2.2-1ubuntu0.1
 multiarch-support 2.19-0ubuntu6.6
DistroRelease: Ubuntu 14.04
Ec2AMI: ami-fce3c696
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: m4.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
Package: coreutils 8.21-1ubuntu5.3
PackageArchitecture: amd64
ProcEnviron:
 TERM=screen
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
 XDG_RUNTIME_DIR=<set>
ProcVersionSignature: User Name 3.13.0-74.118-generic 3.13.11-ckt30
SourcePackage: coreutils
Tags: trusty ec2-images
Uname: Linux 3.13.0-74-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
_MarkForUpload: True

$ cat tmp3
1,11111,1
207,970,60
807,120,600
$ sort -n -k2 -t, <tmp3
207,970,60
1,11111,1
807,120,600
$ sort -k2 -t, <tmp3
1,11111,1
807,120,600
207,970,60

Numeric sort places 970 before 120. I believe this is because it interprets field 2 as having the values 11111,1 and 970,60 and 120,600, which after comma removal become 111111, 97060 and 120600.
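
To make that arithmetic concrete, here is a minimal sketch that emulates the suspected behaviour outside of sort. It is not sort's actual code: the function name and the awk helper key are made up for illustration. It joins fields 2 and 3 without the comma, sorts numerically on that joined value in the C locale, and prints the original lines in the resulting order.

```shell
# Hypothetical emulation of the suspected behaviour, not sort's own logic.
# Build a helper key from fields 2 and 3 with the comma between them removed,
# then sort numerically on that key in the C locale.
emulate_comma_stripping() {
  printf '%s\n' '1,11111,1' '207,970,60' '807,120,600' |
    awk -F, '{ print $2 $3 "\t" $0 }' |  # e.g. "97060<TAB>207,970,60"
    LC_ALL=C sort -n |                    # compare on the leading number
    cut -f2-                              # drop the helper key again
}

emulate_comma_stripping
```

The helper keys are 111111, 97060 and 120600, so the lines come out as 207,970,60 then 1,11111,1 then 807,120,600, the same order as the sort -n -k2 -t, output above.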

Non-numeric sort places 120 before 970, but because it is not a numeric sort, 11111 appears before both of them. This is not a bug; it is mentioned only for context.

Note that this general class of problem also occurs with sort -n -k2,3 -t,
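
One thing that may be worth checking, going by the sort documentation rather than by anything established in this report: a key written as -k2 has no end position, so it runs from field 2 to the end of the line, and -k2,3 spans fields 2 and 3. In both cases the key text itself contains a comma. The sketch below ends the key at field 2 with -k2,2, so the key should contain no comma at all. The function name is made up for illustration, and the data is the tmp3 content from above.

```shell
# Sketch: end the key at field 2 (-k2,2) so the key text holds no comma.
# The trailing "n" applies numeric comparison to this key only.
sort_on_field_2_only() {
  printf '%s\n' '1,11111,1' '207,970,60' '807,120,600' |
    sort -t, -k2,2n
}

sort_on_field_2_only
```

If the key really is limited to field 2, this should order the lines by 120, 970, 11111 regardless of the locale's thousands separator.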

Looking at an older implementation of sort (version 5.93 under OS X), the same problem occurs there, so it has been present for quite a long time.

The problem looks like a modularity violation in the implementation of numeric sort, so a fix will probably require re-implementing some part of that system.

Using sort -g instead of sort -n seems to work around the problem. But if this is somehow deemed not to be a bug in sort itself, it would still be a bug in the manual page for sort (which does not mention or even hint at this issue).

Raul Miller (raul-miller) wrote :

It looks like LC_NUMERIC=C addresses this issue, but LC_NUMERIC is not mentioned in the man page.

Possibly it would be sufficient to mention LC_NUMERIC in the man page. (Though even locale(5) does not seem to adequately describe this env var).
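
For reference, a minimal sketch of that workaround on the tmp3 data from the description. The function name is made up for illustration. One assumption is labeled in the code: LC_ALL is unset inside the subshell, because when LC_ALL is set it takes precedence over LC_NUMERIC and the workaround would have no effect.

```shell
# Sketch of the LC_NUMERIC=C workaround, keeping the original -k2 key.
# Assumption: LC_ALL must be unset, since it overrides LC_NUMERIC when set.
sort_with_c_numeric() {
  printf '%s\n' '1,11111,1' '207,970,60' '807,120,600' |
    ( unset LC_ALL; LC_NUMERIC=C sort -n -k2 -t, )
}

sort_with_c_numeric
```

With the C numeric locale there is no thousands separator, so each number should end at the first comma and the lines should be ordered by 120, 970, 11111.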

Raul Miller (raul-miller) wrote :

Correction to the previous comment: ... is not mentioned in the sort(1) man page...
