sed stops replacing when reaching a special character

Bug #447866 reported by lovinglinux on 2009-10-10
Affects: sed (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: sed

When filtering a large file (~600,000 lines) with sed, it stops replacing when it encounters a special character.

For example, when using the regular expression below to remove all characters except numbers:

sed -e 's/[^0123456789]//g'

and if the file contains the following line:

AAAAAüBBBBBB999

the output is:

ü999

instead of:

999

When using the regular expression below to remove all characters before the numbers:

sed -e 's/.*999/Range:/g'

the output is:

AAAAAü999

instead of:

999

It only happens with files containing a large number of lines.

I have applied the same regular expression filtering to the same file with perl and the output is perfect.

ProblemType: Bug
Architecture: i386
Date: Sat Oct 10 05:49:02 2009
DistroRelease: Ubuntu 9.10
NonfreeKernelModules: nvidia
Package: sed 4.2.1-1
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.41-generic
SourcePackage: sed
Uname: Linux 2.6.31-12-generic i686

Paolo Bonzini (bonzini) wrote :

If you don't know the charset of the file, you should set the LANG or LC_CTYPE variables to "C":

$ echo $'AAAA\x88BBBB' | sed -e 's/[^0123456789]//g' | od -x
0000000 0a88
0000002
$ echo $'AAAA\x88BBBB' | LANG=C sed -e 's/[^0123456789]//g' | od -x
0000000 000a
0000001

This does indeed differ from Perl's behavior:

$ echo $'AAAA\x88BBBB' | psed 's/[^0123456789]//g' | od -x
0000000 000a
0000001
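
The locale workaround above can be sketched against the reporter's sample line. This is only an illustration under one assumption: that the file is Latin-1 encoded, so "ü" is the single byte 0xFC, which is not valid UTF-8 on its own. The report does not state the file's actual encoding. LC_ALL is used here instead of LANG because LC_ALL takes precedence over both LC_CTYPE and LANG, so the override still works if LC_ALL is already set in the environment.

```shell
# Assumption: the input is Latin-1, so "ü" is the lone byte 0xFC (octal 374).
line=$(printf 'AAAAA\374BBBBBB999')

# Force the C locale for this one command so sed treats every byte as a
# character and the bracket expression can match the stray 0xFC byte.
result=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/[^0123456789]//g')

echo "$result"   # prints 999
```

Setting the variable on the command line limits the change to that single sed invocation, so the rest of the session keeps its UTF-8 locale.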

Paolo Bonzini (bonzini) on 2010-02-12
Changed in sed (Ubuntu):
status: New → Invalid
lovinglinux (lovinglinux) wrote :

I don't see why it should be considered invalid, so I'm changing the status back to new.

Changed in sed (Ubuntu):
status: Invalid → New