sed stops replacing when reaching a special character

Bug #447866 reported by lovinglinux
This bug affects 1 person
Affects       Status  Importance  Assigned to  Milestone
sed (Ubuntu)  New     Undecided   Unassigned

Bug Description

Binary package hint: sed

When filtering a large file (~600,000 lines) with sed, it stops replacing when it encounters a special character such as ü.

For example, when using the regular expression below to remove all characters except numbers:

sed -e 's/[^0123456789]//g'

and if the file contains the following line:

AAAAAüBBBBBB999

the output is:

ü999

instead of:

999

When using the regular expression below to remove all characters before the numbers:

sed -e 's/.*999/Range:/g'

the output is:

AAAAAü999

instead of:

999

It only happens with files containing a large number of lines.

I have applied the same regular expression filtering to the same file with perl and the output is perfect.

ProblemType: Bug
Architecture: i386
Date: Sat Oct 10 05:49:02 2009
DistroRelease: Ubuntu 9.10
NonfreeKernelModules: nvidia
Package: sed 4.2.1-1
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-12.41-generic
SourcePackage: sed
Uname: Linux 2.6.31-12-generic i686

Paolo Bonzini (bonzini) wrote :

If you don't know the charset of the file, you should set the LANG or LC_CTYPE variables to "C":

$ echo $'AAAA\x88BBBB' | sed -e 's/[^0123456789]//g' | od -x
0000000 0a88
0000002
$ echo $'AAAA\x88BBBB' | LANG=C sed -e 's/[^0123456789]//g' | od -x
0000000 000a
0000001
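As a minimal sketch, the same workaround can be applied to the line from the bug description. It assumes the file is ISO-8859-1 encoded, so that "ü" is the single byte 0xFC, which is not valid UTF-8 on its own; the report does not state the actual charset. The variable names are only illustrative. LC_ALL is used here because it takes precedence over both LANG and LC_CTYPE.

```shell
# Assumption: the input is ISO-8859-1, so "ü" is the single byte 0xFC
# (octal 374). The real encoding of the reporter's file is not known.
line=$(printf 'AAAAA\374BBBBBB999')

# Force byte-wise matching for this one command only.
# LC_ALL overrides both LANG and LC_CTYPE.
digits=$(printf '%s\n' "$line" | LC_ALL=C sed -e 's/[^0123456789]//g')

echo "$digits"
```

Setting the variable on the command line, rather than exporting it, keeps the rest of the shell session in its normal locale.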

This does indeed differ from Perl:

$ echo $'AAAA\x88BBBB' | psed 's/[^0123456789]//g' | od -x
0000000 000a
0000001
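If the charset is known, another option is to convert the file to UTF-8 before filtering, so that it matches the UTF-8 locale. The sketch below is hedged: it assumes the data is ISO-8859-1 and uses a temporary file as a hypothetical stand-in for the real input. It first checks whether the data is valid UTF-8 (iconv exits with a non-zero status on an invalid sequence), then converts and filters it.

```shell
# Hypothetical sample file; assumption: the data is ISO-8859-1,
# with "ü" stored as the single byte 0xFC (octal 374).
sample=$(mktemp)
printf 'AAAAA\374BBBBBB999\n' > "$sample"

# iconv fails on byte sequences that are not valid UTF-8, so its exit
# status shows whether the file really matches a UTF-8 locale.
if iconv -f UTF-8 -t UTF-8 "$sample" >/dev/null 2>&1; then
    valid_utf8=yes
else
    valid_utf8=no
fi

# Convert from the assumed charset to UTF-8, then filter as before.
digits=$(iconv -f ISO-8859-1 -t UTF-8 "$sample" | sed -e 's/[^0123456789]//g')

echo "valid_utf8=$valid_utf8 digits=$digits"
rm -f "$sample"
```

If the source charset is guessed wrongly, iconv may still succeed but produce the wrong characters, so this only helps when the encoding is actually known.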

Paolo Bonzini (bonzini)
Changed in sed (Ubuntu):
status: New → Invalid
lovinglinux (lovinglinux) wrote :

I don't see why it should be considered invalid, so I'm changing the status back to new.

Changed in sed (Ubuntu):
status: Invalid → New