Comment 2 for bug 447866

Paolo Bonzini (bonzini) wrote :

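If you don't know the charset of the file, you should set the LANG or LC_CTYPE variable to "C", so that sed treats every byte as a single character. Otherwise, in a multibyte locale such as UTF-8, a stray byte like \x88 is not a valid character and the bracket expression does not match it: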

$ echo $'AAAA\x88BBBB' | sed -e 's/[^0123456789]//g' | od -x
0000000 0a88
0000002
$ echo $'AAAA\x88BBBB' | LANG=C sed -e 's/[^0123456789]//g' | od -x
0000000 000a
0000001

This is indeed different from Perl, where the byte is removed without any locale change:

$ echo $'AAAA\x88BBBB' | psed 's/[^0123456789]//g' | od -x
0000000 000a
0000001
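
To make the locale workaround above independent of the caller's environment, something like the following might be used. It is only a sketch: the function name strip_non_digits is made up for illustration, and it uses LC_ALL rather than LANG because LC_ALL, when set, takes precedence over both LANG and LC_CTYPE. The output is dumped with od -An -tx1 so the bytes appear one by one, regardless of the machine's byte order.

```shell
# Hypothetical helper (the name is an assumption, not part of sed).
# LC_ALL=C overrides LANG and LC_CTYPE for this one sed invocation,
# so every input byte is treated as a single character.
strip_non_digits() {
    LC_ALL=C sed -e 's/[^0123456789]//g'
}

# \210 is the octal spelling of \x88, which POSIX printf understands.
only_newline=$(printf 'AAAA\210BBBB\n' | strip_non_digits | od -An -tx1 | tr -d ' \n')
digits_kept=$(printf 'AAAA\21012BBBB\n' | strip_non_digits | od -An -tx1 | tr -d ' \n')
echo "$only_newline"
echo "$digits_kept"
```

The first line printed should contain only the newline byte, and the second the two digits followed by the newline.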