find regex does not work properly

Bug #585648 reported by dafintrash
This bug affects 1 person
Affects: findutils (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: findutils

It would appear to be a similar bug to #58883.
Some have said this isn't a bug at all but here goes.
When in an empty directory using gnome-terminal:

user@ubuntukarmic:~/f/Desktop/test$ touch {A..Z}
user@ubuntukarmic:~/f/Desktop/test$ touch {a..z}
user@ubuntukarmic:~/f/Desktop/test$ ls
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
user@ubuntukarmic:~/f/Desktop/test$ find . \! -regex '.*[A-Z].*' | sort
.
./a
user@ubuntukarmic:~/f/Desktop/test$ env | grep -i lang
LANG=en_US.UTF-8
GDM_LANG=en_US.UTF-8
user@ubuntukarmic:~/f/Desktop/test$ LANG=C
user@ubuntukarmic:~/f/Desktop/test$ find . \! -regex '.*[A-Z].*' | sort
.
./a
./b
./c
./d
./e
./f
./g
./h
./i
./j
./k
./l
./m
./n
./o
./p
./q
./r
./s
./t
./u
./v
./w
./x
./y
./z
user@ubuntukarmic:~/f/Desktop/test$ find -version
find (GNU findutils) 4.4.2
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Built using GNU gnulib version e5573b1bad88bfabcda181b9e0125fb0c52b7d3b
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION FTS() CBO(level=0)

This is further discussed at http://ubuntuforums.org/showthread.php?t=1488843

Tags: find findutils
Andreas Metzler (k-launchpad-downhill-at-eu-org) wrote :

Hello,

1) I cannot reproduce this on current Debian.
2) The respective code is not located in find; re_match() is part of libc.
3) The fact that regexes are locale dependent is expected behavior, e.g. in the Estonian alphabet Z is not the last letter and therefore e.g. Y is not in 'A-Z'.
4) To match upper case letters you should use a character class that respects the collation sequence ([[:upper:]] instead of [A-Z]) or reset LC_COLLATE to C.

Given all that, afaict from Google it looks like in some versions of libc '[A-Z]' includes lower case letters in the en_US.UTF-8 locale while in others it does not. See also #120687.
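
The two workarounds from point 4 could look roughly like the sketch below. It assumes GNU find on a case-sensitive filesystem. The scratch directory and the variable names are made up for illustration. The -regextype posix-extended option is added only as a precaution, so that the bracket class is certain to be recognised; it is not something the original report used. The second variant sets LC_ALL rather than LC_COLLATE because LC_ALL overrides both LC_COLLATE and LANG, so it takes effect regardless of what else is set in the environment.

```shell
# Scratch directory with a few one-letter names, mirroring the report.
demo_dir=$(mktemp -d)
(cd "$demo_dir" && touch a b c A B C)

# Workaround 1: use a character class instead of a range.
# [[:upper:]] is defined by character type, not by collation order.
with_class=$(cd "$demo_dir" &&
  find . -regextype posix-extended \! -regex '.*[[:upper:]].*' | LC_ALL=C sort)

# Workaround 2: keep [A-Z] but run find in the C locale,
# where a range is a plain byte range.
with_c_locale=$(cd "$demo_dir" &&
  LC_ALL=C find . \! -regex '.*[A-Z].*' | LC_ALL=C sort)

echo "character class:"
echo "$with_class"
echo "C locale:"
echo "$with_c_locale"

rm -rf "$demo_dir"
```

Setting the variable on the command line, as in the second variant, only affects that one find invocation and leaves the rest of the shell session in its usual locale.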

Stefan Wagner (wagner-stefan) wrote :

Not a bug, see comment #1.

Changed in findutils (Ubuntu):
status: New → Invalid