Ubuntu
gawk package

Bug #9234
Comment #4

Debian Bug Importer (debzilla) wrote on 2004-10-19:

Message-ID: <email address hidden>
Date: Wed, 13 Oct 2004 01:23:31 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: range of characters doesn't match as expected if IGNORECASE is set and locale's mb_cur_max > 1

Package: gawk
Version: 1:3.1.4-1

In all locales where mb_cur_max > 1, such as CJK or UTF-8 locales,
[a-a] doesn't match A as expected when IGNORECASE is set.

For example,
% echo A | LANG=C gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
A

% echo A | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
%
# wrong, A should match [a-a] when IGNORECASE=1

With GAWK_NO_DFA=1 set, it works correctly, just as it does with LANG=C.
% echo A | GAWK_NO_DFA=1 LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
A
%

Note that [a-z] does match A, but not because IGNORECASE works:
it matches because the collation order in UTF-8 is "a A b B .. z".
As a result, [a-z] doesn't match Z even if IGNORECASE=1.

% echo Z | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-z]+/{print}'
%
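
For comparison, the matches expected from IGNORECASE=1 can be sketched
without IGNORECASE by lowercasing the input explicitly. This is only an
illustration of the expected results, not a fix for the DFA matcher;
match_nocase is a hypothetical helper name, not part of gawk, and the
sketch assumes any awk that provides tolower().

```shell
#!/bin/sh
# Hypothetical helper (not part of gawk): emulate case-insensitive matching
# by lowercasing each input line before applying the bracket expression.
# $1: input text, $2: regular expression written in lowercase
match_nocase() {
    printf '%s\n' "$1" | awk -v re="$2" 'tolower($0) ~ re { print }'
}

# Both lines are expected to be printed, whatever the locale's mb_cur_max is.
match_nocase A '[a-a]+'
match_nocase Z '[a-z]+'
```

Because the comparison runs against the lowercased line, the result no
longer depends on where uppercase letters fall in the collation order.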

Regards,
Fumitoshi UKAI
