Comment 1 for bug 9234

In , Fumitoshi UKAI (ukai) wrote :

tags 276206 + patch
thanks

At Wed, 13 Oct 2004 01:23:31 +0900,
Fumitoshi UKAI wrote:

> In all locales where mb_cur_max > 1, such as CJK or UTF-8 locales,
> [a-a] does not match A as expected when IGNORECASE is set.
>
> For example,
> % echo A | LANG=C gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
> A
>
> % echo A | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
> %
> # wrong, A should match [a-a] when IGNORECASE=1
>
> With GAWK_NO_DFA=1 it works correctly, just as it does with LANG=C.
> % echo A | GAWK_NO_DFA=1 LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-a]+/{print}'
> A
> %
>
> Note that [a-z] does match A, but not because IGNORECASE works: it
> matches because the collation order in UTF-8 locales is "a A b B ... z".
> That is, [a-z] does not match Z even with IGNORECASE=1.
>
> % echo Z | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE = 1} /[a-z]+/{print}'
> %

I think this patch fixes this problem:

--- dfa.c.orig 2004-10-13 02:27:29.000000000 +0900
+++ dfa.c 2004-10-13 02:27:54.000000000 +0900
@@ -682,6 +682,28 @@
    REALLOC_IF_NECESSARY(work_mbc->range_ends, wchar_t,
           range_ends_al, work_mbc->nranges + 1);
    work_mbc->range_ends[work_mbc->nranges++] = (wchar_t)wc2;
+ if (case_fold && (iswlower((wint_t)wc) || iswupper((wint_t)wc))
+ && (iswlower((wint_t)wc2) || iswupper((wint_t)wc2))) {
+ wint_t altcase;
+ altcase = wc;
+ if (iswlower((wint_t)wc))
+ altcase = towupper((wint_t)wc);
+ else
+ altcase = towlower((wint_t)wc);
+ REALLOC_IF_NECESSARY(work_mbc->range_sts, wchar_t,
+ range_sts_al, work_mbc->nranges + 1);
+ work_mbc->range_sts[work_mbc->nranges] = (wchar_t)altcase;
+
+ altcase = wc2;
+ if (iswlower((wint_t)wc2))
+ altcase = towupper((wint_t)wc2);
+ else
+ altcase = towlower((wint_t)wc2);
+ REALLOC_IF_NECESSARY(work_mbc->range_ends, wchar_t,
+ range_ends_al, work_mbc->nranges + 1);
+ work_mbc->range_ends[work_mbc->nranges++] = (wchar_t)altcase;
+
+ }
  }
       else if (wc != WEOF)
  /* build normal characters. */

Regards,
Fumitoshi UKAI