wrong behavior of [:upper:] and/or [:lower:] if IGNORECASE is set and locale's mb_cur_max > 1

Bug #9027 reported by Debian Bug Importer
Affects        Status        Importance  Assigned to  Milestone
gawk (Debian)  Fix Released  Unknown
gawk (Ubuntu)  Invalid       High        Unassigned

Bug Description

Automatically imported from Debian bug report #276201 http://bugs.debian.org/276201

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 13 Oct 2004 00:55:10 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: wrong behavior of [:upper:] and/or [:lower:] if IGNORECASE is set and locale's mb_cur_max > 1

Package: gawk
Version: 1:3.1.4-1
Severity: grave
Tags: patch

In all locales where mb_cur_max > 1, such as CJK or UTF-8 locales,
[:upper:] and/or [:lower:] don't work as expected if IGNORECASE is set.

For example,
 % echo aaa | LANG=C gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
 aaa
 # correct, a matches [:upper:] when IGNORECASE=1

 % echo aaa | LANG=en_US.UTF-8 gawk 'BEGIN { IGNORECASE=1 } /[[:upper:]]+/ { print }'
 %
 # wrong, a doesn't match [:upper:] when IGNORECASE=1

With GAWK_NO_DFA=1 set, it works correctly, just as it does with LANG=C.
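
The "mb_cur_max > 1" condition above can be checked from C. This is a minimal sketch, not code from gawk: the helper name locale_is_multibyte is hypothetical, and it assumes only the standard setlocale() and MB_CUR_MAX facilities.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of gawk): returns 1 if the named locale
   uses a multibyte encoding (MB_CUR_MAX > 1), 0 if it is single-byte,
   and -1 if the locale is not available on this system. */
int locale_is_multibyte(const char *locale_name)
{
    char saved[256];
    const char *current;
    int result;

    assert(locale_name != NULL);

    /* Remember the active LC_CTYPE locale so it can be restored. */
    current = setlocale(LC_CTYPE, NULL);
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    /* setlocale() returns NULL and changes nothing if the locale is unknown. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    /* MB_CUR_MAX reflects the current LC_CTYPE locale. */
    result = MB_CUR_MAX > 1;

    setlocale(LC_CTYPE, saved);
    return result;
}
```

Calling locale_is_multibyte("C") should report a single-byte locale. A UTF-8 locale such as en_US.UTF-8 would be expected to report multibyte, provided that locale is installed.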

Checking the source code, I found this chunk in
regcomp.c:build_charclass()
(the same code is found in glibc):

  if ((syntax & RE_ICASE)
      && (strcmp (class_name, "upper") == 0 || strcmp (class_name, "lower") == 0))
    class_name = "alpha";

However, dfa.c does not do the same mapping. This patch fixes the
problem:

--- dfa.c.orig 2004-10-12 19:38:48.000000000 +0900
+++ dfa.c 2004-10-12 19:38:11.000000000 +0900
@@ -596,6 +596,9 @@
   {
     wctype_t wt;
     /* Query the character class as wctype_t. */
+    if (case_fold && (strcmp(str, "upper") == 0 || strcmp(str, "lower") == 0)) {
+      strcpy(str, "alpha");
+    }
     wt = wctype (str);

     if (ch_classes_al == 0)
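
The mapping that the patch adds can be tried in isolation. The following is a minimal sketch, not the dfa.c code: the function names effective_class_name and class_matches are hypothetical, and case_fold is passed as a plain parameter here rather than read from dfa.c's own state.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: when case folding is on, "upper" and "lower"
   are both looked up as "alpha", mirroring the regcomp.c logic. */
const char *effective_class_name(const char *name, int case_fold)
{
    assert(name != NULL);
    if (case_fold
        && (strcmp(name, "upper") == 0 || strcmp(name, "lower") == 0))
        return "alpha";
    return name;
}

/* Hypothetical helper: returns 1 if wc belongs to the named class
   (after the case-fold mapping), 0 otherwise. */
int class_matches(wint_t wc, const char *name, int case_fold)
{
    /* Query the character class as wctype_t. */
    wctype_t wt = wctype(effective_class_name(name, case_fold));

    if (wt == 0)
        return 0;               /* unknown class name */
    return iswctype(wc, wt) != 0;
}
```

With case_fold set, class_matches(L'a', "upper", 1) succeeds because the lookup uses "alpha"; without it, the same call fails. The sketch returns a different name instead of overwriting the buffer as the patch does with strcpy(); that is only a choice made for this example.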

Regards,
Fumitoshi UKAI

Revision history for this message
Martin Pitt (pitti) wrote :

I checked this in several locales; it is not reproducible with warty's
version of gawk. Closing as NOTWARTY.

Revision history for this message
Fumitoshi UKAI (ukai) wrote : Fixed in NMU of gawk 1:3.1.4-1.1

tag 266519 + fixed
tag 276201 + fixed

quit

This message was generated automatically in response to a
non-maintainer upload. The .changes file follows.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Tue, 19 Oct 2004 01:16:27 +0900
Source: gawk
Binary: gawk
Architecture: source i386
Version: 1:3.1.4-1.1
Distribution: unstable
Urgency: low
Maintainer: James Troup <email address hidden>
Changed-By: Fumitoshi UKAI <email address hidden>
Description:
 gawk - GNU awk, a pattern scanning and processing language
Closes: 266519 276201
Changes:
 gawk (1:3.1.4-1.1) unstable; urgency=low
 .
   * NMU to fix RC bugs
   * 10_dfa.c-no-go_fast.dpatch: new patch by Fumitoshi UKAI
      to fix odd regexp matching in multibyte locales (UTF-8, CJK, ..)
      closes: Bug#266519
   * 11_dfa.c-ignorecase.dpatch: new patch by Fumitoshi UKAI
      to fix CASEIGNORE match on [:upper:] and [:lower:] in
      multibyte locales (UTF-8, CJK, ...)
      closes: Bug#276201
Files:
 47cdd14a4532a07d540cb6be156f0e22 557 interpreters optional gawk_3.1.4-1.1.dsc
 0e16583a1390c72b8ba73929466ce6df 9225 interpreters optional gawk_3.1.4-1.1.diff.gz
 a1a43961a3154a311aded33168c6cb1a 983300 interpreters optional gawk_3.1.4-1.1_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFBc+029D5yZjzIjAkRAvxHAKC05uoZgw8msEe73szYw9FU12nxrgCgkWCe
B8rEeS5lv/Mw5rIPLqXfPWo=
=urId
-----END PGP SIGNATURE-----

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Mon, 18 Oct 2004 12:47:03 -0400
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Cc: Fumitoshi UKAI <email address hidden>, James Troup <email address hidden>
Subject: Fixed in NMU of gawk 1:3.1.4-1.1

Revision history for this message
Fumitoshi UKAI (ukai) wrote : rc bug for sarge

# grep
tags 249245 - fixed
tags 249245 + sarge
tags 274352 - fixed
tags 274352 + sarge
tags 276202 - fixed
tags 276202 + sarge
tags 276209 - fixed
tags 276209 + sarge
# gawk
tags 266519 - fixed
tags 266519 + sarge
tags 276201 - fixed
tags 276201 + sarge
tags 276206 - fixed
tags 276206 + sarge
tags 277122 - fixed
tags 277122 + sarge
tags 264829 - fixed
tags 264829 + sarge
tags 266043 - fixed
tags 266043 + sarge
tags 271231 - fixed
tags 271231 + sarge

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Thu, 28 Oct 2004 12:04:42 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: rc bug for sarge

Revision history for this message
James Troup (james-nocrew) wrote : Bug#276201: fixed in gawk 1:3.1.4-2

Source: gawk
Source-Version: 1:3.1.4-2

We believe that the bug you reported is fixed in the latest version of
gawk, which is due to be installed in the Debian FTP archive:

gawk_3.1.4-2.diff.gz
  to pool/main/g/gawk/gawk_3.1.4-2.diff.gz
gawk_3.1.4-2.dsc
  to pool/main/g/gawk/gawk_3.1.4-2.dsc
gawk_3.1.4-2_i386.deb
  to pool/main/g/gawk/gawk_3.1.4-2_i386.deb

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to <email address hidden>,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
James Troup <email address hidden> (supplier of updated gawk package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing <email address hidden>)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Fri, 26 Nov 2004 18:30:42 +0000
Source: gawk
Binary: gawk
Architecture: source i386
Version: 1:3.1.4-2
Distribution: unstable
Urgency: low
Maintainer: James Troup <email address hidden>
Changed-By: James Troup <email address hidden>
Description:
 gawk - GNU awk, a pattern scanning and processing language
Closes: 263964 266519 276201 276206 277122 278135
Changes:
 gawk (1:3.1.4-2) unstable; urgency=low
 .
   * 14_io.c-fix-redirect-hang.dpatch: new patch which reverts io.c changes
     that wait() when a redirect hits EOF without checking whether or not
     this is the kind of redirect which would have an orphan to wait() on.
     Closes: #263964
 .
   * debian/control (Build-Depends): Add a versioned build-depends on a
     fixed binutils for m68k. Closes: #278135
 .
   * Merge in NMU changes. Many thanks to Fumitoshi UKAI. Closes:
     #276206, #277122, #266519, #276201
 .
   * 11_dfa.c-ignorecase.dpatch, 12_dfa.c-ignorecase-range.dpath,
     13_dfa.c-charclass-bracket.dpatch: revert to old-style dpatch patch so
     that it works for me.
 .
   * 10_dfa.c-no-go_fast.dpatch: replaced...
   * 10_dfa.c-disable-cache.dpatch: ... with this. Which is upstream's fix
     for the same problem.
 .
   * 15_builtin.c-fix-wide-char.dpatch: new patch by Stephen Kasal to fix
     wide-char to{lower,upper}() handling.
 .
   * 16_awkgram.y-stop-at-eof.dpatch: new patch by Andreas Schwab to stop
     gawk reading past the end of the file for an awk script that is big
     enough to fill more than a buffer's worth and does not end with a
     newline.
 .
   * 17_fix-non-numeric-constants.dpatch: new patch by Aharon Robbins to
     improve handling of non-numeric constants so that numbers like 00.34
     don't get confused as being octal.
Files:
 492e13079781d176c5b589d64bcaaedb 1221 interpreters optional gawk_3.1.4-2.dsc
 a175a8e9572d74150d3ff6072b4f64df 14896 interpreters optional gawk_3.1.4-2.diff.gz
 262ea208b69d0fb65d71b5cbb1708881 995324 interpreters optional gawk_3.1.4-2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQIVAwUBQad8NNfD8TGrKpH1AQIN2g/+PvhVX2LyNwKzjZK6q5gW2dZqyj+sgkHS
6YsNJPlGlnroFGnRi/mQwKPv0B2orTjRCbYrE4ROuuiEY8zl05S9jKGP...


Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Fri, 26 Nov 2004 14:02:14 -0500
From: James Troup <email address hidden>
To: <email address hidden>
Subject: Bug#276201: fixed in gawk 1:3.1.4-2

Changed in gawk:
status: Unknown → Fix Released