all character classes match "[" in utf-8 locales

Bug #9236 reported by Debian Bug Importer
Affects          Status         Importance   Assigned to   Milestone
gawk (Debian)    Fix Released   Unknown
gawk (Ubuntu)    Invalid        High         Unassigned

Bug Description

Automatically imported from Debian bug report #277122 http://bugs.debian.org/277122

Revision history for this message
In Debian, Fumitoshi UKAI (ukai) wrote : critical bugs in multibyte locales(UTF-8, CJK, ..) regexp

severity 249245 grave
severity 274352 grave
severity 226397 grave
severity 276209 grave
merge 249245 226397 238167
severity 277122 grave
severity 276206 grave
thanks

Bug#249245 can be fixed by a patch derived from gawk's dfa.c.
Bug#274352 can be fixed by a one-line patch.
Bug#277122 (in gawk's dfa.c) is the same bug as Bug#274352 (in grep's dfa.c).
Bug#276209 (in grep) and Bug#276206 (in gawk) are the same bug in dfa.c, concerning
case insensitivity of character ranges.

All of these bugs break behaviour in multibyte locales (UTF-8, CJK, ...).

Regards,
Fumitoshi UKAI

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Tue, 19 Oct 2004 02:54:15 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: all character classes match "[" in utf-8 locales

Package: gawk
Version: 1:3.1.4-1
Severity: important
Tags: patch

This is the same bug as Bug#274352 in grep.

 % echo '[' | LANG=en_US.UTF-8 gawk '/[[:space:]]/ { print }'
 [
 %
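For reference, the expected result of the reproduction above can be cross-checked from C with the POSIX regcomp()/regexec() API. This is a minimal sketch under stated assumptions: it uses the C library's own regex engine rather than gawk's dfa.c, so it only shows what a correct matcher should return; the helper names posix_match() and use_utf8_locale() are hypothetical, and the locale names it tries (C.UTF-8, en_US.UTF-8, C.utf8) are assumptions that may not be installed everywhere, in which case the current locale is left unchanged.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `pattern` (POSIX ERE) matches somewhere in `subject`,
   0 if it does not, and -1 if the pattern fails to compile.
   This uses the C library's regex engine, not gawk's dfa.c. */
int posix_match(const char *pattern, const char *subject)
{
    regex_t re;
    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}

/* Tries a few UTF-8 locale names (assumed; availability varies by
   system). Returns the name setlocale() accepted, or NULL if none was
   available, in which case the locale is not changed. */
const char *use_utf8_locale(void)
{
    static const char *candidates[] = { "C.UTF-8", "en_US.UTF-8", "C.utf8" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (setlocale(LC_ALL, candidates[i]) != NULL)
            return candidates[i];
    return NULL;
}
```

After calling use_utf8_locale(), a correct matcher gives posix_match("[[:space:]]", "[") == 0 and posix_match("[[:space:]]", " ") == 1, which is the behaviour the gawk command above should also show.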

This can be fixed with the following patch, just as in grep.

--- dfa.c~ 2004-10-19 01:18:31.000000000 +0900
+++ dfa.c 2004-10-19 02:53:28.000000000 +0900
@@ -645,7 +645,7 @@
         work_mbc->coll_elems[work_mbc->ncoll_elems++] = elem;
       }
    }
- wc = WEOF;
+ wc = wc1 = WEOF;
      }
    else
      /* We treat '[' as a normal character here. */
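Why does resetting wc1 matter? The toy model below is a rough reconstruction for illustration, not the actual dfa.c code. It assumes a bracket parser that keeps the current character in wc and a one-character lookahead in wc1, uses a sentinel (NO_CHAR, standing in for WEOF) to mean "not fetched yet", treats a '[' that does not start [:name:] as a literal, and loops until the lookahead is ']'. Only literals and [:name:] classes are modelled (no ranges, [. .] or [= =]). The names toy_parse_bracket(), toy_matches(), toy_bracket and NO_CHAR are hypothetical, and the reset_lookahead flag toggles the one-line change from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

#define NO_CHAR (-1) /* hypothetical stand-in for WEOF in this toy model */

/* Hypothetical result of parsing one bracket expression. */
typedef struct {
    char literals[32];   /* characters recorded as literal members */
    size_t nliterals;
    char classes[4][16]; /* names from [:name:] items, e.g. "space" */
    size_t nclasses;
} toy_bracket;

/* Parses the text after the opening '[' (e.g. "[:space:]]").
   Returns the number of bytes consumed, or -1 on malformed input.
   reset_lookahead == false models the unpatched behaviour. */
int toy_parse_bracket(const char *p, bool reset_lookahead, toy_bracket *out)
{
    assert(p != NULL && out != NULL);
    const char *start = p;
    int wc, wc1;
    memset(out, 0, sizeof *out);
    if (*p == '\0')
        return -1;
    wc = (unsigned char)*p++;
    do {
        wc1 = NO_CHAR;                 /* lookahead not fetched yet */
        if (wc == '[') {
            wc1 = wc;                  /* remember the '[' */
            if (*p == '\0')
                return -1;
            wc = (unsigned char)*p++;
            if (wc == ':') {           /* "[:name:]" */
                const char *end = strstr(p, ":]");
                if (end == NULL)
                    return -1;
                size_t len = (size_t)(end - p);
                if (out->nclasses < 4 && len < sizeof out->classes[0]) {
                    memcpy(out->classes[out->nclasses], p, len);
                    out->classes[out->nclasses++][len] = '\0';
                }
                p = end + 2;           /* skip ":]" */
                wc = NO_CHAR;          /* nothing literal to record */
                if (reset_lookahead)
                    wc1 = NO_CHAR;     /* the fix: drop the stale '[' */
            } else {
                int tmp = wc1;         /* plain '[': treat it as a literal */
                wc1 = wc;
                wc = tmp;
            }
        }
        if (wc1 == NO_CHAR) {          /* fetch a fresh lookahead */
            if (*p == '\0')
                return -1;
            wc1 = (unsigned char)*p++;
        }
        if (wc != NO_CHAR && out->nliterals < sizeof out->literals)
            out->literals[out->nliterals++] = (char)wc;
    } while ((wc = wc1) != ']');
    return (int)(p - start);
}

/* True if c is a recorded literal or belongs to a recorded class. */
bool toy_matches(const toy_bracket *b, char c)
{
    for (size_t i = 0; i < b->nliterals; i++)
        if (b->literals[i] == c)
            return true;
    for (size_t i = 0; i < b->nclasses; i++) {
        wctype_t t = wctype(b->classes[i]);
        if (t != 0 && iswctype((wint_t)(unsigned char)c, t))
            return true;
    }
    return false;
}
```

In this model, parsing "[:space:]]" with reset_lookahead set to false leaves the stale '[' in wc1, so the next iteration records '[' as a literal and toy_matches(&b, '[') returns true, mirroring the reported symptom. With the reset, a fresh lookahead (']') is fetched, only the class is recorded, and '[' no longer matches.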

Regards,
Fumitoshi UKAI

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Tue, 19 Oct 2004 11:30:56 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: critical bugs in multibyte locales(UTF-8, CJK, ..) regexp

Revision history for this message
Martin Pitt (pitti) wrote :

Warty's version of gawk behaves correctly.

Revision history for this message
In Debian, Fumitoshi UKAI (ukai) wrote : Fixed in NMU of gawk 1:3.1.4-1.2

tag 276206 + fixed
tag 277122 + fixed

quit

This message was generated automatically in response to a
non-maintainer upload. The .changes file follows.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Wed, 20 Oct 2004 01:41:40 +0900
Source: gawk
Binary: gawk
Architecture: source i386
Version: 1:3.1.4-1.2
Distribution: unstable
Urgency: low
Maintainer: James Troup <email address hidden>
Changed-By: Fumitoshi UKAI <email address hidden>
Description:
 gawk - GNU awk, a pattern scanning and processing language
Closes: 276206 277122
Changes:
 gawk (1:3.1.4-1.2) unstable; urgency=low
 .
   * NMU to fix RC bugs
   * 12_dfa.c-ignorecase-range.dpath: new patch by Fumitoshi UKAI
     to fix CASEIGNORE match on [a-z] or [A-Z] in multibyte locales (UTF-8,.)
     closes: Bug#276206
   * 13_dfa.c-charclass-bracket.dpatch: new patch by Fumitoshi UKAI
     to fix wrong match '[' against character class such as [[:space:]]
     in multibyte locales (UTF-8, ...)
     closes: Bug#277122
Files:
 f9efc0ef141272744a0f5845b2058f89 558 interpreters optional gawk_3.1.4-1.2.dsc
 afc7be320bfb12299feaa7f6988a0080 10013 interpreters optional gawk_3.1.4-1.2.diff.gz
 a70f8b4ad65c18f95c91f5e82bc1284c 983574 interpreters optional gawk_3.1.4-1.2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFBdUSv9D5yZjzIjAkRAkwcAJ9NpaJ19cxRUx0xIOK8pU4N7ZqAwACeKG+F
+kxtBcbg3aPT8pTEzirDa+Y=
=vhfA
-----END PGP SIGNATURE-----
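The ignore-case range problem (Bug#276206) concerns matching a range such as [a-z] or [A-Z] case-insensitively. One common way to do that for a single wide character is to test the character and its towlower()/towupper() forms against the range. This is a minimal sketch of that technique, not the 12_dfa.c-ignorecase-range patch itself; it assumes ranges compare plain code points, which is a simplification (range semantics outside the POSIX locale are locale-dependent), and range_matches() and in_range() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* True if wc lies in the code-point range [lo, hi]. */
bool in_range(wint_t lo, wint_t hi, wint_t wc)
{
    return lo <= wc && wc <= hi;
}

/* Range test with optional case folding: besides wc itself, also try
   its lower- and upper-case forms, so that [a-z] matches 'Q' and
   [A-Z] matches 'q' when ignore_case is true. */
bool range_matches(wint_t lo, wint_t hi, wint_t wc, bool ignore_case)
{
    assert(lo <= hi);
    if (in_range(lo, hi, wc))
        return true;
    if (!ignore_case)
        return false;
    return in_range(lo, hi, towlower(wc)) || in_range(lo, hi, towupper(wc));
}
```

For example, range_matches(L'a', L'z', L'Q', true) is true because towlower(L'Q') is L'q', while the same call with ignore_case false is false.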

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Tue, 19 Oct 2004 13:02:11 -0400
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Cc: Fumitoshi UKAI <email address hidden>, James Troup <email address hidden>
Subject: Fixed in NMU of gawk 1:3.1.4-1.2

Revision history for this message
In Debian, Fumitoshi UKAI (ukai) wrote : rc bug for sarge

# grep
tags 249245 - fixed
tags 249245 + sarge
tags 274352 - fixed
tags 274352 + sarge
tags 276202 - fixed
tags 276202 + sarge
tags 276209 - fixed
tags 276209 + sarge
# gawk
tags 266519 - fixed
tags 266519 + sarge
tags 276201 - fixed
tags 276201 + sarge
tags 276206 - fixed
tags 276206 + sarge
tags 277122 - fixed
tags 277122 + sarge
tags 264829 - fixed
tags 264829 + sarge
tags 266043 - fixed
tags 266043 + sarge
tags 271231 - fixed
tags 271231 + sarge

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Thu, 28 Oct 2004 12:04:42 +0900
From: Fumitoshi UKAI <email address hidden>
To: <email address hidden>
Subject: rc bug for sarge

Revision history for this message
In Debian, James Troup (james-nocrew) wrote : Bug#277122: fixed in gawk 1:3.1.4-2

Source: gawk
Source-Version: 1:3.1.4-2

We believe that the bug you reported is fixed in the latest version of
gawk, which is due to be installed in the Debian FTP archive:

gawk_3.1.4-2.diff.gz
  to pool/main/g/gawk/gawk_3.1.4-2.diff.gz
gawk_3.1.4-2.dsc
  to pool/main/g/gawk/gawk_3.1.4-2.dsc
gawk_3.1.4-2_i386.deb
  to pool/main/g/gawk/gawk_3.1.4-2_i386.deb

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to <email address hidden>,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
James Troup <email address hidden> (supplier of updated gawk package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing <email address hidden>)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 1.7
Date: Fri, 26 Nov 2004 18:30:42 +0000
Source: gawk
Binary: gawk
Architecture: source i386
Version: 1:3.1.4-2
Distribution: unstable
Urgency: low
Maintainer: James Troup <email address hidden>
Changed-By: James Troup <email address hidden>
Description:
 gawk - GNU awk, a pattern scanning and processing language
Closes: 263964 266519 276201 276206 277122 278135
Changes:
 gawk (1:3.1.4-2) unstable; urgency=low
 .
   * 14_io.c-fix-redirect-hang.dpatch: new patch which reverts io.c changes
     that wait() when a redirect hits EOF without checking whether or not
     this is the kind of redirect which would have an orphan to wait() on.
     Closes: #263964
 .
   * debian/control (Build-Depends): Add a versioned build-depends on a
     fixed binutils for m68k. Closes: #278135
 .
   * Merge in NMU changes. Many thanks to Fumitoshi UKAI. Closes:
     #276206, #277122, #266519, #276201
 .
   * 11_dfa.c-ignorecase.dpatch, 12_dfa.c-ignorecase-range.dpath,
     13_dfa.c-charclass-bracket.dpatch: revert to old-style dpatch patch so
     that it works for me.
 .
   * 10_dfa.c-no-go_fast.dpatch: replaced...
   * 10_dfa.c-disable-cache.dpatch: ... with this. Which is upstream's fix
     for the same problem.
 .
   * 15_builtin.c-fix-wide-char.dpatch: new patch by Stephen Kasal to fix
     wide-char to{lower,upper}() handling.
 .
   * 16_awkgram.y-stop-at-eof.dpatch: new patch by Andreas Schwab to stop
     gawk reading past the end of the file for an awk script that is big
     enough to fill more than a buffer's worth and does not end with a
     newline.
 .
   * 17_fix-non-numeric-constants.dpatch: new patch by Aharon Robbins to
     improve handling of non-numeric constants so that numbers like 00.34
     don't get confused as being octal.
Files:
 492e13079781d176c5b589d64bcaaedb 1221 interpreters optional gawk_3.1.4-2.dsc
 a175a8e9572d74150d3ff6072b4f64df 14896 interpreters optional gawk_3.1.4-2.diff.gz
 262ea208b69d0fb65d71b5cbb1708881 995324 interpreters optional gawk_3.1.4-2_i386.deb

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQIVAwUBQad8NNfD8TGrKpH1AQIN2g/+PvhVX2LyNwKzjZK6q5gW2dZqyj+sgkHS
6YsNJPlGlnroFGnRi/mQwKPv0B2orTjRCbYrE4ROuuiEY8zl05S9jKGP...
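The 15_builtin.c-fix-wide-char entry concerns toupper()/tolower() on multibyte text. A common technique is to decode the string to wide characters, map each one through towupper() or towlower(), and re-encode the result. The sketch below shows that technique for upper-casing only. It is not Stephen Kasal's patch; mb_toupper() is a hypothetical name, it relies on the POSIX behaviour of mbstowcs()/wcstombs() returning the required length when the destination is NULL, and it assumes the caller has already set the locale (for example with setlocale(LC_ALL, "")).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Upper-cases a multibyte string in the current LC_CTYPE locale:
   decode to wide characters, apply towupper() to each, re-encode.
   Returns a malloc'd string the caller must free(), or NULL on an
   invalid multibyte sequence or allocation failure. */
char *mb_toupper(const char *s)
{
    assert(s != NULL);
    size_t n = mbstowcs(NULL, s, 0);        /* POSIX: length when dst is NULL */
    if (n == (size_t)-1)
        return NULL;                        /* invalid multibyte input */
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);                  /* also stores the L'\0' */
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t)towupper((wint_t)w[i]);
    size_t m = wcstombs(NULL, w, 0);        /* bytes needed, without '\0' */
    char *out = (m == (size_t)-1) ? NULL : malloc(m + 1);
    if (out != NULL)
        wcstombs(out, w, m + 1);            /* also stores the '\0' */
    free(w);
    return out;
}
```

Working on whole characters rather than bytes is the point of the design: calling toupper() on each byte of a UTF-8 sequence can corrupt multibyte characters, while towupper() sees each decoded character once.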

Revision history for this message
Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Fri, 26 Nov 2004 14:02:14 -0500
From: James Troup <email address hidden>
To: <email address hidden>
Subject: Bug#277122: fixed in gawk 1:3.1.4-2

Changed in gawk:
status: Unknown → Fix Released