gawk crashes when given too big regex group index

Bug #364505 reported by Martin Olsson
This bug affects 2 people
Affects        Status        Importance  Assigned to  Milestone
gawk           Fix Released  Undecided   Unassigned
gawk (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

I have "gawk" version 1:3.1.6.dfsg-0ubuntu1 on jaunty RC.

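Running the following under valgrind reports invalid reads inside do_gensub, because "\\4" refers to a group that the regex (which has only two groups) does not define: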
echo "abc" | valgrind gawk '{ print gensub(/(.)b(.)/, "\\4", 1)}'

==20299== Invalid read of size 4
==20299== at 0x410ECC: (within /usr/bin/gawk)
==20299== by 0x4113E9: do_gensub (in /usr/bin/gawk)
==20299== by 0x43D4EB: r_tree_eval (in /usr/bin/gawk)
==20299== by 0x412EF1: do_print (in /usr/bin/gawk)
==20299== by 0x43BC05: interpret (in /usr/bin/gawk)
==20299== by 0x43B955: interpret (in /usr/bin/gawk)
==20299== by 0x428070: do_input (in /usr/bin/gawk)
==20299== by 0x429CE0: main (in /usr/bin/gawk)
==20299== Address 0x56481c8 is 0 bytes after a block of size 16 alloc'd
==20299== at 0x4C278AE: malloc (vg_replace_malloc.c:207)
==20299== by 0x439941: (within /usr/bin/gawk)
==20299== by 0x439C77: re_search (in /usr/bin/gawk)
==20299== by 0x42C120: research (in /usr/bin/gawk)
==20299== by 0x41055E: (within /usr/bin/gawk)
==20299== by 0x4113E9: do_gensub (in /usr/bin/gawk)
==20299== by 0x43D4EB: r_tree_eval (in /usr/bin/gawk)
==20299== by 0x412EF1: do_print (in /usr/bin/gawk)
==20299== by 0x43BC05: interpret (in /usr/bin/gawk)
==20299== by 0x43B955: interpret (in /usr/bin/gawk)
==20299== by 0x428070: do_input (in /usr/bin/gawk)
==20299== by 0x429CE0: main (in /usr/bin/gawk)
==20299==
==20299== Invalid read of size 4
==20299== at 0x410EDB: (within /usr/bin/gawk)
==20299== by 0x4113E9: do_gensub (in /usr/bin/gawk)
==20299== by 0x43D4EB: r_tree_eval (in /usr/bin/gawk)
==20299== by 0x412EF1: do_print (in /usr/bin/gawk)
==20299== by 0x43BC05: interpret (in /usr/bin/gawk)
==20299== by 0x43B955: interpret (in /usr/bin/gawk)
==20299== by 0x428070: do_input (in /usr/bin/gawk)
==20299== by 0x429CE0: main (in /usr/bin/gawk)
==20299== Address 0x5648208 is 0 bytes after a block of size 16 alloc'd
==20299== at 0x4C278AE: malloc (vg_replace_malloc.c:207)
==20299== by 0x43994E: (within /usr/bin/gawk)
==20299== by 0x439C77: re_search (in /usr/bin/gawk)
==20299== by 0x42C120: research (in /usr/bin/gawk)
==20299== by 0x41055E: (within /usr/bin/gawk)
==20299== by 0x4113E9: do_gensub (in /usr/bin/gawk)
==20299== by 0x43D4EB: r_tree_eval (in /usr/bin/gawk)
==20299== by 0x412EF1: do_print (in /usr/bin/gawk)
==20299== by 0x43BC05: interpret (in /usr/bin/gawk)
==20299== by 0x43B955: interpret (in /usr/bin/gawk)
==20299== by 0x428070: do_input (in /usr/bin/gawk)
==20299== by 0x429CE0: main (in /usr/bin/gawk)

The same out-of-range group index makes this command segfault:
$ echo "[abc,def,ghi]" | gawk '{ print gensub(/([,\[])def([,\]])/, "\\4", 1)}'
Segmentation fault

Finally, this one makes glibc report heap corruption and abort:
echo "[abc,def,ghi]" | gawk '{ print gensub(/([,\[])def([,\]])/, "\\8", 1)}'

*** glibc detected *** gawk: realloc(): invalid next size: 0x0000000001d2aae0 ***
======= Backtrace: =========
/lib/libc.so.6[0x7fc931a99cb8]
/lib/libc.so.6[0x7fc931a9df21]
/lib/libc.so.6(realloc+0x12e)[0x7fc931a9edae]
gawk[0x410a3e]
gawk(do_gensub+0x28a)[0x4113ea]
gawk(r_tree_eval+0x37c)[0x43d4ec]
gawk(do_print+0x102)[0x412ef2]
gawk(interpret+0x4e6)[0x43bc06]
gawk(interpret+0x236)[0x43b956]
gawk(do_input+0x41)[0x428071]
gawk(main+0xe01)[0x429ce1]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7fc931a405a6]
gawk[0x406d79]
======= Memory map: ========
00400000-00454000 r-xp 00000000 08:02 2068367 /usr/bin/gawk
00654000-00655000 rw-p 00054000 08:02 2068367 /usr/bin/gawk
00655000-0065c000 rw-p 00655000 00:00 0
01d22000-01d43000 rw-p 01d22000 00:00 0 [heap]
7fc92c000000-7fc92c021000 rw-p 7fc92c000000 00:00 0
7fc92c021000-7fc930000000 ---p 7fc92c021000 00:00 0
7fc93180a000-7fc931820000 r-xp 00000000 08:02 10534973 /lib/libgcc_s.so.1
7fc931820000-7fc931a20000 ---p 00016000 08:02 10534973 /lib/libgcc_s.so.1
7fc931a20000-7fc931a21000 r--p 00016000 08:02 10534973 /lib/libgcc_s.so.1
7fc931a21000-7fc931a22000 rw-p 00017000 08:02 10534973 /lib/libgcc_s.so.1
7fc931a22000-7fc931b8a000 r-xp 00000000 08:02 10534951 /lib/libc-2.9.so
7fc931b8a000-7fc931d8a000 ---p 00168000 08:02 10534951 /lib/libc-2.9.so
7fc931d8a000-7fc931d8e000 r--p 00168000 08:02 10534951 /lib/libc-2.9.so
7fc931d8e000-7fc931d8f000 rw-p 0016c000 08:02 10534951 /lib/libc-2.9.so
7fc931d8f000-7fc931d94000 rw-p 7fc931d8f000 00:00 0
7fc931d94000-7fc931e18000 r-xp 00000000 08:02 10534984 /lib/libm-2.9.so
7fc931e18000-7fc932017000 ---p 00084000 08:02 10534984 /lib/libm-2.9.so
7fc932017000-7fc932018000 r--p 00083000 08:02 10534984 /lib/libm-2.9.so
7fc932018000-7fc932019000 rw-p 00084000 08:02 10534984 /lib/libm-2.9.so
7fc932019000-7fc93201b000 r-xp 00000000 08:02 10534965 /lib/libdl-2.9.so
7fc93201b000-7fc93221b000 ---p 00002000 08:02 10534965 /lib/libdl-2.9.so
7fc93221b000-7fc93221c000 r--p 00002000 08:02 10534965 /lib/libdl-2.9.so
7fc93221c000-7fc93221d000 rw-p 00003000 08:02 10534965 /lib/libdl-2.9.so
7fc93221d000-7fc93223d000 r-xp 00000000 08:02 10534931 /lib/ld-2.9.so
7fc9322f3000-7fc9323de000 r--p 00000000 08:02 2098048 /usr/lib/locale/en_DK.utf8/LC_COLLATE
7fc9323de000-7fc93241d000 r--p 00000000 08:02 2098049 /usr/lib/locale/en_DK.utf8/LC_CTYPE
7fc93241d000-7fc93241f000 rw-p 7fc93241d000 00:00 0
7fc93242f000-7fc932430000 r--p 00000000 08:02 2097585 /usr/lib/locale/en_DK.utf8/LC_TIME
7fc932430000-7fc932431000 r--p 00000000 08:02 2097584 /usr/lib/locale/en_DK.utf8/LC_NUMERIC
7fc932431000-7fc932432000 r--p 00000000 08:02 2097587 /usr/lib/locale/en_DK.utf8/LC_MESSAGES/SYS_LC_MESSAGES
7fc932432000-7fc932439000 r--s 00000000 08:02 13664499 /usr/lib/gconv/gconv-modules.cache
7fc932439000-7fc93243c000 rw-p 7fc932439000 00:00 0
7fc93243c000-7fc93243d000 r--p 0001f000 08:02 10534931 /lib/ld-2.9.so
7fc93243d000-7fc93243e000 rw-p 00020000 08:02 10534931 /lib/ld-2.9.so
7fff3a428000-7fff3a43d000 rw-p 7ffffffea000 00:00 0 [stack]
7fff3a5fe000-7fff3a5ff000 r-xp 7fff3a5fe000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Aborted

Martin Olsson (mnemo) wrote :

I reported this bug upstream as well and they immediately suggested a potential fix:

Index: ChangeLog
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/ChangeLog,v
retrieving revision 1.101
diff -u -r1.101 ChangeLog
--- ChangeLog 16 Apr 2009 20:02:25 -0000 1.101
+++ ChangeLog 22 Apr 2009 04:43:41 -0000
@@ -1,3 +1,11 @@
+Wed Apr 22 07:42:05 2009 Arnold D. Robbins <email address hidden>
+
+ * builtin.c (sub_common): In code for handling \<dig> replacements,
+ first make sure that <dig> is within the range of parentheses sets
+ given, and then make sure that the subpattern start is not -1, meaning
+ that something actually matched. Thanks to Martin Olsson
+ <email address hidden> for the bug report.
+
 Thu Apr 16 22:59:32 2009 Arnold D. Robbins <email address hidden>

  * eval.c (func_call): Save nloops_active; if after function returns
Index: builtin.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/builtin.c,v
retrieving revision 1.31
diff -u -r1.31 builtin.c
--- builtin.c 27 Mar 2009 08:01:13 -0000 1.31
+++ builtin.c 22 Apr 2009 04:40:15 -0000
@@ -2544,15 +2544,17 @@
      if (backdigs) { /* gensub, behave sanely */
       if (ISDIGIT(scan[1])) {
        int dig = scan[1] - '0';
- char *start, *end;
+ if (dig < NUMSUBPATS(rp, t->stptr) && SUBPATSTART(rp, tp->stptr, dig) != -1) {
+ char *start, *end;

- start = t->stptr
- + SUBPATSTART(rp, t->stptr, dig);
- end = t->stptr
- + SUBPATEND(rp, t->stptr, dig);
-
- for (cp = start; cp < end; cp++)
- *bp++ = *cp;
+ start = t->stptr
+ + SUBPATSTART(rp, t->stptr, dig);
+ end = t->stptr
+ + SUBPATEND(rp, t->stptr, dig);
+
+ for (cp = start; cp < end; cp++)
+ *bp++ = *cp;
+ }
        scan++;
       } else /* \q for any q --> q */
        *bp++ = *++scan;

This fix is not yet checked in, and I'm not sure it will be the final fix, so let's keep an eye on the upstream changelog:
http://cvs.savannah.gnu.org/viewvc/gawk-stable/ChangeLog?root=gawk&view=log

Hopefully this bug will be fixed upstream and a new release will be packaged for karmic (the gawk package was never updated for jaunty).

Dave Walker (dogatemycomputer) wrote :

Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in gawk (Ubuntu):
status: New → Confirmed
status: Confirmed → Fix Committed
status: Fix Committed → Confirmed
Changed in gawk:
status: New → Confirmed
Jeroen Schot (schot)
Changed in gawk (Ubuntu):
status: Confirmed → Fix Released
Jeroen Schot (schot) wrote :

This bug was fixed in upstream release 3.1.7. The fix has been in Ubuntu since Maverick, which contains 1:3.1.7.dfsg-5.

Changed in gawk:
status: Confirmed → Fix Released