buffer overrun in repr() for unicode strings
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Python | Unknown | Unknown | |
python2.4 (Ubuntu) | Fix Released | Medium | Matthias Klose |
Dapper | Fix Released | High | Martin Pitt |
Bug Description
hi,
i discovered a bug yesterday in repr() for unicode strings. this
causes an unpatched non-debug wide (UTF-32/UCS-4) build of python to
abort:
python2.4 -c 'assert(
(repr(u"\U00010000" * 39 + u"\uffff" * 4096))'
the problem is fixed by a change to unicodeobject.c. in the process of
fixing it i also found and fixed another bug in repr() on UCS-4 python
builds -- previously paired unicode surrogates were being repr()'ed as a
single "character" even though they are not treated as such by a UCS-4
python build -- i.e. eval(repr(
an unpatched UCS-4 build.
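A rough illustration of the surrogate-pair point, as a Python sketch rather than the actual C code: on a UCS-4 build a high/low surrogate pair is stored as two unichrs, but an escape that merges them into one '\U00xxxxxx' evaluates back to a single unichr, so the round trip changes the string. The helper name `combine_surrogates` is hypothetical, and code points are modelled as plain ints so the sketch does not depend on the build width.

```python
def combine_surrogates(high, low):
    """Standard UTF-16 decoding of a surrogate pair into one code point."""
    if not (0xD800 <= high < 0xDC00 and 0xDC00 <= low < 0xE000):
        raise ValueError("not a high/low surrogate pair")
    return (((high & 0x3FF) << 10) | (low & 0x3FF)) + 0x10000

# Two separate unichrs, as a UCS-4 build stores them.
pair = [0xD800, 0xDC00]

# What a merged '\U00010000' escape would evaluate back to on a UCS-4 build.
merged = [combine_surrogates(*pair)]

print(len(pair), len(merged), hex(merged[0]))
```

The lengths differ (2 versus 1), which is the mismatch described above.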
Package: python2.4
Version: 2.4.3-7ubuntu2
Severity: important
when i run this command:
python -c "repr(u'
\U0001d56b\
\U0001d6e7Z\
\U0001d7e7\
python aborts with the following backtrace and memory dump:
*** glibc detected *** python: realloc(): invalid next size: 0x081521e8 ***
======= Backtrace: =========
/lib/tls/
/lib/tls/
python(
python[0x80991f7]
python(
python(
python(
python(
python(
python(
python(
/lib/tls/
python[0x8055041]
======= Memory map: ========
08048000-0811a000 r-xp 00000000 08:03 622736 /usr/bin/python2.4
0811a000-0813b000 rw-p 000d1000 08:03 622736 /usr/bin/python2.4
0813b000-081b5000 rw-p 0813b000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d40000-b7d4a000 r-xp 00000000 08:03 376899 /lib/libgcc_s.so.1
b7d4a000-b7d4b000 rw-p 00009000 08:03 376899 /lib/libgcc_s.so.1
b7d68000-b7d9b000 r--p 00000000 08:03 82634 /usr/lib/
b7d9b000-b7d9e000 r-xp 00000000 08:03 625529 /usr/lib/
b7d9e000-b7d9f000 rw-p 00003000 08:03 625529 /usr/lib/
b7d9f000-b7e22000 rw-p b7d9f000 00:00 0
b7e22000-b7f51000 r-xp 00000000 08:03 66543 /lib/tls/
b7f51000-b7f53000 r--p 0012e000 08:03 66543 /lib/tls/
b7f53000-b7f55000 rw-p 00130000 08:03 66543 /lib/tls/
b7f55000-b7f58000 rw-p b7f55000 00:00 0
b7f58000-b7f7c000 r-xp 00000000 08:03 66547 /lib/tls/
b7f7c000-b7f7e000 rw-p 00023000 08:03 66547 /lib/tls/
b7f7e000-b7f80000 r-xp 00000000 08:03 68161 /lib/tls/
b7f80000-b7f82000 rw-p 00001000 08:03 68161 /lib/tls/
b7f82000-b7f83000 rw-p b7f82000 00:00 0
b7f83000-b7f85000 r-xp 00000000 08:03 66546 /lib/tls/
b7f85000-b7f87000 rw-p 00001000 08:03 66546 /lib/tls/
b7f87000-b7f96000 r-xp 00000000 08:03 68156 /lib/tls/
b7f96000-b7f98000 rw-p 0000f000 08:03 68156 /lib/tls/
b7f98000-b7f9a000 rw-p b7f98000 00:00 0
b7fb0000-b7fb7000 r--s 00000000 08:03 2130015 /usr/lib/
b7fb7000-b7fb9000 rw-p b7fb7000 00:00 0
b7fb9000-b7fd2000 r-xp 00000000 08:03 2737127 /lib/ld-2.4.so
b7fd2000-b7fd4000 rw-p 00018000 08:03 2737127 /lib/ld-2.4.so
bf99b000-bf9b3000 rw-p bf99b000 00:00 0 [stack]
ffffe000-fffff000 ---p 00000000 00:00 0 [vdso]
Aborted
-- System Information:
Debian Release: testing/unstable
APT prefers edgy-updates
APT policy: (500, 'edgy-updates'), (500, 'edgy-security'), (500, 'edgy-backports'), (500, 'edgy')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/dash
Kernel: Linux 2.6.17-5-386
Locale: LANG=en_US.UTF-8, LC_CTYPE=
Versions of packages python2.4 depends on:
ii libbz2-1.0 1.0.3-3 high-quality block-sorting file co
ii libc6 2.4-1ubuntu8 GNU C Library: Shared libraries
ii libdb4.4 4.4.20-6 Berkeley v4.4 Database Libraries [
ii libncurses5 5.5-2ubuntu1 Shared libraries for terminal hand
ii libncursesw5 5.5-2ubuntu1 Shared libraries for terminal hand
ii libreadline5 5.1-7build1 GNU readline and history libraries
ii libssl0.9.8 0.9.8b-2build1 SSL shared libraries
ii mime-support 3.36-1 MIME files 'mime.types' & 'mailcap
ii python2.4-minimal 2.4.3-7ubuntu2 A minimal subset of the Python lan
python2.4 recommends no packages.
-- no debconf information
the patch is online here:
http://
and also inlined here and attached to this message:
--- Objects/
+++ /home/bsittler/ 2006-08-16 12:37:19.000000000 -0700
@@ -1968,7 +1968,29 @@
static const char *hexdigit = "0123456789abcdef";
- repr = PyString_
+ /* Initial allocation is based on the longest-possible unichr
+ escape.
+
+ In wide (UTF-32) builds '\U00xxxxxx' is 10 chars per source
+ unichr, so in this case it's the longest unichr escape. In
+ narrow (UTF-16) builds this is five chars per source unichr
+ since there are two unichrs in the surrogate pair, so in narrow
+ (UTF-16) builds it's not the longest unichr escape.
+
+ In wide or narrow builds '\uxxxx' is 6 chars per source unichr,
+ so in the narrow (UTF-16) build case it's the longest unichr
+ escape.
+
+ */
+
+ repr = PyString_
+ 2
+#ifdef Py_UNICODE_WIDE
+ + 10*size
+#else
+ + 6*size
+#endif
+ + 1);
if (repr == NULL)
return NULL;
@@ -1993,15 +2015,6 @@
#ifdef Py_UNICODE_WIDE
/* Map 21-bit characters to '\U00xxxxxx' */
else if (ch >= 0x10000) {
- int offset = p - PyString_
-
- /* Resize the string if necessary */
- if (offset + 12 > PyString_
- if (_PyString_
- return NULL;
- p = PyString_
- }
-
*p++ = '\\';
*p++ = 'U';
*p++ = hexdigit[(ch >> 28) & 0x0000000F];
@@ -2014,8 +2027,8 @@
*p++ = hexdigit[ch & 0x0000000F];
}
-#endif
- /* Map UTF-16 surrogate pairs to Unicode \UXXXXXXXX escapes */
+#else
+ /* Map UTF-16 surrogate pairs to '\U00xxxxxx' */
else if (ch >= 0xD800 && ch < 0xDC00) {
Py_UCS4 ucs;
@@ -2040,6 +2053,7 @@
s--;
size++;
}
+#endif
/* Map 16-bit characters to '\uxxxx' */
if (ch >= 256) {
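
To make the sizing comment in the patch concrete, here is a small Python sketch of the arithmetic for the reproducer string on a wide build. It is only a model: `escaped_len_wide` is a hypothetical helper, code points are plain ints, and the 6-per-unichr budget is an assumption about the kind of allocation that falls short (the removed allocation line is truncated above), not a quote of the original code.

```python
def escaped_len_wide(code_points):
    """Upper bound on the chars needed for the escapes alone on a wide build."""
    total = 0
    for cp in code_points:
        if cp >= 0x10000:
            total += 10  # '\U00xxxxxx'
        elif cp >= 256:
            total += 6   # '\uxxxx'
        else:
            total += 4   # upper bound ('\xNN'); printable ASCII needs only 1
    return total

# The reproducer from the report: u"\U00010000" * 39 + u"\uffff" * 4096
reproducer = [0x10000] * 39 + [0xFFFF] * 4096
size = len(reproducer)

needed = escaped_len_wide(reproducer)
budget_6 = 6 * size    # assumed too-small budget: 6 chars per unichr
budget_10 = 10 * size  # budget used by the patch on wide builds

print(size, needed, budget_6, budget_10)
```

For the 4135 unichrs of the reproducer the escapes need 24966 chars, 156 more than a 6-per-unichr budget (24810) provides, while the 10-per-unichr budget from the patch (41350) covers it without any resizing inside the loop.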
Hi Benjamin,
Have you sent this upstream to the Python bug tracker on SourceForge?
If not, I'd suggest doing this so that they can merge it in. If you'd like
I can also send this up as well, but it seems like you'd like to be the one
to get credit for your patch.
Thanks.