Long text crashes dcigettext.c with a segfault

Bug #1922646 reported by Xinmeng Xia
This bug affects 1 person
Affects         Status  Importance  Assigned to  Milestone
glibc (Ubuntu)  New     Undecided   Unassigned

Bug Description

The locale module of the CPython interpreter calls into dcigettext.c. When locale.dgettext() is given a very long string (here, as the domain argument), the interpreter crashes with a segmentation fault. Short strings work fine.

======================================================
Python 3.10.0a6 (default, Mar 19 2021, 11:45:56) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale;locale.dgettext('abs'*10000000,'')
Segmentation fault (core dumped)
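
To reproduce this without losing the interactive session, the call can be run in a child interpreter and its exit status inspected. This is only a sketch: the helper name dgettext_exit_code is hypothetical, and whether the long-domain case actually crashes depends on the glibc version and the stack limit in use.

```python
import subprocess
import sys


def dgettext_exit_code(domain_repeats: int) -> int:
    """Call locale.dgettext('abs' * domain_repeats, '') in a child process.

    Returns the child's exit code. On POSIX systems a negative value means
    the child was killed by a signal (for example -11 for SIGSEGV).
    """
    snippet = f"import locale; locale.dgettext('abs' * {domain_repeats}, '')"
    completed = subprocess.run([sys.executable, "-c", snippet])
    return completed.returncode


if __name__ == "__main__":
    # Short domain: expected to exit normally.
    print("short domain:", dgettext_exit_code(1))
    # Long domain from the report: may be killed by SIGSEGV on affected systems.
    print("long domain:", dgettext_exit_code(10_000_000))
```

Running the crash in a subprocess keeps the parent interpreter alive, which is convenient when a fuzzing harness needs to record the failure and continue.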

Testing with valgrind:
======================================================
~$ PYTHONMALLOC=malloc_debug valgrind python
Memcheck, a memory error detector
==4870== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4870== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4870== Command: /home/xxm/Desktop/apifuzz/Python-3.10.0a6/python
==4870==
Python 3.10.0a6 (default, Mar 19 2021, 11:45:56) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.dgettext('abs'*10000000,'')
==4870== Warning: client switching stacks? SP change: 0x1ffefff5c0 --> 0x1ffd363220
==4870== to suppress, use: --max-stackframe=30000032 or greater
==4870== Invalid write of size 8
==4870== at 0x5797E88: __dcigettext (dcigettext.c:675)
==4870== Address 0x1ffd363218 is on thread 1's stack
==4870==
==4870==
==4870== Process terminating with default action of signal 11 (SIGSEGV)
==4870== Access not within mapped region at address 0x1FFD363218
==4870== at 0x5797E88: __dcigettext (dcigettext.c:675)
==4870== If you believe this happened as a result of a stack
==4870== overflow in your program's main thread (unlikely but
==4870== possible), you can try to increase the size of the
==4870== main thread stack using the --main-stacksize= flag.
==4870== The main thread stack size used in this run was 8388608.
==4870== Invalid write of size 8
==4870== at 0x4A2867A: _vgnU_freeres (vg_preloaded.c:57)
==4870== Address 0x1ffd363210 is on thread 1's stack
==4870==
==4870==
==4870== Process terminating with default action of signal 11 (SIGSEGV)
==4870== Access not within mapped region at address 0x1FFD363210
==4870== at 0x4A2867A: _vgnU_freeres (vg_preloaded.c:57)
==4870== If you believe this happened as a result of a stack
==4870== overflow in your program's main thread (unlikely but
==4870== possible), you can try to increase the size of the
==4870== main thread stack using the --main-stacksize= flag.
==4870== The main thread stack size used in this run was 8388608.
==4870==
==4870== HEAP SUMMARY:
==4870== in use at exit: 35,310,749 bytes in 35,706 blocks
==4870== total heap usage: 87,221 allocs, 51,515 frees, 44,733,752 bytes allocated
==4870==
==4870== LEAK SUMMARY:
==4870== definitely lost: 0 bytes in 0 blocks
==4870== indirectly lost: 0 bytes in 0 blocks
==4870== possibly lost: 35,173,680 bytes in 34,899 blocks
==4870== still reachable: 137,069 bytes in 807 blocks
==4870== suppressed: 0 bytes in 0 blocks
==4870== Rerun with --leak-check=full to see details of leaked memory
==4870==
==4870== For lists of detected and suppressed errors, rerun with: -s
==4870== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
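
The numbers in the valgrind output above can be checked with a little arithmetic. The sketch below only uses values printed in the log; the interpretation (a stack allocation sized by the domain string, for example through alloca) is an assumption that the log is consistent with, not something the report confirms.

```python
# Values copied from the valgrind log above.
sp_before = 0x1ffefff5c0          # stack pointer before the jump
sp_after = 0x1ffd363220           # stack pointer after the jump
main_stack_size = 8388608         # "main thread stack size used in this run"

# Size of the domain argument passed to locale.dgettext().
domain_bytes = len("abs") * 10_000_000

frame_size = sp_before - sp_after
print(frame_size)                     # 30000032, matching --max-stackframe
print(frame_size - domain_bytes)      # 32 bytes on top of the domain length
print(frame_size > main_stack_size)   # True: the frame exceeds the 8 MiB stack
```

The stack pointer moves by the length of the domain string plus 32 bytes, which is more than three times the size of the main thread's stack, so the first write into that region lands on unmapped memory.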

Testing with gdb
======================================================
$ gdb ./python
(gdb) run
>>> locale.dgettext('abs'*10000000,'')

Program received signal SIGSEGV, Segmentation fault.
__dcigettext (
    domainname=domainname@entry=0xadb030 "absabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsab"..., msgid1=msgid1@entry=0x7ffff7fc09a0 "", msgid2=msgid2@entry=0x0,
    plural=plural@entry=0, n=n@entry=0, category=category@entry=5) at dcigettext.c:675
675 dcigettext.c: No such file or directory.
(gdb)

======================================================

ProblemType: Crash

$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23

$ uname -a
Linux xxm 4.15.0-64-generic #73~16.04.1-Ubuntu SMP Fri Sep 13, UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Gunnar Hjalmarsson (gunnarhj) wrote :

The first argument to dgettext() is the translation domain. Why would you pass an absurdly long string as the domain?

Try this instead:

$ python3
Python 3.8.6 (default, Jan 27 2021, 15:42:20)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> myvar = locale.dgettext('git', 'abs'*10000000)
>>> quit()

Changed in glibc (Ubuntu):
status: New → Incomplete
Xinmeng Xia (xinmengxia) wrote :

Thank you for the kind explanation. Yes, I agree: if the domain is a short string, it works well. But if the domain is a long string, it may lead to a segfault. We are developing a fuzzing tool to test functions in the Python standard library, and the tool generated a long string for the domain. That is how we found this bug. It is a potential threat. I think it would be better to add a check on the length of the domain to ensure robustness. Would you fix it?
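
As an illustration of the length check suggested here, a caller-side guard could look like the sketch below. It is not a fix in glibc or CPython: the wrapper name safe_dgettext and the limit MAX_DOMAIN_BYTES are both hypothetical, and the value 255 is an arbitrary choice for the example.

```python
import locale

# Hypothetical upper bound on the encoded domain name; chosen for illustration.
MAX_DOMAIN_BYTES = 255


def safe_dgettext(domain, message):
    """Call locale.dgettext() only if the domain name is reasonably short."""
    if domain is not None:
        encoded_length = len(domain.encode("utf-8", errors="replace"))
        if encoded_length > MAX_DOMAIN_BYTES:
            raise ValueError(
                f"domain is {encoded_length} bytes long; "
                f"limit is {MAX_DOMAIN_BYTES}"
            )
    return locale.dgettext(domain, message)
```

With this guard, safe_dgettext('abs' * 10000000, '') raises ValueError before the string reaches the C library, while short domains are passed through unchanged.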

Gunnar Hjalmarsson (gunnarhj) wrote :

Well, I'm not the one who would fix anything. I just jumped in since I thought you had made a mistake. Leaving it to the glibc maintainers to evaluate the importance of your observation.

Changed in glibc (Ubuntu):
status: Incomplete → New