Long text crash dcigettext.c with segfault

Bug #1922646 reported by Xinmeng Xia on 2021-04-06
This bug affects 1 person
Affects: glibc (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

The locale module of the CPython interpreter calls into glibc's dcigettext.c. When locale.dgettext() is called with a very long string as the domain argument, the interpreter crashes with a segmentation fault. (A short string works fine.)

======================================================
Python 3.10.0a6 (default, Mar 19 2021, 11:45:56) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale;locale.dgettext('abs'*10000000,'')
Segmentation fault (core dumped)
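
For anyone who wants to re-run this without losing their interactive session, here is a minimal sketch that executes the same call in a child interpreter and reports how the child ended. The helper names run_repro and describe are made up for this illustration and are not part of any library. Whether the child actually crashes depends on the glibc version and the stack limit in use.

```python
import signal
import subprocess
import sys

# Same call as in the report, executed in a separate interpreter.
CHILD_CODE = "import locale; locale.dgettext('abs' * 10000000, '')"


def run_repro():
    """Run the reported call in a child process and return its return code."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def describe(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    print(describe(run_repro()))
```

A child that dies the way the session above did would be reported as killed by SIGSEGV, while a child that survives the call is reported with its normal exit status.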

Testing with valgrind:
======================================================
~$ PYTHONMALLOC=malloc_debug valgrind python
==4870== Memcheck, a memory error detector
==4870== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4870== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==4870== Command: /home/xxm/Desktop/apifuzz/Python-3.10.0a6/python
==4870==
Python 3.10.0a6 (default, Mar 19 2021, 11:45:56) [GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.dgettext('abs'*10000000,'')
==4870== Warning: client switching stacks? SP change: 0x1ffefff5c0 --> 0x1ffd363220
==4870== to suppress, use: --max-stackframe=30000032 or greater
==4870== Invalid write of size 8
==4870== at 0x5797E88: __dcigettext (dcigettext.c:675)
==4870== Address 0x1ffd363218 is on thread 1's stack
==4870==
==4870==
==4870== Process terminating with default action of signal 11 (SIGSEGV)
==4870== Access not within mapped region at address 0x1FFD363218
==4870== at 0x5797E88: __dcigettext (dcigettext.c:675)
==4870== If you believe this happened as a result of a stack
==4870== overflow in your program's main thread (unlikely but
==4870== possible), you can try to increase the size of the
==4870== main thread stack using the --main-stacksize= flag.
==4870== The main thread stack size used in this run was 8388608.
==4870== Invalid write of size 8
==4870== at 0x4A2867A: _vgnU_freeres (vg_preloaded.c:57)
==4870== Address 0x1ffd363210 is on thread 1's stack
==4870==
==4870==
==4870== Process terminating with default action of signal 11 (SIGSEGV)
==4870== Access not within mapped region at address 0x1FFD363210
==4870== at 0x4A2867A: _vgnU_freeres (vg_preloaded.c:57)
==4870== If you believe this happened as a result of a stack
==4870== overflow in your program's main thread (unlikely but
==4870== possible), you can try to increase the size of the
==4870== main thread stack using the --main-stacksize= flag.
==4870== The main thread stack size used in this run was 8388608.
==4870==
==4870== HEAP SUMMARY:
==4870== in use at exit: 35,310,749 bytes in 35,706 blocks
==4870== total heap usage: 87,221 allocs, 51,515 frees, 44,733,752 bytes allocated
==4870==
==4870== LEAK SUMMARY:
==4870== definitely lost: 0 bytes in 0 blocks
==4870== indirectly lost: 0 bytes in 0 blocks
==4870== possibly lost: 35,173,680 bytes in 34,899 blocks
==4870== still reachable: 137,069 bytes in 807 blocks
==4870== suppressed: 0 bytes in 0 blocks
==4870== Rerun with --leak-check=full to see details of leaked memory
==4870==
==4870== For lists of detected and suppressed errors, rerun with: -s
==4870== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
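
The numbers in the valgrind output above can be cross-checked with a little arithmetic. The sketch below uses only values copied from the log and from the test call. It suggests that the stack pointer dropped in one step by the length of the domain string plus 32 bytes, which is far more than the main thread's stack. This is consistent with a stack allocation sized by the domain string, but it does not by itself show which statement in dcigettext.c is responsible.

```python
# Values copied from the valgrind log above.
sp_before = 0x1FFEFFF5C0        # stack pointer before the change
sp_after = 0x1FFD363220         # stack pointer after the change
main_stack_size = 8388608       # "main thread stack size used in this run"
suggested_max_frame = 30000032  # valgrind's --max-stackframe suggestion

# Length of the domain string passed to locale.dgettext() in the test.
domain_len = len('abs' * 10000000)

sp_drop = sp_before - sp_after

print("domain length:      ", domain_len)
print("stack pointer drop: ", sp_drop)
print("drop minus length:  ", sp_drop - domain_len)
print("exceeds main stack: ", sp_drop > main_stack_size)
```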

Testing with gdb:
======================================================
$ gdb ./python
(gdb) run
>>> locale.dgettext('abs'*10000000,'')

Program received signal SIGSEGV, Segmentation fault.
__dcigettext (
    domainname=domainname@entry=0xadb030 "absabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsabsab"..., msgid1=msgid1@entry=0x7ffff7fc09a0 "", msgid2=msgid2@entry=0x0,
    plural=plural@entry=0, n=n@entry=0, category=category@entry=5) at dcigettext.c:675
675 dcigettext.c: No such file or directory.
(gdb)

======================================================

ProblemType: Crash

$ ldd --version
ldd (Ubuntu GLIBC 2.23-0ubuntu11.2) 2.23

$ uname -a
Linux xxm 4.15.0-64-generic #73~16.04.1-Ubuntu SMP Fri Sep 13, UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Gunnar Hjalmarsson (gunnarhj) wrote :

The first argument to dgettext() is the translation domain. Why would you pass an absurdly long string as the domain?

Try this instead:

$ python3
Python 3.8.6 (default, Jan 27 2021, 15:42:20)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> myvar = locale.dgettext('git', 'abs'*10000000)
>>> quit()

Changed in glibc (Ubuntu):
status: New → Incomplete

Xinmeng Xia (xinmengxia) wrote :

Thank you for your kind explanation. Yes, I agree: if the domain is a short string, it works well. But if the domain is a long string, it can lead to a segfault. We are developing a fuzzing tool to test functions in the Python standard library, and the tool generated a long string for the domain. That is how we found this bug. It could be a potential threat. I think it would be better to add a check on the length of the domain here to improve robustness. Would you fix it?
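
To illustrate the kind of length check meant here, this is a minimal caller-side sketch in Python. It is not a patch for glibc or CPython. The wrapper name checked_dgettext and the 255-byte limit are assumptions chosen for the example; neither library defines such a limit as far as this report shows.

```python
import locale

# Hypothetical upper bound on the domain length, in bytes. This value is an
# assumption for illustration only; it does not come from glibc or CPython.
MAX_DOMAIN_LEN = 255


def checked_dgettext(domain, message, max_len=MAX_DOMAIN_LEN):
    """Call locale.dgettext() only if the domain name is reasonably short."""
    if domain is not None and len(domain.encode("utf-8")) > max_len:
        raise ValueError(
            "translation domain is %d bytes long; limit is %d"
            % (len(domain.encode("utf-8")), max_len)
        )
    return locale.dgettext(domain, message)
```

With this wrapper, the call from the report raises ValueError instead of reaching the C library, while a short domain such as 'git' is passed through unchanged.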

Gunnar Hjalmarsson (gunnarhj) wrote :

Well, I'm not the one who would fix anything. I just jumped in since I thought you had made a mistake. Leaving it to the glibc maintainers to evaluate the importance of your observation.

Changed in glibc (Ubuntu):
status: Incomplete → New