wrong stack size calculation

Bug #1351148 reported by Vlad Lesin on 2014-08-01
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MySQL Server
Unknown
Unknown
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Vlad Lesin
5.5
Fix Released
High
Vlad Lesin
5.6
Fix Released
High
Vlad Lesin

Bug Description

The following analysis made one of our users:

gdb tells us that the core was generated by a SIGSEGV, with the following stack trace:

(gdb) where
#0 msort_with_tmp (p=0x7ef5064122e0, b=0x7ef506412388, n=2) at msort.c:41
#1 0x00007f2b6dbf25b7 in msort_with_tmp (n=2, b=0x7ef506412388, p=0x7ef5064122e0) at msort.c:46
#2 msort_with_tmp (p=0x7ef5064122e0, b=0x7ef506412380, n=3) at msort.c:55
#3 0x00007f2b6dbf25b7 in msort_with_tmp (n=3, b=0x7ef506412380, p=0x7ef5064122e0) at msort.c:46
#4 msort_with_tmp (p=0x7ef5064122e0, b=0x7ef506412370, n=5) at msort.c:55
#5 0x00007f2b6dbf25b7 in msort_with_tmp (n=5, b=0x7ef506412370, p=0x7ef5064122e0) at msort.c:46
#6 msort_with_tmp (p=0x7ef5064122e0, b=0x7ef506412350, n=9) at msort.c:55
#7 0x00007f2b6dbf25a1 in msort_with_tmp (n=9, b=0x7ef506412350, p=0x7ef5064122e0) at msort.c:46
#8 msort_with_tmp (p=0x7ef5064122e0, b=0x7ef506412350, n=18) at msort.c:54
#9 0x00007f2b6dbf2abb in msort_with_tmp (p=0x7ef5064122e0, n=18, b=0x7ef506412350) at msort.c:46
#10 qsort_r (b=0x7ef506412350, n=18, s=<optimized out>, cmp=0x7f2b707eee10, arg=0x0) at msort.c:298
#11 0x00007f2b707ef201 in ?? ()
#12 0x00007f2b707f0152 in ?? ()
#13 0x00007f2b707f0211 in ?? ()
#14 0x00007f2b707f066a in lf_hash_insert ()

First thing we thought was weird that it was crashing in qsort, which is part of glibc. Diving into the assembly, we can see what instruction generated the core:

(gdb) disas
Dump of assembler code for function msort_with_tmp:
0x00007f2b6dbf2560 <+0>: push %r15
0x00007f2b6dbf2562 <+2>: push %r14
0x00007f2b6dbf2564 <+4>: push %r13
0x00007f2b6dbf2566 <+6>: mov %rdx,%r13
0x00007f2b6dbf2569 <+9>: shr %r13
0x00007f2b6dbf256c <+12>: push %r12
0x00007f2b6dbf256e <+14>: mov %rdx,%r12
0x00007f2b6dbf2571 <+17>: sub %r13,%r12
0x00007f2b6dbf2574 <+20>: push %rbp
0x00007f2b6dbf2575 <+21>: push %rbx
0x00007f2b6dbf2576 <+22>: mov %r13,%rbx
0x00007f2b6dbf2579 <+25>: sub $0x38,%rsp
0x00007f2b6dbf257d <+29>: imul (%rdi),%rbx
=> 0x00007f2b6dbf2581 <+33>: mov %rdi,0x18(%rsp)
0x00007f2b6dbf2586 <+38>: mov %rsi,0x20(%rsp)
0x00007f2b6dbf258b <+43>: mov %rdx,0x28(%rsp)

What this tells us is that the faulting instruction was mov %rdi,0x18(%rsp), which is trying to write the contents of the rdi register to offset 0x18 of the stack pointer. This is where things start to get interesting. The value of rsp is 0x7ef506411fe0:

(gdb) p $rsp
$1 = (void *) 0x7ef506411fe0

This is interesting because the value is near the end of the physical memory page (ends with fe0 -- we use 4k pages). Using readelf, we can dump out the memory mapping of the process, which is contained in the core file. What we're looking for is the mapping that contains the address 0x7ef506411fe0:

scott@db4:/data/mysql_ssd/core$ readelf -a core.12846
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
... lots of output ...
LOAD 0x000000037b88a000 0x00007ef506411000 0x0000000000000000
0x0000000000001000 0x0000000000001000 1000
LOAD 0x000000037b88b000 0x00007ef506412000 0x0000000000000000
0x0000000000040000 0x0000000000040000 RW 1000
LOAD 0x000000037b8cb000 0x00007ef506452000 0x0000000000000000
0x0000000000001000 0x0000000000001000 1000
...

What this tells us is the mapping that contains 0x7ef506411fe0 is 0x1000 bytes (4k - or one page) and that it is NOT writable.

This is what caused this particular core. The stack pointer (rsp) is now pointing outside of the stack and into the guard page. When we try to write to the guard page it does the right thing and SIGSEGV's.

So, how does this happen? The offending function is _lf_pinbox_real_free. The problem here is the call to alloca on line 355 is overflowing the stack. We can see that alloca_size is being calculated on line 350 (sizeof(void *) * LF_PINBOX_PINS(4) * npins). The key here is to figure out where in memory pinbox is. By disassembling the function and understanding the x64 calling convention, we can figure this out. I'm not going to go into detail, but I was able to figure out that the value of npins is 7444. So, npins * 4 * 8 = 238208 bytes being allocated with alloca(!). This is a significant amount of stack space and this can be confirmed by looking at the rsp value between the _lf_pinbox_real_free frame, and the caller frame above:

(gdb) up
#12 0x00007f2b707f0152 in lfind (head=<optimized out>, cs=0x7f2b70e9b580, hashnr=3564606781, key=0x7f2b568e05d0 "/data/mysql_ssd/tmp/#sql_322e_8.MYD", keylen=35, cursor=<optimized out>,
pins=0x7ef395846400) at /mnt/workspace/percona-server-5.6-debian-binary-dbg/label_exp/ubuntu-precise-64bit/percona-server-5.6-5.6.16-64.0/mysys/lf_hash.c:138
138 /mnt/workspace/percona-server-5.6-debian-binary-dbg/label_exp/ubuntu-precise-64bit/percona-server-5.6-5.6.16-64.0/mysys/lf_hash.c: No such file or directory.
(gdb) p $rsp
$1 = (void *) 0x7ef50644c690
(gdb) down
#11 0x00007f2b707ef201 in _lf_pinbox_real_free (pins=0x7ef395846400)
at /mnt/workspace/percona-server-5.6-debian-binary-dbg/label_exp/ubuntu-precise-64bit/percona-server-5.6-5.6.16-64.0/mysys/lf_alloc-pin.c:359
359 /mnt/workspace/percona-server-5.6-debian-binary-dbg/label_exp/ubuntu-precise-64bit/percona-server-5.6-5.6.16-64.0/mysys/lf_alloc-pin.c: No such file or directory.
(gdb) p $rsp
$2 = (void *) 0x7ef506412350
(gdb) p 0x7ef50644c690 - 0x7ef506412350
$3 = 238400

At this point, $rsp is dangerously close to the end (beginning?) of the stack. When qsort gets called, it doesn't have enough stack space to complete, which causes the SIGSEGV when the guard page is touched by msort_with_tmp.

There are two problems here.

1) The value in pins->stack_ends_here is actually incorrect. This value is calculated when the thread is initialized in my_thread_init(). The problem is that the value doesn't take into account existing stack usage and just adds the thread stack size to calculate the beginning of the stack.

2) _lf_pinbox_real_free is using alloca in a very unsafe way. alloca should only be used by leaf functions, since you cannot predict the amount of stack space that non-leaf functions will use.

Related branches

Vlad Lesin (vlad-lesin) on 2014-08-01
Changed in percona-server:
assignee: nobody → Vlad Lesin (vlad-lesin)
Vlad Lesin (vlad-lesin) wrote :

The upstream bug report is here: http://bugs.mysql.com/bug.php?id=73979 .

tags: added: upstream

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-807

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.