Comment 9 for bug 291373

David Hoeffer (d-hoeffer) wrote :

I had a second look at this. When the bug happens, the backtrace looks like this:

#0 0xb804d430 in __kernel_vsyscall ()
#1 0xb740cec1 in pause () from /lib/tls/i686/cmov/libc.so.6
#2 0x0815cc0a in mch_suspend () at os_unix.c:1117
#3 0x080c09ae in ex_stop (eap=0xbf84c628) at ex_docmd.c:6578
#4 0x080ca9df in do_one_cmd (cmdlinep=0xbf84c7a0, sourcing=1, cstack=0xbf84c7a4, fgetline=0, cookie=0x0)
    at ex_docmd.c:2623
#5 0x080c8cdc in do_cmdline (cmdline=0x821b128 "st", getline=0, cookie=0x0, flags=11) at ex_docmd.c:1099
#6 0x080ccc39 in do_cmdline_cmd (cmd=0x821b128 "st") at ex_docmd.c:705
#7 0x08141c43 in normal_cmd (oap=0xbf84cb84, toplevel=1) at normal.c:1152
#8 0x08101c37 in main_loop (cmdwin=0, noexmode=0) at main.c:1195
#9 0x08104dc9 in main (argc=Cannot access memory at address 0x0
) at main.c:954

On the other hand, when it doesn't happen, it looks like this:

#0 0xb8089430 in __kernel_vsyscall ()
#1 0xb73d7bb6 in kill () from /lib/tls/i686/cmov/libc.so.6
#2 0x0815cbfc in mch_suspend () at os_unix.c:1112
#3 0x080c09ae in ex_stop (eap=0x0) at ex_docmd.c:6578
#4 0x080ca9df in do_one_cmd (cmdlinep=0xbfb89fa0, sourcing=1, cstack=0xbfb89fa4, fgetline=0, cookie=0x0)
    at ex_docmd.c:2623
#5 0x080c8cdc in do_cmdline (cmdline=0x821b128 "st", getline=0, cookie=0x0, flags=11) at ex_docmd.c:1099
#6 0x080ccc39 in do_cmdline_cmd (cmd=0x821b128 "st") at ex_docmd.c:705
#7 0x08141c43 in normal_cmd (oap=0xbfb8a384, toplevel=1) at normal.c:1152
#8 0x08101c37 in main_loop (cmdwin=0, noexmode=0) at main.c:1195
#9 0x08104dc9 in main (argc=Cannot access memory at address 0x14
) at main.c:954

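The difference is in frame #2: in the failing case mch_suspend() is sitting
in pause() (os_unix.c:1117), while in the working case it is stopped inside
kill() (os_unix.c:1112). The relevant code looks like this: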

# ifdef _REENTRANT
    sigcont_received = FALSE;
# endif
    kill(0, SIGTSTP); /* send ourselves a STOP signal */
# ifdef _REENTRANT
    /* When we didn't suspend immediately in the kill(), do it now. Happens
     * on multi-threaded Solaris. */
    if (!sigcont_received)
        pause();
# endif

It beats me how we can get to pause(), but it happens, and that's why you
need to hit Ctrl-C to resume. It looks like a race condition, which would
explain why you don't get it every time. This code is unchanged in vim
7.2.108 in git, so I'd expect this bug to happen there as well.
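
For what it's worth, one plausible window (a guess, not something I have confirmed in a debugger) is that the SIGCONT handler runs after the `if (!sigcont_received)` test but before pause() starts waiting, so pause() sleeps until some other signal such as Ctrl-C arrives. The usual POSIX way to close that kind of window is to keep the signal blocked while testing the flag and wait with sigsuspend(). Below is a minimal standalone sketch of that pattern. It is not Vim code and not a tested patch; `sigcont_flag`, `install_sigcont_handler()`, `wait_for_sigcont()` and `demo_pending_sigcont()` are hypothetical names made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for the sigcont_received flag in the snippet above. */
volatile sig_atomic_t sigcont_flag = 0;

/* The handler only records that SIGCONT was seen. */
static void on_sigcont(int signo)
{
    (void)signo;
    sigcont_flag = 1;
}

/* Install the SIGCONT handler; returns 0 on success, -1 on error. */
int install_sigcont_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigcont;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCONT, &sa, NULL);
}

/*
 * Wait until the handler has run. SIGCONT is blocked while the flag is
 * tested, and sigsuspend() unblocks it and sleeps in one atomic step, so
 * the signal cannot slip in between the test and the wait.
 */
int wait_for_sigcont(void)
{
    sigset_t block, old, wait_mask;

    sigemptyset(&block);
    sigaddset(&block, SIGCONT);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    wait_mask = old;
    sigdelset(&wait_mask, SIGCONT);
    while (!sigcont_flag)
        sigsuspend(&wait_mask);   /* returns after a handler has run */

    return sigprocmask(SIG_SETMASK, &old, NULL);
}

/*
 * Demo: SIGCONT is raised while blocked, so it is pending when the wait
 * starts. Returns 1 if the wait completed and the handler ran.
 */
int demo_pending_sigcont(void)
{
    sigset_t block, old;

    if (install_sigcont_handler() != 0)
        return -1;
    sigcont_flag = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGCONT);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    raise(SIGCONT);               /* stays pending: SIGCONT is blocked */
    assert(sigcont_flag == 0);    /* handler has not run yet */

    if (wait_for_sigcont() != 0)  /* handler runs inside sigsuspend() */
        return -1;
    if (sigprocmask(SIG_SETMASK, &old, NULL) != 0)
        return -1;
    return sigcont_flag;
}
```

In an mch_suspend()-style flow, the assumption would be to block SIGCONT before the kill(0, SIGTSTP) and call something like wait_for_sigcont() in place of the test-then-pause(). A stopped process is still continued by SIGCONT while that signal is blocked; only the handler is deferred until sigsuspend() unblocks it. Whether this fits Vim's other signal handling is something I have not checked.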