vim: hang after ^Z / fg

Reported by River Tarnell on 2008-10-31
This bug affects 8 people
Affects        Status         Importance   Assigned to   Milestone
vim (Debian)   Fix Released   Unknown
vim (Ubuntu)   Fix Released   High         Unassigned

Bug Description

Binary package hint: vim-gtk

Using vim-gtk 1:7.1.314-3ubuntu3.

vim sometimes (often) fails to resume properly after ^Z. Instead of redrawing the screen, it produces no output. To get my vim back I have to type ^C, after which it starts working normally again.

To reproduce:

1. Open a terminal
2. run 'vim'
3. quickly type 'ifoo ESC ^Z'
4. wait a few seconds, then type 'fg'
5. nothing happens.

Originally I thought this was an interaction with zsh, but it also happens with the 'bash' shell.

This bug is present in Debian as well. In fact, I encounter it nearly every time I use vim on a Debian or Ubuntu system...

I can't reproduce this bug. Could it be caused by some settings in your ~/.vimrc or some plugin that you installed?

Can you first try running vim with "vim -u NONE" to avoid loading ~/.vimrc and plugins? If it then works, you can try commenting out things in your ~/.vimrc or moving your ~/.vim directory out of the way until you find which setting or plugin is triggering the bug.

River Tarnell (river-wikimedia) wrote :

'vim -u NONE' seems to fix the problem (I tried several times but couldn't reproduce it). However, it's not caused by my .vimrc; I renamed the file and the problem still appears when not using '-u NONE'. I don't have a .vim directory.

Tessa Lau (tlau) wrote :

I've seen this problem since upgrading from hardy to intrepid. There doesn't seem to be any specific trigger. If I repeat the following actions several times, I can trigger the bug after a few iterations:

1. open a gnome-terminal
2. run 'vim'
3. type ^Z to background it
4. run 'ls'
5. type 'fg' to bring vim back

The terminal outputs:
[1] + continued vim myfile.txt

but the vim screen does not refresh itself. I noticed that I can type ^C when it's wedged like that and the display will come back.

I am using the same vim configuration I had in hardy, and didn't see the problem at all until the upgrade. Another note is that I usually run vim inside of screen, but I have seen the problem outside of screen as well.

Colin Watson (cjwatson) wrote :

I've seen this sporadically. I don't know why yet.

Changed in vim:
status: New → Confirmed
importance: Undecided → High
David Bushong (dbushong) wrote :

Ditto. It happens almost every time for me; I've just gotten in the habit of hitting ^C every time I fg. Somewhat silly.

Miek Gieben (miek) wrote :

Same problem here. With zsh and

Package: vim-full
Status: install ok installed
Priority: extra
Section: editors
Installed-Size: 120
Maintainer: Ubuntu Core Developers <email address hidden>
Architecture: all
Source: vim
Version: 1:7.1.314-3ubuntu3.1

Miek Gieben (miek) wrote :

Strace of when vim hangs:

After the SIGINT (Ctrl-C) it of course resumes.

foton.m% strace -p 8439
Process 8439 attached - interrupt to quit
pause() = ? ERESTARTNOHAND (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigaction(SIGINT, {0x815a1d0, [], 0}, {0x815a1d0, [], 0}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [INT], [INT], 8) = 0
sigreturn() = ? (mask now [])
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
select(8, [0 5 7], NULL, [0 5], {0, 0}) = 0 (Timeout)
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
select(8, [0 5 7], NULL, [0 5], {0, 0}) = 0 (Timeout)
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) =

David Hoeffer (d-hoeffer) wrote :

I'm getting this as well, sporadically, and only with vim.gnome, not with vim.basic. I can reproduce it consistently by repeatedly putting vim to sleep with Ctrl-Z and then waking it up. At some point it will hang.

Here's a backtrace from my system. It seems to be the same as
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=343208, which seems to have been prematurely closed. I suspect this bug isn't in vim itself.

#0 0xb80a1430 in __kernel_vsyscall ()
#1 0xb7462ec1 in pause () from /lib/tls/i686/cmov/libc.so.6
#2 0x0815cc0a in mch_suspend ()
#3 0x080c09ae in ?? ()
#4 0x080ca9df in ?? ()
#5 0x080c8cdc in do_cmdline ()
#6 0x080ccc39 in do_cmdline_cmd ()
#7 0x08141c43 in normal_cmd ()
#8 0x08101c37 in main_loop ()
#9 0x08104dc9 in main ()

Version: 1:7.1.314-3ubuntu3.1

David Hoeffer (d-hoeffer) wrote :

I had a second look at this. When the bug happens, the backtrace looks like this:

#0 0xb804d430 in __kernel_vsyscall ()
#1 0xb740cec1 in pause () from /lib/tls/i686/cmov/libc.so.6
#2 0x0815cc0a in mch_suspend () at os_unix.c:1117
#3 0x080c09ae in ex_stop (eap=0xbf84c628) at ex_docmd.c:6578
#4 0x080ca9df in do_one_cmd (cmdlinep=0xbf84c7a0, sourcing=1, cstack=0xbf84c7a4, fgetline=0, cookie=0x0)
    at ex_docmd.c:2623
#5 0x080c8cdc in do_cmdline (cmdline=0x821b128 "st", getline=0, cookie=0x0, flags=11) at ex_docmd.c:1099
#6 0x080ccc39 in do_cmdline_cmd (cmd=0x821b128 "st") at ex_docmd.c:705
#7 0x08141c43 in normal_cmd (oap=0xbf84cb84, toplevel=1) at normal.c:1152
#8 0x08101c37 in main_loop (cmdwin=0, noexmode=0) at main.c:1195
#9 0x08104dc9 in main (argc=Cannot access memory at address 0x0
) at main.c:954

On the other hand, when it doesn't happen, it looks like this:

#0 0xb8089430 in __kernel_vsyscall ()
#1 0xb73d7bb6 in kill () from /lib/tls/i686/cmov/libc.so.6
#2 0x0815cbfc in mch_suspend () at os_unix.c:1112
#3 0x080c09ae in ex_stop (eap=0x0) at ex_docmd.c:6578
#4 0x080ca9df in do_one_cmd (cmdlinep=0xbfb89fa0, sourcing=1, cstack=0xbfb89fa4, fgetline=0, cookie=0x0)
    at ex_docmd.c:2623
#5 0x080c8cdc in do_cmdline (cmdline=0x821b128 "st", getline=0, cookie=0x0, flags=11) at ex_docmd.c:1099
#6 0x080ccc39 in do_cmdline_cmd (cmd=0x821b128 "st") at ex_docmd.c:705
#7 0x08141c43 in normal_cmd (oap=0xbfb8a384, toplevel=1) at normal.c:1152
#8 0x08101c37 in main_loop (cmdwin=0, noexmode=0) at main.c:1195
#9 0x08104dc9 in main (argc=Cannot access memory at address 0x14
) at main.c:954

The difference is that in one case we call pause() from mch_suspend.
The relevant code looks like this:

# ifdef _REENTRANT
    sigcont_received = FALSE;
# endif
    kill(0, SIGTSTP); /* send ourselves a STOP signal */
# ifdef _REENTRANT
    /* When we didn't suspend immediately in the kill(), do it now. Happens
     * on multi-threaded Solaris. */
    if (!sigcont_received)
        pause();
# endif

It beats me how we can get to pause(), but it happens and that's why you
need to hit Ctrl-C to resume. It seems to be a race condition, that's why
you don't get it every time. This code is unchanged in vim 7.2.108 in git,
so I'd expect this bug to happen there as well.

Thanks for the stack traces. That looks very useful.

I've been able to reproduce this bug only twice so far. It's very rare on
my machine but it did happen.

From your stack trace, I suppose that we can't assume that kill(0, SIGTSTP);
is handled immediately (i.e. it's asynchronous) even though the signal is sent to
ourselves. So if the signal is handled after if (!sigcont_received) but before pause();
then we call pause() and wait for a signal that will never come
(which hangs vim until we press CTRL-C).

If so, we should not call pause() here. Yielding the CPU (sleep(0)) should
be enough to have the signal handled when the process is rescheduled since
we sent it to ourselves.
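
The race described above is the classic check-then-pause() window. A common POSIX idiom for closing that window (not what the proposed patch does) is to block the signal, test the flag, and sleep with sigsuspend(), which unblocks the signal and waits in one atomic step. Below is a minimal single-threaded sketch of that idiom. The helper names install_handler() and wait_for_signal() are hypothetical, and none of this is Vim code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler. volatile sig_atomic_t is the type the C standard
 * guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t sig_seen = 0;

static void on_signal(int signo)
{
    (void)signo;
    sig_seen = 1;
}

/* Hypothetical helper: install on_signal() for signo. Returns 0 on success. */
static int install_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: sleep until on_signal() has run. Returns 0 on success. */
static int wait_for_signal(int signo)
{
    sigset_t block, old, wait_mask;

    assert(signo > 0);
    sigemptyset(&block);
    sigaddset(&block, signo);

    /* Block signo so it cannot be delivered between the test and the sleep. */
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    /* Mask to sleep with: the caller's mask, but with signo unblocked. */
    wait_mask = old;
    sigdelset(&wait_mask, signo);

    while (!sig_seen)
        sigsuspend(&wait_mask);  /* unblock signo and sleep atomically */

    /* Restore the caller's signal mask. */
    return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

In a multi-threaded build such as vim-gtk, pthread_sigmask() would be needed instead of sigprocmask(), and this sketch makes no claim about which thread ends up receiving the signal.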

Can anybody confirm that the following patch fixes the bug? It works for me
but since the bug was very hard to reproduce on my machine it would be
interesting to get feedback from other people, especially on Solaris since the old
code had a comment for Solaris.

Index: os_unix.c
===================================================================
RCS file: /cvsroot/vim/vim7/src/os_unix.c,v
retrieving revision 1.90
diff -c -r1.90 os_unix.c
*** os_unix.c 22 Feb 2009 01:52:46 -0000 1.90
--- os_unix.c 22 Feb 2009 14:05:50 -0000
***************
*** 1122,1133 ****
      sigcont_received = FALSE;
  # endif
      kill(0, SIGTSTP); /* send ourselves a STOP signal */
! # ifdef _REENTRANT
!     /* When we didn't suspend immediately in the kill(), do it now. Happens
!      * on multi-threaded Solaris. */
!     if (!sigcont_received)
!         pause();
! # endif

  # ifdef FEAT_TITLE
      /*
--- 1122,1144 ----
      sigcont_received = FALSE;
  # endif
      kill(0, SIGTSTP); /* send ourselves a STOP signal */
!     /*
!      * Wait for the STOP signal to be handled. It generally happens
!      * immediately since signal is sent to ourselves, but somehow not
!      * all the time. Do not call pause() because there would be race
!      * condition which would hang Vim if signal happened in between the
!      * test of sigcont_received and the call to pause(). If signal is
!      * not yet received, call sleep(0) to just yield CPU. Signal should
!      * then be received. If still not received, sleep 1, 2, 3 ms.
!      * Don't bother waiting further if signal is not received after
!      * 1+2+3+4 ms.
!      */
!     {
!         long wait;
!         for (wait = 0; !sigcont_received && wait <= 3L; wait++)
!             /* Loop is not entered most of the time */
!             mch_delay(wait, FALSE);
!     }

  # ifdef FEAT_TITLE
      /*

Put previous patch as attachment (since putting it in previous comment removed spaces)

FWIW: I've never encountered this bug on Solaris, so presumably whatever the old code was meant to fix works fine there. I'll look at compiling a vim with this patch, and see if it fixes the problem on Linux, and/or breaks it again on Solaris.

David Hoeffer (d-hoeffer) wrote :

Thanks dominiko! I've rebuilt the package with your patch (wrapping it in an #ifdef _REENTRANT), and it fixes the problem - I didn't see the problem anymore in about 30 Ctrl-z/fg sequences, while I'd get it after max 10 without the patch.

It's probably also a good idea to make the variable "sigcont_received" volatile since it is changed in a signal handler.
I will send the patch to the vim_dev mailing list.
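
The bounded wait from the patch, combined with the volatile suggestion above, can be sketched as a standalone fragment. This is an illustration only, not the code that went upstream. delay_ms() is a hypothetical stand-in for Vim's internal mch_delay(), and cont_seen, on_cont(), install_cont_handler() and wait_bounded() are likewise made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>

/* Flag written from the signal handler, hence volatile sig_atomic_t. */
static volatile sig_atomic_t cont_seen = 0;

static void on_cont(int signo)
{
    (void)signo;
    cont_seen = 1;
}

/* Hypothetical helper: install on_cont() for SIGCONT. Returns 0 on success. */
static int install_cont_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_cont;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGCONT, &sa, NULL);
}

/* Stand-in for mch_delay(): yield the CPU for 0, else sleep ms milliseconds. */
static void delay_ms(long ms)
{
    struct timespec ts;

    if (ms <= 0) {
        sched_yield();
        return;
    }
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);  /* may return early if a signal arrives */
}

/* Poll the flag with growing delays instead of blocking in pause().
 * Returns nonzero if the handler ran before the wait gave up. */
static int wait_bounded(long max_ms)
{
    long wait;

    assert(max_ms >= 0);
    for (wait = 0; !cont_seen && wait <= max_ms; wait++)
        delay_ms(wait);
    return cont_seen != 0;
}
```

With a bound of 3, as in the patch, the loop yields once and then sleeps 1, 2 and 3 ms, so it gives up after roughly 6 ms. Unlike pause(), it cannot hang if the signal was already handled before the wait started.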

[ Quoting David Hoeffer in "[Bug 291373] Re: vim: hang after ^Z"... ]
> Thanks dominiko! I've rebuilt the package with your patch (wrapping it

being /very/ lazy here... :) But do you have a deb source package
somewhere for download?

Regards,
Miek

Miek Gieben (miek) wrote :

[ Quoting dominiko in "[Bug 291373] Re: vim: hang after ^Z"... ]
> This is the message in vim_dev where I posted the patch:
>
> ** Attachment added: "fixed race condition in suspend resume with CTRL-Z fg"
> http://launchpadlibrarian.net/22983879/fix-race-condition-suspend-os_unix.c.patch

ok ok :)

I've applied the patch and rebuilt my vim packages. I'm testing it now.
I've put up my vim packages (for now) at:

http://miek.nl/vim-temp/

I will leave them there for a week or so.

Regards,

--
 --Miek

This bug has now been fixed upstream in Vim-7.2.130.
You can find the patch here:

ftp://ftp.vim.org/pub/vim/patches/7.2/7.2.130

See also the summary of all patches available for Vim-7.2 at:

ftp://ftp.vim.org/pub/vim/patches/7.2/README

No idea when this version will make it into Ubuntu packages.

Changed in vim:
status: Unknown → Fix Released
ubunturox (ubuntu-rox) wrote :

I also have this problem but only if I install vim-gui-common or it gets installed by some other package.
Looks like some new vimrc gets installed somewhere that causes that.

Fixes that I found:
alias vi:
vi="vi -u ~/.vimrc"

or

set the DISPLAY variable to nothing:
export DISPLAY=

Just wanted to mention this in case someone has the same problems and the aforementioned patch does not work.

Bug was fixed in Vim-7.2.130.

I have just installed Ubuntu-9.10 Karmic Koala, and I see that it comes with Vim-7.2.245.
So the bug can be marked as fixed at least in Ubuntu >= 9.10 (Karmic Koala).

Changed in vim (Ubuntu):
status: Confirmed → Fix Released