emacs -nw freezes with 100% CPU with glib 2.31

Bug #902043 reported by Anders Kaseorg on 2011-12-09
This bug affects 2 people
Affects            Status         Importance   Assigned to   Milestone
GNU Emacs          Unknown        Unknown
Fedora             Won't Fix      Undecided
emacs23 (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

After glib2.0 2.31.2 was uploaded to precise, ‘emacs -nw’ freezes at startup, going into an infinite loop at 100% CPU usage. A backtrace starts as follows:

#0 xg_select (max_fds=4, rfds=0x7fffd264a5b0, wfds=0x0, efds=0x0,
    timeout=0x7fffd264a800) at xgselect.c:58
#1 0x000000000059cd8e in wait_reading_process_output (time_limit=1,
    microsecs=996270, read_kbd=<optimized out>, do_display=1,
    wait_for_cell=11690546, wait_proc=0x0, just_wait_proc=0) at process.c:4981
#2 0x00000000004f0309 in kbd_buffer_get_event (end_time=0x7fffd264aca0,
    used_mouse_menu=0x0, kbp=<synthetic pointer>) at keyboard.c:4177
#3 read_char (commandflag=0, nmaps=0, maps=0x0, prev_event=11690642,
    used_mouse_menu=0x0, end_time=0x7fffd264aca0) at keyboard.c:3081

which matches a known upstream bug: http://debbugs.gnu.org/9754 . Downgrading to libglib2.0-0 2.30.1-2ubuntu1 makes the problem disappear.

Description of problem:

I run emacs in the server / emacsclient mode. Occasionally all emacs windows will stop responding to character input, though they will respond to mouse clicks. Sometimes, I can recover by hitting ^G in all windows until one takes it. Other times, the only way out is to kill all emacsclient processes, at which point new emacsclient invocations can connect to the daemon and things work.

Version-Release number of selected component (if applicable):
emacs-23.3-4.fc16.x86_64

How reproducible: Happens daily, but not on command.

Steps to Reproduce:
1. Run emacs with emacsclient
2. Suddenly notice it's ignoring you
3.

Actual results:

No response from emacs windows

Expected results:

The $#*$! thing should be listening to me.

Additional info:

I cannot reproduce the focus problem described in #674140, so I think this is a different bug.

Jonathan, I cannot reproduce it. Tried (fully updated) Rawhide with emacs-23.3-4.fc16.x86_64 and Emacs clients for a day and the issue didn't occur. I also use emacs-23.3-4 on Fedora 14 without any problem (using emacsclients occasionally).

When the problem occurs again, could you attach gdb to the emacs daemon (via `gdb --pid PID`) and get the backtrace, please?

OK, took me a while to reproduce it...and the result is not going to be particularly satisfying. Even after three rounds of installing debuginfo packages, all I get is:

(gdb) bt
#0 0x00007f62e21dc053 in __gethostname (name=0x7fffb718e270 "0\005P", len=140736265248864)
    at ../sysdeps/unix/sysv/gethostname.c:31
#1 0x0000000000b23a12 in ?? ()
#2 0xffffffff10000000 in ?? ()
#3 0x0000000000000000 in ?? ()

OK, here's a hang with more (and totally different) information:

#0 0x00007fe80b6df1f3 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x00000000004d0af5 in xg_select (max_fds=20, rfds=0x7fff41928f10, wfds=
    0x0, efds=0x0, timeout=0x7fff41929160)
    at /usr/src/debug/emacs-23.3/src/xgselect.c:102
#2 0x000000000059b4ce in wait_reading_process_output (time_limit=30,
    microsecs=0, read_kbd=<optimized out>, do_display=1, wait_for_cell=
    11680274, wait_proc=0x0, just_wait_proc=0)
    at /usr/src/debug/emacs-23.3/src/process.c:4981
#3 0x000000000041e714 in sit_for (timeout=120, reading=1, do_display=1)
    at /usr/src/debug/emacs-23.3/src/dispnew.c:6657
#4 0x00000000004ef9c8 in read_char (commandflag=1, nmaps=2, maps=
    0x7fff41929620, prev_event=11680274, used_mouse_menu=0x7fff41929790,
    end_time=0x0) at /usr/src/debug/emacs-23.3/src/keyboard.c:2972
#5 0x00000000004f091a in read_key_sequence (keybuf=0x7fff419297f0, prompt=
    11680274, dont_downcase_last=0, can_return_switch_frame=1,
    fix_current_buffer=1, bufsize=30)
    at /usr/src/debug/emacs-23.3/src/keyboard.c:9567
#6 0x00000000004f2969 in command_loop_1 ()
    at /usr/src/debug/emacs-23.3/src/keyboard.c:1645
#7 0x00000000005568e4 in internal_condition_case (bfun=
    0x4f2780 <command_loop_1>, handlers=11747538, hfun=0x4e60d0 <cmd_error>)
    at /usr/src/debug/emacs-23.3/src/eval.c:1492
#8 0x00000000004e42fe in command_loop_2 ()
    at /usr/src/debug/emacs-23.3/src/keyboard.c:1362
#9 0x00000000005567ba in internal_catch (tag=Cannot access memory at address 0xfffffffffffffff5
)
    at /usr/src/debug/emacs-23.3/src/eval.c:1228
#10 0x00000000004e6339 in command_loop ()
    at /usr/src/debug/emacs-23.3/src/keyboard.c:1341
#11 0x00000000004e63da in recursive_edit_1 ()
    at /usr/src/debug/emacs-23.3/src/keyboard.c:956
#12 0x00000000004e6517 in Frecursive_edit ()
    at /usr/src/debug/emacs-23.3/src/keyboard.c:1018
#13 0x000000000041357f in main (argc=2, argv=<optimized out>)
    at /usr/src/debug/emacs-23.3/src/emacs.c:1833

Thank you for the backtraces.

As for the backtrace from comment #2, it would help us to know which gethostname() call it is. One gethostbyname() call is used in function make-network-process in src/process.c, and one in function socket_connection in lib-src/pop.c.

make-network-process is used in Emacs server code in the server-start function for (re)starting the server.

It's also used in ERC, Gnus, Tramp etc. If we knew which part of Emacs calls it when it hangs, it would be easier to find a reliable reproducer.

Which network libraries do you use in Emacs?

It seems Emacs would hang even when not run in server-client mode (?)

The backtrace from comment #3 seems to be the same as for fully functional Emacs server.

I'm not sure what you mean by "which network libraries". Of the modes you listed above, I only use gnus - but I use it heavily. Without gmane, I'd have a hard time keeping up with the world...

I'm seeing this too. Here's a way to reproduce it. I used the gnome shell tweak tool to enable the minimize button on windows. If I minimize emacs with the window manager button or with control+Z, it won't accept keystrokes when I bring it back. If I use the mouse and emacs menu on the restored frame (which still works !?) and open a new frame, the new frame accepts keystrokes while the old one doesn't.

Hi Karel, this bug was fully reproducible on my Win7 Pro laptop, after a clean install of GNU Emacs 23.3.1 (i386-mingw-nt6.1.7600) of 2011-03-10 on 3249CTO. I used the defaults on the .exe setup. The resulting installation was nearly unusable -- took about 15 sec to start, and would hang for approx 15 sec every minute or two. Poking around the 'net I found this page, and also a couple of bloggish places which suggested that adding the following to my .emacs might help.

;; try to improve slow performance on windows.
(setq w32-get-true-file-attributes nil)

Well... curiously... I couldn't find my .emacs. I did find an .emacs.d/ in an inappropriate place -- my Roaming profile. I couldn't find a HOME var so I put one in a user variable... I wonder, should it be a System var? Anyway I made a user var called HOME, pointed it at LocalLow, and got a usable emacs. One strangeness -- I now have a second .emacs.d/ inside of my .emacs.d/. This new .emacs.d/ seems to be created automagically whenever I launch emacs, so I'm leaving it there for now, but I think there's some residual strangeness either in the codebase or in my setup. The good news is that emacs usually launches within a couple of seconds, except when I'm launching it by double-clicking on my .emacs file -- I encounter the 15-second delay on that. I haven't used emacs heavily yet so don't know if I'll still occasionally get the non-responsive kb bug, but I'm moderately confident that this was caused by an attempt by emacs to create a new file in a roaming profile dir -- this is quite a slow process at best, and didn't even seem to succeed -- I never was able to create a .emacs in my Roaming area.

I have a couple of screenshots which might be helpful (of my homedir and of the HOME var edit miniscreen) but I couldn't figure out how to upload them through the bugzilla interface and have to run now.

Created attachment 530630
screenshots of my setup... possibly useful

Created attachment 530631
screenshots of my setup... possibly useful

For what it's worth, I also see this when accidentally minimizing emacs in gnome-shell on f16. I have the same backtrace as Jon's in comment 3:

Breakpoint 1 at 0x3e29ce8263: file ../sysdeps/unix/syscall-template.S, line 82.
(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7f65ccc7a980 (LWP 7874) "emacs" 0x0000003e29ce8263 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
(gdb) bt
#0 0x0000003e29ce8263 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x00000000004d09a5 in xg_select (max_fds=10, rfds=0x7fff9c2be390, wfds=0x0, efds=0x0, timeout=0x7fff9c2be5e0) at /usr/src/debug/emacs-23.3/src/xgselect.c:102
#2 0x000000000059b19e in wait_reading_process_output (time_limit=0, microsecs=0, read_kbd=<optimized out>, do_display=1, wait_for_cell=11676178, wait_proc=0x0,
    just_wait_proc=0) at /usr/src/debug/emacs-23.3/src/process.c:4981
#3 0x00000000004ee6c0 in kbd_buffer_get_event (end_time=0x0, used_mouse_menu=0x7fff9c2bebd0, kbp=<synthetic pointer>) at /usr/src/debug/emacs-23.3/src/keyboard.c:4183
#4 read_char (commandflag=1, nmaps=2, maps=0x7fff9c2bea60, prev_event=11676178, used_mouse_menu=0x7fff9c2bebd0, end_time=0x0)
    at /usr/src/debug/emacs-23.3/src/keyboard.c:3081
#5 0x00000000004f076a in read_key_sequence (keybuf=0x7fff9c2bec30, prompt=11676178, dont_downcase_last=0, can_return_switch_frame=1, fix_current_buffer=1, bufsize=30)
    at /usr/src/debug/emacs-23.3/src/keyboard.c:9567
#6 0x00000000004f27b9 in command_loop_1 () at /usr/src/debug/emacs-23.3/src/keyboard.c:1645
#7 0x0000000000556634 in internal_condition_case (bfun=0x4f25d0 <command_loop_1>, handlers=11743442, hfun=0x4e5f30 <cmd_error>) at /usr/src/debug/emacs-23.3/src/eval.c:1492
#8 0x00000000004e415e in command_loop_2 () at /usr/src/debug/emacs-23.3/src/keyboard.c:1362
#9 0x000000000055650a in internal_catch (tag=11736258, func=0x4e4140 <command_loop_2>, arg=11676178) at /usr/src/debug/emacs-23.3/src/eval.c:1228
#10 0x00000000004e6199 in command_loop () at /usr/src/debug/emacs-23.3/src/keyboard.c:1341
#11 0x00000000004e623a in recursive_edit_1 () at /usr/src/debug/emacs-23.3/src/keyboard.c:956
#12 0x00000000004e6377 in Frecursive_edit () at /usr/src/debug/emacs-23.3/src/keyboard.c:1018
#13 0x000000000041359f in main (argc=1, argv=<optimized out>) at /usr/src/debug/emacs-23.3/src/emacs.c:1833

additional information:

I run (server-start) from .emacs. This is pretty reproducible; it happens most times I minimize emacs, even if it is freshly opened and only has the scratch buffer up. One of the times it happened, I believe I saw something like 'xf86wakeup undefined' in the minibuffer (not sure if that's related). If I hit the close button on the window frame, emacs will close cleanly (and even pop up a dialog asking if I want to save my buffers).

Let me know if there is more I can do to diagnose.

Anders Kaseorg (andersk) wrote :

Here’s a debdiff with the patch from Fedora. I’m testing it in my PPA; I’ll post the results once it builds.

The attachment "emacs23_23.3+1-1ubuntu6_lp902043.debdiff" of this bug report has been identified as being a patch in the form of a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in emacs23 (Ubuntu):
status: New → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package emacs23 - 23.3+1-1ubuntu7

---------------
emacs23 (23.3+1-1ubuntu7) precise; urgency=low

  * debian/patches/emacs-xgselect.patch: Initialize xgselect in
    function xg_select when gfds_size == 0. Fixes 100% CPU startup
    freeze with glib 2.31. (LP: #902043)
 -- Anders Kaseorg <email address hidden> Fri, 09 Dec 2011 02:17:48 -0500
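
The changelog entry above says only that xgselect is now initialized when gfds_size == 0. One way a zero-sized buffer can produce a 100% CPU spin is a grow-by-doubling loop, since 0 * 2 is still 0. The sketch below is a self-contained illustration of that general hazard in C. It is not copied from xgselect.c: the function names, the step limit, and the initial capacity of 128 are all hypothetical, and whether the real code grows its buffer this way is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: double `size` until it can hold `needed` entries.
   Returns the new size, or 0 if `max_steps` doublings were not enough.
   The step limit exists only so this sketch terminates; without it, a
   starting size of 0 would loop forever because 0 * 2 == 0. */
static size_t grow_by_doubling(size_t size, size_t needed, int max_steps)
{
    int steps = 0;

    while (needed > size) {
        if (steps++ >= max_steps)
            return 0;           /* gave up: size never reached `needed` */
        size *= 2;
    }
    return size;
}

/* Guarded variant: give the buffer a nonzero starting capacity before
   growing it. The value 128 is an assumed default for illustration. */
static size_t grow_with_init(size_t size, size_t needed, int max_steps)
{
    if (size == 0)
        size = 128;
    return grow_by_doubling(size, needed, max_steps);
}
```

With these definitions, grow_by_doubling(0, 4, 64) gives up and returns 0, while grow_with_init(0, 4, 64) returns 128, which matches the idea of initializing before the first use rather than relying on earlier setup having run.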

Changed in emacs23 (Ubuntu):
status: Confirmed → Fix Released
Bryce Harrington (bryce) wrote :

Hi Anders, thanks for fixing this bug. Seems the PPAs are backed up, your package still hasn't built. However, I verified it builds on i386 in pbuilder.

I upgraded to glib 2.31 but was unable to reproduce the bug. However looking at the patch the error looks pretty obvious and the patch looks sane, so I've gone ahead and uploaded it.

Anders Kaseorg (andersk) wrote :

Yeah, that fixed it. Thanks.

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

I see this pretty regularly too.

Hi Tim,

I will investigate the issue with upstream.
I added (server-start) to my .emacs file.
I started emacsclient to try to reproduce the problem, but without success.

It seems that I missed something.
I will be in contact with upstream to solve the problem.

I use Fedora 20 with emacs 24.3 version.

I did it via the menu, so my .emacs file now ends with "(server-mode)".

I start emacsclient as "emacsclient --create-frame -q ...".

It can take quite a few closed windows and re-opened windows before the problem appears. I'm not entirely sure what the trigger is.

emacs-24.3-13.fc20.x86_64

This message is a notice that Fedora 19 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 19. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

I'm seeing this bug in Fedora 21 with Emacs 24.4
[9879:0]>rpm -q emacs
emacs-24.4-3.fc21.x86_64

My setup is similar to the one described here - I run (server-start) as part of initialization.

thaynes@nexus6:~
[9883:0]>gstack 29784
Thread 4 (Thread 0x7f1d85f9f700 (LWP 29827)):
#0 0x00000031452f51fd in poll () at /lib64/libc.so.6
#1 0x0000003147a49e24 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2 0x0000003147a49f3c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x0000003147a49f79 in glib_worker_main () at /lib64/libglib-2.0.so.0
#4 0x0000003147a707b5 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x0000003145e0752a in start_thread () at /lib64/libpthread.so.0
#6 0x000000314530079d in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7f1d84a54700 (LWP 29829)):
#0 0x00000031452f51fd in poll () at /lib64/libc.so.6
#1 0x0000003147a49e24 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2 0x0000003147a4a1b2 in g_main_loop_run () at /lib64/libglib-2.0.so.0
#3 0x000000314aadad96 in gdbus_shared_thread_func () at /lib64/libgio-2.0.so.0
#4 0x0000003147a707b5 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x0000003145e0752a in start_thread () at /lib64/libpthread.so.0
#6 0x000000314530079d in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f1d7fb13700 (LWP 29832)):
#0 0x00000031452f51fd in poll () at /lib64/libc.so.6
#1 0x0000003147a49e24 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2 0x0000003147a49f3c in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3 0x00007f1d7fb4924d in dconf_gdbus_worker_thread () at /usr/lib64/gio/modules/libdconfsettings.so
#4 0x0000003147a707b5 in g_thread_proxy () at /lib64/libglib-2.0.so.0
#5 0x0000003145e0752a in start_thread () at /lib64/libpthread.so.0
#6 0x000000314530079d in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7f1d8c720a80 (LWP 29784)):
#0 0x00000031452f712c in pselect () at /lib64/libc.so.6
#1 0x00000000005d392b in xg_select ()
#2 0x0000000000598b26 in wait_reading_process_output ()
#3 0x00000000004f09a1 in read_decoded_event_from_main_queue ()
#4 0x00000000004f4252 in read_char ()
#5 0x00000000004f538f in read_key_sequence.constprop ()
#6 0x00000000004f7100 in command_loop_1 ()
#7 0x0000000000559b17 in internal_condition_case ()
#8 0x00000000004e96de in command_loop_2 ()
#9 0x00000000005599fb in internal_catch ()
#10 0x00000000004edcd7 in recursive_edit_1 ()
#11 0x00000000004edff0 in Frecursive_edit ()
#12 0x0000000000418019 in main ()

Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Changed in fedora:
importance: Unknown → Undecided
status: Unknown → Won't Fix