gnome-shell 100% CPU: Infinite loop in lid_switch_keyboard_event() from post_device_event()

Bug #1724259 reported by Michael Thayer on 2017-10-17
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
libinput | Fix Released | High | |
libinput (Ubuntu) | Fix Released | High | Unassigned |
libinput (Ubuntu Artful) | Fix Released | High | Unassigned |

Bug Description

[Impact]

I have been having regular system freezes after switching from using my laptop stand-alone to waking it up on a docking station with an external monitor. I have seen this using two identical docking station plus monitor combinations. I am still trying to find a pattern. I was able to ssh in and use apport to save this bug information. The gnome-shell process was hung at 100% CPU utilisation.

libinput 1.8.4 fixes this

[Test case]

Wake up while the system is docked and attached to an external monitor.

[Regression potential]

The bugfix comes via a new microrelease update, so the chance of regressions should be minimal.

--

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: gnome-shell 3.26.1-0ubuntu3
ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
Uname: Linux 4.13.0-16-generic x86_64
ApportVersion: 2.20.7-0ubuntu3
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Tue Oct 17 15:56:34 2017
DisplayManager: gdm3
ExecutablePath: /usr/bin/gnome-shell
GsettingsChanges:
 b'org.gnome.shell' b'command-history' b"['gnome-terminal']"
 b'org.gnome.shell' b'had-bluetooth-devices-setup' b'true'
 b'org.gnome.shell' b'favorite-apps' b"['ubiquity.desktop', 'org.gnome.Nautilus.desktop', 'firefox.desktop', 'libreoffice-writer.desktop', 'libreoffice-calc.desktop', 'libreoffice-impress.desktop', 'org.gnome.Software.desktop', 'gnome-control-center.desktop']"
 b'org.gnome.desktop.interface' b'gtk-im-module' b"'gtk-im-context-simple'"
InstallationDate: Installed on 2016-05-31 (504 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
SourcePackage: gnome-shell
UpgradeStatus: Upgraded to artful on 2017-10-05 (12 days ago)

Michael Thayer (michael-thayer) wrote :

Looking at the journald error log, the problem happened after Oct 17 15:27:08. I hoped that apport would take a stack trace of the gnome-shell process, but I will probably have to do that myself when I get the chance. Can you retrace a stack trace if the debug packages are not installed?

Yes, you can get a stack trace without debug packages. It's just going to be much less detailed, and risks not showing enough information. But we would welcome a stack trace (or multiple traces to ensure you're looking in the right spot), even without debug info.

summary: - System freeze after docking and display configuration change
+ gnome-shell frozen and using 100% CPU after docking and display
+ configuration change
Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Michael Thayer (michael-thayer) wrote :

Here is a gdb stacktrace. I tried sending signal 3 to the process in the hope that it would trigger an apport bug report, but it did not.

Daniel van Vugt (vanvugt) wrote :

Thanks. It's not obvious to me still which is the busy thread. Can you find out by running 'top' and then pressing 'H' to show threads?
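
For anyone who only has an ssh session and wants the same per-thread view without an interactive 'top', here is a minimal sketch that samples per-thread CPU time from /proc twice and sorts by the difference. It assumes Linux and the stat layout documented in proc(5). The function names are made up for this example and are not part of any tool mentioned in this bug.

```python
import os
import time


def parse_stat_line(line):
    """Return (thread_name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split around the first "(" and the last ")".
    name = line[line.index("(") + 1 : line.rindex(")")]
    fields = line[line.rindex(")") + 2 :].split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, i.e. indexes 11 and 12 here.
    return name, int(fields[11]) + int(fields[12])


def sample_threads(pid):
    """Map thread ID -> (name, cpu_ticks) for every thread of pid."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = parse_stat_line(f.read())
        except OSError:
            pass  # the thread exited between listdir() and open()
    return result


def busiest_threads(pid, interval=1.0):
    """Return [(ticks_used, tid, name)] over `interval` seconds, busiest first."""
    before = sample_threads(pid)
    time.sleep(interval)
    after = sample_threads(pid)
    usage = [
        (ticks - before[tid][1], tid, name)
        for tid, (name, ticks) in after.items()
        if tid in before
    ]
    return sorted(usage, reverse=True)
```

Calling busiest_threads() with the PID of gnome-shell lists the busiest thread first. On Linux the main thread's TID equals the PID, so a spinning main thread shows up with tid == pid.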

Michael Thayer (michael-thayer) wrote :

I will do that next time I see this. A question: is there a signal I can send to the process (SEGV?) to trigger an Apport bug report in /var/crash? And would that make things easier for you?

Daniel van Vugt (vanvugt) wrote :

Yes, certainly "kill -SEGV ..." will work. But that may confuse other readers into thinking an actual SEGV happened. Other less dramatic signals like ABRT, USR1, USR2 may work but it depends on the program and whether the authors are handling those internally (which means a core may not get generated).

Daniel van Vugt (vanvugt) wrote :

Maybe try "kill -TRAP" or "kill -5" on gnome-shell.
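
To illustrate the point about signals, here is a small sketch that sends SIGTRAP to a throwaway child process and checks that the child really died from that signal. The child is only a stand-in for a hung process, and the helper names are invented for this example. Whether a core file or a /var/crash report is actually produced also depends on the system's core dump settings (core size limit, core_pattern), which this sketch does not touch. A program that installs its own handler for the signal will not terminate this way.

```python
import os
import signal
import subprocess
import sys


def signal_and_reap(proc, sig=signal.SIGTRAP):
    """Send sig to a child process and return the signal that ended it, or None."""
    os.kill(proc.pid, sig)
    status = proc.wait(timeout=10)
    # Popen reports death-by-signal as a negative return code.
    return signal.Signals(-status) if status < 0 else None


def demo():
    # A throwaway child stands in for the hung process.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    return signal_and_reap(child)
```

For a process that is not your own child (such as gnome-shell), os.kill(pid, signal.SIGTRAP) is the equivalent of "kill -TRAP <pid>", but you cannot wait on it for the exit status.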

Michael Thayer (michael-thayer) wrote :

Thank you. I assume then that my failure to get a crash report was due to the GNOME Shell process and not to the underlying mechanism. I will just check the GNOME Shell/Mutter/whatever source next time beforehand.

I checked and it was the main thread which was using CPU. Uploading a crash report (Apport would not submit it due to a couple of outdated packages). Note that this is on a different system with a fresh Ubuntu 17.10 install (not upgraded; but the report should tell you all that).

Sebastien Bacher (seb128) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. The issue you are reporting is an upstream one and it would be nice if somebody having it could send the bug to the developers of the software by following the instructions at https://wiki.ubuntu.com/Bugs/Upstream/GNOME. If you have done so, please tell us the number of the upstream bug (or the link), so we can add a bugwatch that will inform us about its status. Thanks in advance.

Daniel van Vugt (vanvugt) wrote :

Apport uploads for 17.10 got disabled last night. Standard policy after release :/

You can however get around it (please) by downgrading your apport to the previous release. Then use 'ubuntu-bug' to push us the crash. Then mention the new bug number here.

I'm sure there must be a proper way to do this...

That won't fix the outdated packages (one of them was curl...) though, will it? And does disabled mean that Launchpad will refuse to accept the upload even if my system tries to send it?

I installed debug symbols for libinput and had a look at the core file, and it looks very much like this:

https://bugs.freedesktop.org/show_bug.cgi?id=103298

I am missing the top frame in that stack trace, but I am guessing that is just chance based on when I took the trace. The description of how to trigger it looks like it could apply too. In any case, here is my trace:

(gdb) bt
#0 0x00007f6b37a2ccc7 in post_device_event (device=device@entry=0x56373d197e80, time=time@entry=1377003141, type=type@entry=
    LIBINPUT_EVENT_KEYBOARD_KEY, event=0x56373e972770) at libinput.c:2312
#1 0x00007f6b37a2df5f in keyboard_notify_key (device=device@entry=0x56373d197e80, time=time@entry=1377003141, key=<optimized out>, state=<optimized out>)
    at libinput.c:2412
#2 0x00007f6b37a30ab7 in fallback_keyboard_notify_key (device=device@entry=0x56373d197e80, time=time@entry=1377003141, key=<optimized out>, state=<optimized out>, dispatch=<optimized out>) at evdev.c:173
#3 0x00007f6b37a339b3 in fallback_process_key (e=0x7ffe3b444aa0, e=0x7ffe3b444aa0, time=1377003141, device=0x56373d197e80, dispatch=<optimized out>)
    at evdev.c:969
#4 0x00007f6b37a339b3 in fallback_process (evdev_dispatch=<optimized out>, device=0x56373d197e80, event=0x7ffe3b444aa0, time=1377003141) at evdev.c:1301
#5 0x00007f6b37a31159 in evdev_process_event (e=0x7ffe3b444aa0, device=0x56373d197e80) at evdev.c:2052
#6 0x00007f6b37a31159 in evdev_device_dispatch_one (ev=0x7ffe3b444aa0, device=0x56373d197e80) at evdev.c:2060
#7 0x00007f6b37a31159 in evdev_device_dispatch (data=0x56373d197e80)
    at evdev.c:2119
#8 0x00007f6b37a2cdaf in libinput_dispatch (libinput=0x56373d123d60)
    at libinput.c:2196
#9 0x00007f6b4113be9c in ()
    at /usr/lib/x86_64-linux-gnu/mutter/libmutter-clutter-1.so
#10 0x00007f6b424c2fb7 in g_main_context_dispatch ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#11 0x00007f6b424c31f0 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007f6b424c3502 in g_main_loop_run ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007f6b409f868c in meta_run ()
    at /usr/lib/x86_64-linux-gnu/libmutter-1.so.0
#14 0x000056373c9ec2e7 in ()
#15 0x00007f6b4038a1c1 in __libc_start_main (main=
    0x56373c9ebef0, argc=1, argv=0x7ffe3b444f78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe3b444f68)
    at ../csu/libc-start.c:308
#16 0x000056373c9ec3fa in ()

I will not fiddle around trying to properly upload the apport report just now, but I will keep it around locally.
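
As a purely hypothetical illustration of how a function that walks a list of listeners can spin forever (this thread does not say what the actual defect in libinput was, and this is not libinput's code): with a circular, doubly linked list, inserting a node that is already linked can leave it pointing at itself, so a traversal never gets back to the list head. The Node, insert and walk names below are invented for this sketch.

```python
class Node:
    """A link in a circular, doubly linked list (a list head is also a Node)."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def insert(head, node):
    """Insert node right after head, without checking whether it is already linked."""
    node.prev = head
    node.next = head.next
    head.next = node
    node.next.prev = node


def walk(head, limit=1000):
    """Visit nodes until we are back at head; give up after `limit` steps."""
    names = []
    node = head.next
    while node is not head:
        if len(names) >= limit:
            return names, False  # never got back to head
        names.append(node.name)
        node = node.next
    return names, True


head = Node("head")
listener = Node("listener")
insert(head, listener)
first_walk = walk(head)            # terminates normally
insert(head, listener)             # same node inserted a second time
second_walk_ok = walk(head, 50)[1]  # traversal hits the step limit
```

Without the step limit in walk(), the second traversal would never return, which from the outside looks like one thread pinned at 100% CPU.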

Daniel van Vugt (vanvugt) wrote :

You will need to update your system (other than apport) still.

And yes I believe (based on observations today) Launchpad will accept it. The block is only on the user end.

I have rebuilt libinput10 with the patch in that freedesktop.org bug report and will see if that helps things before trying to submit anything to launchpad.

From the other bug (this is getting a bit tiring):

I was able to trigger the crash pretty easily with the old library still in use by opening and closing the lid a few times in quick succession and then trying to type an update to this bug. After restarting I was not immediately able to trigger it, but instead brought gnome-shell down with an assertion (g_malloc failed to allocate 1.8x10^19 bytes). Working down the stack a few frames, it looked like it was trying to use an invalid structure, presumably in memory which had already been freed. That at least raises the possibility that gnome-shell has some memory use bug which triggers on suspend and resume, and that libinput was collateral damage.

I am thinking of rebuilding gnome-shell with memory debugging enabled.

I have not succeeded in reproducing this again after rebuilding GNOME Shell with ASAN_OPTIONS=detect_leaks=0 CFLAGS="-fsanitize=address -fsanitize=return -fsanitize=bounds -fsanitize=object-size". I do now manage to trigger Debian bug 823216 [1] quite reliably, with the difference that I can log on again immediately. It is usually the user GNOME Shell and XWayland which crash, but at least once the gdm ones crashed.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=823216

I trigger this by opening a few applications (Firefox with a few windows seems to make it easier to reproduce) and repeatedly opening and closing the laptop lid. According to the systemd-logind settings, closing the lid should not trigger a suspend when docked, but occasionally it does. The xwayland termination happens just before this (as observed from an ssh session on my other laptop: I can see the results just before the session stops responding due to the suspension).

Thinking of it I might be confusing cause and effect here: the unexpected suspend might be caused somehow by the xwayland and gnome-shell crash.

Or a third hypothesis: systemd-logind triggers a suspend on lid close if there is no external monitor plugged in. Perhaps it looks for a while too long like there is no monitor, so that systemd-logind suspends the system and the GNOME Shell wayland output object disappears, causing the xwayland error.

Changed in libinput (Ubuntu):
status: New → Confirmed
Changed in libinput:
importance: Unknown → High
status: Unknown → In Progress
Changed in libinput (Ubuntu):
importance: Undecided → High
status: Confirmed → Triaged
Changed in libinput:
status: In Progress → Fix Released
Timo Aaltonen (tjaalton) wrote :

libinput 1.9.2 synced to bionic, artful will get 1.8.4 via bug #1733573

Changed in libinput (Ubuntu):
status: Triaged → Fix Released
description: updated

Hello Michael, or anyone else affected,

Accepted libinput into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libinput/1.8.4-0ubuntu0.17.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in libinput (Ubuntu Artful):
status: New → Fix Committed
tags: added: verification-needed verification-needed-artful

I must admit that I have not seen this freeze for a while (unless I just missed it among all the swap-to-death freezes I have had recently), but I installed the packages. I did not enable proposed, I just manually installed libinput-bin and libinput10, versions 1.8.4. I probably can't give any meaningful feedback until after my next reboot though.

Timo Aaltonen (tjaalton) wrote :

does the update still work?

I am currently running version 1.8.4-0ubuntu0.17.10.1 of both packages and have not seen this problem for a while. I have rebooted several times since installing the packages.

Off-topic, but since yesterday (can't remember it happening earlier) I have started seeing mouse button freezes after suspend and resume, affecting clients. Most windows can still be dragged, and the dock responds to mouse buttons, but e.g. the terminal, Firefox and Thunderbird respond to keyboard but not mouse. I can raise the windows using the dock. Would you happen to know of any matching open bug for that, or anything else?

summary: - gnome-shell frozen and using 100% CPU after docking and display
- configuration change
+ Infinite loop in lid_switch_keyboard_event() from post_device_event()

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in gnome-shell (Ubuntu Artful):
status: New → Confirmed
no longer affects: gnome-shell (Ubuntu)
no longer affects: gnome-shell (Ubuntu Artful)
summary: - Infinite loop in lid_switch_keyboard_event() from post_device_event()
+ gnome-shell 100% CPU: Infinite loop in lid_switch_keyboard_event() from
+ post_device_event()
tags: added: verification-done verification-done-artful
removed: verification-needed verification-needed-artful
Changed in libinput (Ubuntu Artful):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libinput - 1.8.4-0ubuntu0.17.10.1

---------------
libinput (1.8.4-0ubuntu0.17.10.1) artful; urgency=medium

  * New bugfix release. (LP: #1724259, #1733573)

 -- Timo Aaltonen <email address hidden> Tue, 21 Nov 2017 13:52:19 +0200

Changed in libinput (Ubuntu Artful):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for libinput has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
