suspends after a few minutes on machines with a bogus (closed) lid switch

Bug #1438301 reported by Colin Ian King
This bug affects 2 people
Affects           Status     Importance  Assigned to  Milestone
systemd (Ubuntu)  Won't Fix  Low         Unassigned
Vivid             Won't Fix  Low         Unassigned

Bug Description

On a desktop machine which claims to have a closed lid (on the motherboard), but doesn't actually have a lid, logind will suspend the machine shortly after boot as a safety measure to avoid burning your laptop in a bag. This is wrong in the above situation where the announced lid switch does not exist or isn't actually closed.

Workaround: Set HandleLidSwitch=ignore in /etc/systemd/logind.conf
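To check whether the workaround is in place, the setting can be read back from the config text. This is only a minimal sketch: the sample text is hypothetical, the "suspend" fallback is an assumption about logind's default, and drop-in files under logind.conf.d are not considered.

```python
import configparser

# Hypothetical sample contents; on a real system the text would come from
# /etc/systemd/logind.conf.
SAMPLE = """\
[Login]
#HandleLidSwitch=suspend
HandleLidSwitch=ignore
"""

def lid_switch_action(conf_text, default="suspend"):
    """Return the HandleLidSwitch value found in logind.conf-style text.

    `default` is an assumption about what logind does when the option is
    unset; adjust it if your systemd version documents something else.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(conf_text)
    return parser.get("Login", "HandleLidSwitch", fallback=default)

print(lid_switch_action(SAMPLE))  # ignore
```

Lines starting with "#" are treated as comments, so a commented-out default does not count as a setting.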

Tags: vivid
Revision history for this message
Colin Ian King (colin-king) wrote :
summary: - desktop machine suspends after ~3 minutes, does suspend with upstart
+ desktop machine suspends after ~3 minutes, does NOT suspend with upstart
summary: - desktop machine suspends after ~3 minutes, does NOT suspend with upstart
+ desktop machine suspends with systemd after ~3 minutes, does NOT suspend
+ with upstart
description: updated
Changed in systemd (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Martin Pitt (pitti) wrote : Re: desktop machine suspends with systemd after ~3 minutes, does NOT suspend with upstart

The log shows that something requests the suspend over D-Bus. The most probable candidate for this is logind. After this happens, can you please copy&paste the output of "journalctl -f -b -u systemd-logind" here? That should show key presses and which buttons it listens to.

To confirm that it's logind, could you check if that stops happening if you set HandleLidSwitch=ignore in /etc/systemd/logind.conf and then reboot? There's a chance that it isn't actually logind, as we use the exact same logind and lid switch handling under upstart; we only replace the actual suspend D-Bus interface with systemd-shim.

Thanks!

Changed in systemd (Ubuntu):
status: New → Incomplete
Revision history for this message
Colin Ian King (colin-king) wrote :

journalctl -f -b -u systemd-logind

-- Logs begin at Tue 2015-03-31 09:36:25 BST. --
Mar 31 09:36:26 skylake systemd[1]: Starting Login Service...
Mar 31 09:36:26 skylake systemd-logind[671]: New seat seat0.
Mar 31 09:36:26 skylake systemd-logind[671]: Watching system buttons on /dev/input/event3 (Power Button)
Mar 31 09:36:26 skylake systemd-logind[671]: Watching system buttons on /dev/input/event0 (Lid Switch)
Mar 31 09:36:26 skylake systemd-logind[671]: Watching system buttons on /dev/input/event1 (Power Button)
Mar 31 09:36:26 skylake systemd-logind[671]: Watching system buttons on /dev/input/event2 (Sleep Button)
Mar 31 09:36:26 skylake systemd[1]: Started Login Service.
Mar 31 09:36:32 skylake systemd-logind[671]: New session 1 of user king.
Mar 31 09:39:22 skylake systemd-logind[671]: Suspending...
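The "~3 minutes" in the summary can be checked against the timestamps in this log. A small sketch, assuming syslog-style prefixes, an English month abbreviation, and the year 2015 (the year is not part of each line); the helper names are made up for illustration.

```python
from datetime import datetime

# Three lines copied from the journal output above.
LOG = """\
Mar 31 09:36:26 skylake systemd[1]: Started Login Service.
Mar 31 09:36:32 skylake systemd-logind[671]: New session 1 of user king.
Mar 31 09:39:22 skylake systemd-logind[671]: Suspending...
"""

def stamp(line, year=2015):
    # The first 15 characters hold the timestamp, e.g. "Mar 31 09:36:26".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def seconds_between(log, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the first with end_marker."""
    lines = log.splitlines()
    start = next(l for l in lines if start_marker in l)
    end = next(l for l in lines if end_marker in l)
    return int((stamp(end) - stamp(start)).total_seconds())

print(seconds_between(LOG, "Started Login Service", "Suspending..."))  # 176
```

So logind suspends 176 seconds after the login service started, i.e. just under three minutes.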

Revision history for this message
Colin Ian King (colin-king) wrote :

The HandleLidSwitch=ignore stops the box from suspending:

$ uptime
 09:50:25 up 5 min, 2 users, load average: 0.00, 0.00, 0.00

Revision history for this message
Martin Pitt (pitti) wrote :

Interesting... so logind does *not* log a button event; here I get something like

  Mär 31 11:04:09 donald systemd-logind[805]: Power key pressed.

on a button event. Can you check if the bug happens:

  1) if you just log into VT1 after boot, not into X, and run "sudo systemctl stop lightdm"

  2) if you run "initctl stop unity-settings-daemon" and "initctl stop indicator-session" in the Unity session?

Revision history for this message
Colin Ian King (colin-king) wrote :

I'm actually running vivid server on this box, so I don't have a unity session running.

Revision history for this message
Martin Pitt (pitti) wrote :

Ah, ok; some more ideas:

  - "sudo systemctl stop acpid", to rule out acpid and acpi-support

  - install "evtest" and run "sudo evtest", select the lid switch, and check if you actually get an event after ~3 minutes. I don't actually expect one, as logind should then log a "Lid closed." event, which it apparently doesn't; but let's verify.

Is it possible for me to get ssh access to this machine? (https://launchpad.net/~pitti/+sshkeys, possibly via you doing an ssh port forward to chinstrap?)

Revision history for this message
Martin Pitt (pitti) wrote :

Debugging notes:

 - I see tons of "Refusing operation, as it is turned off" in the logs now, which is due to the disabled HandleLidSwitch (before it was "Suspending...")

 - I see no event at all in evtest

 - I believe stopping acpid does not help, I still see the "Refusing operation..." messages from logind.

Revision history for this message
Martin Pitt (pitti) wrote :

Lowering severity a bit. This only affects a particular piece of as-yet-unreleased hardware, so it isn't very widespread.

Changed in systemd (Ubuntu):
importance: Critical → High
Revision history for this message
Martin Pitt (pitti) wrote :

This still makes little sense to me. I checked all code paths that call manager_handle_action(), and they all have a log_info() before it. But logind doesn't log any of those. Yesterday I only checked evtest on event0 (the lid switch); I figure we should also watch event1 and event3 (power buttons) and, most importantly, event2 (sleep button).

It's also worthwhile copying my locally built systemd-logind, stopping the system one, running the local one under gdb and breaking on manager_handle_action().

Changed in systemd (Ubuntu):
assignee: nobody → Martin Pitt (pitti)
Revision history for this message
Martin Pitt (pitti) wrote :

This happens around the time when this fires:

Breakpoint 1, manager_handle_action (m=0x555555616010, inhibit_key=INHIBIT_HANDLE_LID_SWITCH, handle=HANDLE_IGNORE,
    ignore_inhibited=true, is_edge=false) at src/login/logind-action.c:36
36 in src/login/logind-action.c

(gdb) bt full
#0 manager_handle_action (m=0x555555616010, inhibit_key=INHIBIT_HANDLE_LID_SWITCH, handle=HANDLE_IGNORE, ignore_inhibited=true,
    is_edge=false) at src/login/logind-action.c:36
        message_table = {0x0, 0x5555555e8f9c "Powering Off...", 0x5555555e8fac "Rebooting...", 0x5555555e8fb9 "Halting...",
          0x5555555e8fc4 "Rebooting via kexec...", 0x5555555e8fdb "Suspending...", 0x5555555e8fe9 "Hibernating...",
          0x5555555e8ff8 "Hibernating and suspending...", 0x0}
        target_table = {0x0, 0x5555555e9016 "poweroff.target", 0x5555555e9026 "reboot.target", 0x5555555e9034 "halt.target",
          0x5555555e9040 "kexec.target", 0x5555555e904d "suspend.target", 0x5555555e905c "hibernate.target",
          0x5555555e906d "hybrid-sleep.target", 0x0}
        error = {name = 0x7fffffffe440 "p\344\377\377\377\177",
          message = 0x555555565aba <manager_is_docked_or_multiple_displays+162> "\211E\354\203", <incomplete sequence \354>,
          _need_free = -7120}
        inhibit_operation = 1432480080
        offending = 0x90a9bdd80dca0d00
        supported = false
        r = 21845
        __PRETTY_FUNCTION__ = "manager_handle_action"
        __func__ = "manager_handle_action"
#1 0x0000555555566572 in button_lid_switch_handle_action (manager=0x555555616010, is_edge=false)
    at src/login/logind-button.c:108
        handle_action = HANDLE_IGNORE
        __PRETTY_FUNCTION__ = "button_lid_switch_handle_action"
#2 0x0000555555566601 in button_recheck (e=0x555555621420, userdata=0x555555625e40) at src/login/logind-button.c:117
        b = 0x555555625e40
        __PRETTY_FUNCTION__ = "button_recheck"
#3 0x00005555555b837b in source_dispatch (s=0x555555621420) at src/libsystemd/sd-event/sd-event.c:2150
        r = 0
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#4 0x00005555555b968d in sd_event_dispatch (e=0x555555617210) at src/libsystemd/sd-event/sd-event.c:2471
        p = 0x555555621420
        r = 0
        __PRETTY_FUNCTION__ = "sd_event_dispatch"
#5 0x00005555555b97ec in sd_event_run (e=0x555555617210, timeout=18446744073709551615)
    at src/libsystemd/sd-event/sd-event.c:2494
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#6 0x0000555555563cac in manager_run (m=0x555555616010) at src/login/logind.c:1117
        us = 18446744073709551615
        r = 0
        __PRETTY_FUNCTION__ = "manager_run"
#7 0x0000555555563f96 in main (argc=1, argv=0x7fffffffe6d8) at src/login/logind.c:1179
        m = 0x555555616010
        r = 0
        __func__ = "main"

The is_edge==false is interesting: it proves that this is not coming from logind-button.c's button_dispatch(). This is consistent with evtest not getting any event.

button_check_switches() apparently sees the lid reported as closed (note: this machine doesn't actually have a lid). Then button_install_check_event_source() fires off ...
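The path in the backtrace (button_recheck, then button_lid_switch_handle_action with is_edge=false) can be pictured with a toy model. This is not logind's code: the function name and string values are hypothetical, and it only reproduces the two log messages quoted in this thread, assuming nothing happens when the lid is reported open.

```python
def recheck_lid(lid_reported_closed, handle_lid_switch):
    """Toy stand-in for the timer-driven recheck (no input event, is_edge=false).

    Returns the log message seen in this thread for the given setting,
    or None when the lid is reported open (assumed: nothing to do).
    """
    if not lid_reported_closed:
        return None
    if handle_lid_switch == "ignore":
        # What the logs show once the workaround is applied.
        return "Refusing operation, as it is turned off"
    if handle_lid_switch == "suspend":
        # What the logs showed before the workaround.
        return "Suspending..."
    raise ValueError(f"setting not covered by this sketch: {handle_lid_switch}")

# The machine in this bug reports a closed lid although it has none.
print(recheck_lid(True, "suspend"))
print(recheck_lid(True, "ignore"))
```

The point of the model is that no evdev event is needed: the reported state alone is enough to trigger the action.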


Revision history for this message
Martin Pitt (pitti) wrote :

This looks like an effect of http://cgit.freedesktop.org/systemd/systemd/commit/?id=ed4ba7e4f65215. This introduced logic which deliberately checks for a closed lid and suspends the machine again, to guard against wakeups which happen when a laptop lid is closed (and the machine is potentially being carried around in a bag). The root cause of this is that this hardware reports a lid switch and reports it as closed, while there actually is no lid switch. This will probably confuse other power management bits like unity-settings-daemon or upowerd as well.

This is being detected by http://cgit.freedesktop.org/systemd/systemd/tree/src/login/logind-button.c#n275 in button_check_switches(). upower --dump confirms this as well:

  lid-is-closed: yes
  lid-is-present: yes
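Other affected users could run the same check on their own upower output. A minimal sketch, assuming the "key: yes/no" line format shown above; the helper names and the machine_has_lid flag are hypothetical and must be supplied by whoever knows the hardware.

```python
# The two lines quoted above from "upower --dump".
UPOWER_DUMP = """\
  lid-is-closed: yes
  lid-is-present: yes
"""

def lid_fields(dump_text):
    """Collect the lid-is-* fields from upower dump text as booleans."""
    fields = {}
    for line in dump_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.startswith("lid-is-"):
            fields[key] = value.strip() == "yes"
    return fields

def lid_report_is_bogus(fields, machine_has_lid):
    """True when a lid is reported present and closed on a machine without one."""
    reported_closed = fields.get("lid-is-present", False) and fields.get("lid-is-closed", False)
    return reported_closed and not machine_has_lid

fields = lid_fields(UPOWER_DUMP)
print(fields)
print(lid_report_is_bogus(fields, machine_has_lid=False))
```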

Changed in systemd (Ubuntu):
status: Incomplete → Triaged
Revision history for this message
Martin Pitt (pitti) wrote :

At this point we have two options:

(1) fix the kernel or hardware to not claim that there's a closed lid
(2) disable that safety feature and cause some laptops to burn in your bag again

If possible, I'd like to keep the safety feature and pursue (1), but if this turns out to affect too many machines we might also go with (2).

description: updated
Changed in systemd (Ubuntu):
importance: High → Low
summary: - desktop machine suspends with systemd after ~3 minutes, does NOT suspend
- with upstart
+ suspends after a few minutes on machines with a bogus (closed) lid
+ switch
Martin Pitt (pitti)
Changed in systemd (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
tags: added: vivid
Revision history for this message
Martin Pitt (pitti) wrote :

Let's call this wontfix for now, as this is broken hardware, there is a workaround, and in general we do want to suspend if the lid is closed.

Note that with bug 1444166 fixed this interval will actually get down to 30s.

Changed in systemd (Ubuntu Vivid):
status: Triaged → Won't Fix
Changed in systemd (Ubuntu):
status: Triaged → Won't Fix
Revision history for this message
sinecure (dave-launchpad) wrote :

You're calling this a won't fix, but this behavior means that laptops on docks or connected to external displays (while the laptop is closed) go to sleep and are unusable: that seems to be a bit harsh to those who use an external monitor with the laptop closed.

Revision history for this message
Martin Pitt (pitti) wrote :

sinecure, this bug does not apply to laptops, only to this pre-production desktop box with a broken lid switch but no lid. Laptops usually have a working lid switch.

Revision history for this message
sinecure (dave-launchpad) wrote :

My apologies: I'd just found this bug after searching for the reason my laptop would suspend after ~30 seconds whenever it was plugged into a dock or external monitor with the lid closed after upgrading to 15.04, and did not read closely enough because the symptoms were so similar (and I was so frustrated). Thanks for the nudge.

Revision history for this message
Brock Sides (philarete) wrote :

It applies to laptops as well. At least, the bug I opened that was marked as a duplicate of this one was on a laptop.

Revision history for this message
Romain (romain-3) wrote :

Yes, this definitely reproduces on laptops, as in bug #1473721.
I’ll try to build an older kernel to see if a previous version reported the lid state correctly.

Revision history for this message
Romain (romain-3) wrote :

Here is the result of a fwts test:

lid: Interactive lid button test.
--------------------------------------------------------------------------------
Test 1 of 3: Test LID buttons report open correctly.
FAILED [HIGH] LidNotOpen: Test 1, Detected a closed LID state.

Test 2 of 3: Test LID buttons on a single open/close.
Got 4 SCI interrupt(s).
Got 4 interrupt(s) on GPE gpe17.
Got 4 interrupt(s) on GPE gpe_all.
PASSED: Test 2, Detected ACPI LID events while waiting for LID to closed.
FAILED [HIGH] NoLidState: Test 2, Could not detect lid closed state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 2, Detected ACPI LID events while waiting for LID to open.
FAILED [HIGH] NoLidState: Test 2, Could not detect lid open state.

Test 3 of 3: Test LID buttons on multiple open/close events.
Some machines may have EC or ACPI faults that cause detection of multiple open
/close events to fail.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to closed.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid closed state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to open.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid open state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to closed.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid closed state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to open.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid open state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to closed.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid closed state.
Got 2 SCI interrupt(s).
Got 2 interrupt(s) on GPE gpe17.
Got 2 interrupt(s) on GPE gpe_all.
PASSED: Test 3, Detected ACPI LID events while waiting for LID to open.
FAILED [HIGH] NoLidState: Test 3, Could not detect lid open state.

Definitely wrong LID detection, even on a laptop. Digging in that direction.
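A long fwts log like the one above is easier to read as a tally. A small sketch, assuming result lines start with "PASSED:" or "FAILED [LEVEL] Label:" as in this output; the excerpt holds only the Test 1 and Test 2 result lines and the function name is made up.

```python
import re
from collections import Counter

# Result lines for Test 1 and Test 2, copied from the fwts output above.
FWTS_EXCERPT = """\
FAILED [HIGH] LidNotOpen: Test 1, Detected a closed LID state.
PASSED: Test 2, Detected ACPI LID events while waiting for LID to closed.
FAILED [HIGH] NoLidState: Test 2, Could not detect lid closed state.
PASSED: Test 2, Detected ACPI LID events while waiting for LID to open.
FAILED [HIGH] NoLidState: Test 2, Could not detect lid open state.
"""

def tally(text):
    """Count PASSED lines and group FAILED lines by their failure label."""
    passed = 0
    failures = Counter()
    for line in text.splitlines():
        if line.startswith("PASSED:"):
            passed += 1
            continue
        match = re.match(r"FAILED \[(\w+)\] (\w+):", line)
        if match:
            failures[match.group(2)] += 1
    return passed, dict(failures)

print(tally(FWTS_EXCERPT))
```

In this excerpt the ACPI events are seen (the PASSED lines) while the lid state itself is never read back correctly (the NoLidState failures).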
