Suspend does not always resume

Bug #1757445 reported by Jamie Bennett on 2018-03-21
This bug affects 7 people
Affects                   Importance   Assigned to
linux (Ubuntu)            Medium       Unassigned
  Bionic                  Medium       Unassigned
linux-meta-hwe (Ubuntu)   Undecided    Unassigned

Bug Description

Dell XPS 13 9350 on AC power, left running overnight, suspends after the configured timeout. When I come back the next morning, the laptop sometimes resumes to an aubergine screen (just the background, no GDM) with a cursor and sits there forever. Other times it resumes to a black screen with a cursor.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-12-generic 4.15.0-12.13
ProcVersionSignature: Ubuntu 4.15.0-12.13-generic 4.15.7
Uname: Linux 4.15.0-12-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.8-0ubuntu10
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Mar 21 14:15:04 2018
InstallationDate: Installed on 2018-02-13 (36 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180201)
MachineType: Dell Inc. XPS 13 9350
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-12-generic root=UUID=bc5b647b-38ca-4206-9e96-774a5ac6b833 ro quiet splash vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-12-generic N/A
 linux-backports-modules-4.15.0-12-generic N/A
 linux-firmware 1.173
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/18/2017
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.5.1
dmi.board.name: 07TYC2
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.5.1:bd08/18/2017:svnDellInc.:pnXPS139350:pvr:rvnDellInc.:rn07TYC2:rvrA01:cvnDellInc.:ct9:cvr:
dmi.product.family: NULL
dmi.product.name: XPS 13 9350
dmi.sys.vendor: Dell Inc.

Jamie Bennett (jamiebennett) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Will Cooke (willcooke) wrote :

I'm seeing messages in syslog that show USB device(s) not reconnecting. I spoke to Jamie and his monitor is connected over a USB-C dock. That might be involved here.
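
For anyone who wants to check their own syslog for the same pattern, here is a minimal sketch of one way to pull out USB-related lines logged after the most recent resume. It assumes the kernel logs "PM: suspend exit" when it resumes; the function name and the matching rule are illustrative only and were not used in this report.

```python
import re

# Assumption: the kernel prints this marker when it leaves suspend.
RESUME_MARKER = "PM: suspend exit"


def usb_lines_after_resume(lines):
    """Return log lines mentioning 'usb' that follow the last resume marker."""
    last_resume = -1
    for index, line in enumerate(lines):
        if RESUME_MARKER in line:
            last_resume = index
    if last_resume < 0:
        # No resume was logged, so there is nothing to report.
        return []
    usb_word = re.compile(r"\busb\b", re.IGNORECASE)
    return [line for line in lines[last_resume + 1:] if usb_word.search(line)]
```

It could be fed the lines of /var/log/syslog (reading that file usually needs membership in the adm group or root).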

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.16 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc6

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Thomas Reuss (thomas-reuss) wrote :

Similar problem over here: resume reproducibly fails on my Dell Mini 1018.
Suspend-to-RAM (STR) seems to work properly when closing the lid:
blanked screen, pulsating power LED, etc. When I try to wake the system, the power LED switches on, but the screen stays black.

Same fault with a Dell Inspiron 5557. On resume it takes about 4 seconds for the lock screen to appear. When I try to unlock, it accepts four to five digits and then hangs completely (not even REISUB works). The only option is to power off manually.

After some tests I discovered that, in my case, the lockup is related to wireless. With the wireless network disabled the problem disappears, and the machine suspends and resumes normally.
So, as a test, I deactivated Wi-Fi power saving, as suggested in "https://gist.github.com/jcberthon/ea8cfe278998968ba7c5a95344bc8b55", and everything went smoothly.

Please disregard comment #7, I celebrated too early. After some time the problem returned.
I am currently testing the 4.16 kernel [0]. I will post some results in a few days.

I have been testing kernel 4.16.0 for two days, and so far suspend and resume have been working without hanging.
However, once I have suspended and resumed at least once, the system hangs when I try to shut down and only responds to REISUB or a forced shutdown.
Besides that, I lost ureadahead, which does not work with upstream kernels.

I noticed this error during boot: "nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 612004 [IBUS]".
I searched for it and found a suggestion to add the following parameter to the kernel command line: "nouveau.modeset = 0". I tested it and the boot errors disappeared immediately. There were no more crashes when suspending and resuming, and no more crashes when shutting down, with either the 4.16-0 kernel or the official 4.15.0-20 kernel.
I suggest that those affected by this bug try this parameter.

EDIT: "nouveau.modeset=0" no spaces.
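
For anyone unsure where the parameter goes: on Ubuntu the usual place is the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The following is only a sketch of that edit as a text transformation. The helper name add_kernel_param is made up for illustration, and it assumes the value is wrapped in double quotes as in the stock file.

```python
import re


def add_kernel_param(grub_text, param):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there."""

    def rewrite(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return '{}"{}"'.format(match.group(1), " ".join(params))

    # Only the GRUB_CMDLINE_LINUX_DEFAULT line is touched; other lines pass through.
    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"',
        rewrite,
        grub_text,
        flags=re.MULTILINE,
    )
```

After the changed text is written back to /etc/default/grub (as root), "sudo update-grub" and a reboot are needed before the parameter takes effect.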

Eugene San (eugenesan) on 2018-07-18
tags: added: xenial
tags: added: linux-hwe
Eugene San (eugenesan) wrote :

The bug also affects Xenial with Linux HWE (4.15.0.24.46).
No issues with Linux HWE (4.13.0.45.64).

no longer affects: linux-meta-hwe (Ubuntu Bionic)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-meta-hwe (Ubuntu):
status: New → Confirmed
sauron (pierre-zechateau) wrote :

Hi there, I'm running kernel 4.15.0-34-generic #37 on an ASUS N752VX and have the same problem:
blanked screen and pulsating power LED.
When I try to wake the system, the power LED switches on and the fans start, but the screen stays black. Plugging in an external screen made no difference. I have to hard reboot each time.

Eugene San (eugenesan) wrote :

Just wanted to share my observations.

At least on my machine, "failed resumes" are appearing in series after initial BSOD (black screen of death), which usually occurs after a long suspend.

After some experimentation, I've found that full/proper shut-down and boot-up cycle (in addition to the forced power-down after the initial BSOD) fixes the issue until next BSOD occurs.

My guess is that something in kernel 4.15.xx triggers the BSOD and messes up the suspend-resume parameters in EFI until they are reset by the procedure mentioned above.

Gary Kenneth Krueger (verify) wrote :

I've found that I can make the system stable with the following settings:

     System Settings:
  Privacy>Screen Lock: On
  Power>Dim screen when inactive: Off
  Power>Blank screen: 10m
  Power>Automatic suspend:
   On Battery Power: On
    Delay: 20m
   Plugged In: On
    Delay: 1h
  Power>When the Power Button is pressed: Suspend
  Devices>Displays>Night Light Off
     Gnome Tweaks (installed):
  Power>Suspend when laptop lid is closed: On

     Upgraded BIOS.

 Graphics are:
  Intel® Sandybridge Mobile
  NVidia Optimus

It seems to hang with "Night Light" and "Dim screen when inactive" enabled. So, I have those off. I originally had "Night Light" enabled with sunrise and sunset. That caused my system to try to hang around sunrise and sunset. And it generally would hang when unsuspending after one of those options activated.
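
For anyone who prefers to script the same workaround, here is a rough sketch that turns the two toggles off with gsettings. It assumes that "Night Light" and "Dim screen when inactive" are backed by the GSettings keys listed in the code; the helper names are hypothetical.

```python
import subprocess

# Assumption: these schema/key pairs back the two GNOME toggles named above.
SETTINGS = [
    ("org.gnome.settings-daemon.plugins.color", "night-light-enabled", "false"),
    ("org.gnome.settings-daemon.plugins.power", "idle-dim", "false"),
]


def build_commands(settings=SETTINGS):
    """Return one 'gsettings set' argument list per setting."""
    return [["gsettings", "set", schema, key, value]
            for schema, key, value in settings]


def apply_commands(commands):
    """Run each command in the current desktop session; raises on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling apply_commands(build_commands()) from a terminal inside the GNOME session should have the same effect as flipping both switches in Settings.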

Anyhow, I will be trying this out sometime soon to prevent that sort of trouble:

http://www.akitaonrails.com/2017/03/14/enabling-optimus-nvidia-gpu-on-the-dell-xps-15-with-linux-even-on-battery

I expect that will allow me to use the "Night Light" and "Dim screen when inactive" options.

Gary Kenneth Krueger (verify) wrote :

By the way, this issue appeared on my machine, which is a Lenovo W520.

I have turned off automatic suspend while plugged in. I need it to run financial calculations overnight. And, I have increased the battery power delay to 30m. Though that is neither here nor there.

With "Night Light" and "Dim screen when inactive" turned off, blanking the screen works every time. Suspending and restoring works every time.

I really want to get back to the "Night Light" feature, so I would like to see the bug resolved.

Gary Kenneth Krueger (verify) wrote :

I forgot to mention that before demonstrating that "Night Light" and "Dim screen when inactive" were involved, I upgraded to the latest BIOS from Lenovo on 20 March (a day prior to posting #16 and before turning off the dimming features). It is the Lenovo 85UJ25US (1.46) BIOS, which is an upgrade from 8BET55WW (1.35). I upgraded in the hope that it would resolve the problem.

Right after the upgrade (and before turning off the dimming features), I also turned off Security->Virtualization and Security->VT-d. I don't have a need for virtualization, and it was mentioned in posts elsewhere that it may contribute to the hang on suspend issue.

Gary Kenneth Krueger (verify) wrote :

This issue sounds a lot like Bug #1757445. Maybe one should be marked as a duplicate of the other.

Gary Kenneth Krueger (verify) wrote :

Err, the issue sounds a lot like Bug #1743094. Maybe one should be marked as a duplicate of the other.

Gary Kenneth Krueger (verify) wrote :

It hung again yesterday (Sat 23 Mar 2019). It used to hang multiple times daily.

Anyhow, the laptop was unresponsive. The disk activity light continued as normal, though.

However, it did allow me to SSH into it, but hung completely when I tried to open gnome-control-center. That suggests that the X server was hung.

I will try to collect some details the next time it hangs.

Gary Kenneth Krueger (verify) wrote :

I have had 3 hangs between 24 March and 1 April. All were when I was out and about, so I couldn't SSH into the machine.

Today, I had a couple of hangs occur with no apparent cause. Disk activity continued. But, I couldn't SSH into the machine.

I killed the machine (during a hang) at 8:35:51 am.

I restarted it at 9:14:04 am.

It crashed again.

I checked /var/crash, and it had the following (I've inserted the file contents below listed files):

[ gary@Quasar | Tue 02 Apr 2019 10:11am ] ~ >dir /var/crash
total 16
drwxrwsrwt 2 root whoopsie 4096 Apr 2 09:47 ./
drwxr-xr-x 14 root root 4096 Feb 9 19:20 ../
---------- 1 root whoopsie 0 Apr 2 08:36 _lib_systemd_systemd-logind.0.crash
-rw-r--r-- 1 kernoops whoopsie 2324 Apr 2 09:45 linux-image-4.18.0-16-generic.157976.crash
 ProblemType: KernelOops
 Annotation: Your system might become unstable now and might need to be restarted.
 Date: Tue Apr 2 09:45:56 2019
 Failure: oops
 OopsText:
  watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [pool:2748]
  Modules linked in: rfcomm ccm bnep arc4 iwldvm snd_hda_codec_hdmi mac80211 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp btusb btrtl btbcm btintel kvm bluetooth snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_codec gpio_ich thinkpad_acpi snd_hda_core snd_hwdep snd_pcm iwlwifi nvram snd_seq_midi snd_seq_midi_event ecdh_generic cfg80211 snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore mei_me mei lpc_ich irqbypass intel_cstate intel_rapl_perf input_leds mac_hid serio_raw wmi_bmof sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt crct10dif_pclmul nouveau crc32_pclmul ghash_clmulni_intel mxm_wmi pcbc i2c_algo_bit ttm drm_kms_helper aesni_intel aes_x86_64 syscopyarea crypto_simd sysfillrect cryptd sysimgblt
   fb_sys_fops glue_helper firewire_ohci sdhci_pci cqhci psmouse drm sdhci ahci e1000e firewire_core libahci crc_itu_t wmi video
  CPU: 4 PID: 2748 Comm: pool Tainted: G L 4.18.0-16-generic #17~18.04.1-Ubuntu
  Hardware name: LENOVO 4260A45/4260A45, BIOS 8BET66WW (1.46 ) 06/14/2018
  RIP: 0010:smp_call_function_many+0x22c/0x250
  Code: 75 8a 00 3b 05 c9 8c 55 01 0f 83 5c fe ff ff 48 63 c8 48 8b 13 48 03 14 cd 00 37 9c b8 8b 4a 18 83 e1 01 74 0a f3 90 8b 4a 18 <83> e1 01 75 f6 eb c7 48 c7 c2 60 60 e8 b8 4c 89 e6 89 c7 e8 cc 75
  RSP: 0018:ffffb9f488e47c88 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
  RAX: 0000000000000007 RBX: ffff9dd0bdd23b80 RCX: 0000000000000003
  RDX: ffff9dd0bdde8de0 RSI: 0000000000000000 RDI: ffff9dd0ad028ef8
  RBP: ffffb9f488e47cc0 R08: 0000000000027040 R09: ffffffffb81d5449
  R10: ffffdab51060f780 R11: 0000000000000148 R12: 0000000000000008
  R13: 0000000000023b40 R14: ffffffffb787d920 R15: ffffb9f488e47d00
  FS: 00007f9025bff700(0000) GS:ffff9dd0bdd00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f9025be6ff8 CR3: 00000004268de002 CR4: 00000000000606e0
  Call Trace:

 Package: linux-image-4.18.0-16-generic 4.18.0-16.17~18.04.1
 SourcePackage: ...
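
To pull single fields such as OopsText out of a report like the one above, something along these lines may help. It is a minimal sketch that assumes the file on disk uses the plain "Key: value" layout, with each field starting in the first column and continuation lines indented by a space (the listing above has an extra leading space from pasting). The function name parse_crash_report is invented for this example.

```python
def parse_crash_report(text):
    """Parse 'Key: value' fields; indented lines continue the previous field."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" ") and current is not None:
            # Continuation of a multi-line value such as OopsText.
            fields[current].append(line.strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = [value.strip()] if value.strip() else []
    return {key: "\n".join(value) for key, value in fields.items()}
```

With the parsed result, report["OopsText"].splitlines()[0] would give the headline of the oops, here the soft lockup message.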


Gary Kenneth Krueger (verify) wrote :

By the way, I reconnected the sound on my machine before that most recent crash. So the sound hardware is not the cause (for anyone who may have considered that possibility).

Also, you will find this possibly relevant thread regarding smp_call_function_ (single or many):

https://lkml.org/lkml/2016/1/30/27

As a possibly relevant side note, I do run my 8 cores hard (almost constantly) with financial simulations. The simulations pause when any core temperature rises above the halfway point between "high" and "critical", and they stay paused until the temperatures drop well below "high".
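
To make the pause rule concrete, here is a small sketch of that kind of hysteresis. It is not the actual simulation code; the class name, the resume_margin parameter (standing in for "well below") and the temperatures in the usage note are all made up.

```python
class ThermalGate:
    """Pause above the high/critical midpoint; resume only well below 'high'."""

    def __init__(self, high, critical, resume_margin):
        self.pause_at = (high + critical) / 2.0
        # Assumption: "well below high" means at least resume_margin degrees below it.
        self.resume_at = high - resume_margin
        self.paused = False

    def update(self, core_temps):
        """Feed the latest per-core temperatures; return True while paused."""
        hottest = max(core_temps)
        if not self.paused and hottest > self.pause_at:
            self.paused = True
        elif self.paused and hottest < self.resume_at:
            self.paused = False
        return self.paused
```

With high=80, critical=100 and resume_margin=10, work would pause once any core exceeds 90 and resume only after the hottest core falls below 70, which avoids flapping around a single threshold.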
