NVMe devices and network devices disappear upon suspend

Bug #1655100 reported by Marcus Grenängen
This bug affects 3 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Triaged    Medium      Unassigned
Yakkety         Won't Fix  Medium      Unassigned

Bug Description

On Ubuntu 16.10 on an Alienware 15 R2, the NVMe devices and network devices are gone after resuming from suspend.

I can force the devices to reappear by issuing `echo 1 > /sys/bus/pci/rescan`; that is what I did to be able to submit this bug report from a live USB stick.
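The rescan workaround can be wrapped in a small helper. This is only a sketch: the function name `pci_rescan` and its optional path argument are illustrative assumptions, added so the helper can be tried against a scratch file instead of the real sysfs node.

```shell
# Sketch: ask the kernel to re-enumerate the PCI bus.
# pci_rescan and its optional argument are hypothetical names, not an existing tool.
pci_rescan() {
    rescan_path="${1:-/sys/bus/pci/rescan}"
    # The sysfs node is writable by root only, so fail early with a hint.
    if [ ! -w "$rescan_path" ]; then
        echo "cannot write to $rescan_path (not root?)" >&2
        return 1
    fi
    # Writing 1 to the node triggers the rescan.
    echo 1 > "$rescan_path"
}
```

Note that `sudo echo 1 > /sys/bus/pci/rescan` does not work from a normal user shell, because the redirection is performed by the unprivileged shell; `echo 1 | sudo tee /sys/bus/pci/rescan` or running the command from a root shell avoids that.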

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: linux-image-4.8.0-22-generic 4.8.0-22.24
ProcVersionSignature: Ubuntu 4.8.0-22.24-generic 4.8.0
Uname: Linux 4.8.0-22-generic x86_64
ApportVersion: 2.20.3-0ubuntu8
Architecture: amd64
CasperVersion: 1.379
CurrentDesktop: Unity
Date: Mon Jan 9 17:55:50 2017
LiveMediaBuild: Ubuntu 16.10 "Yakkety Yak" - Release amd64 (20161012.2)
MachineType: Alienware Alienware 15 R2
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper quiet splash ---
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.8.0-22-generic N/A
 linux-backports-modules-4.8.0-22-generic N/A
 linux-firmware 1.161
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 09/30/2016
dmi.bios.vendor: Alienware
dmi.bios.version: 1.3.9
dmi.board.name: 0X70NC
dmi.board.vendor: Alienware
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Alienware
dmi.chassis.version: Not Specified
dmi.modalias: dmi:bvnAlienware:bvr1.3.9:bd09/30/2016:svnAlienware:pnAlienware15R2:pvr1.3.9:rvnAlienware:rn0X70NC:rvrA00:cvnAlienware:ct10:cvrNotSpecified:
dmi.product.name: Alienware 15 R2
dmi.product.version: 1.3.9
dmi.sys.vendor: Alienware

Marcus Grenängen (grenangen) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Marcus Grenängen (grenangen) wrote :

Addendum: I had the same issue with 14.04 and 16.04 as well. This also seems to be the same as, or related to, bug 1568703.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.10 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10-rc3

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
status: New → Confirmed
tags: added: kernel-da-key
Marcus Grenängen (grenangen) wrote :

Can do, but it will take some time; it is getting late in my timezone and I will need to prepare a side install for testing with upstream.

Do the 17.04 nightlies come with a live environment? And if so, do they perhaps contain the 4.10 kernel? That would cut down the time I need for testing in this case.

Marcus Grenängen (grenangen) wrote :

Tested the latest 17.04 daily build from Jan 9th.
The issue is present there as well, running kernel version 4.9.0-11. Same symptoms as with 16.10.

uname -a
Linux ubuntu 4.9.0-11-generic #12-Ubuntu SMP Mon Dec 12 16:18:23 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Zesty Zapus (development branch)
Release: 17.04
Codename: zesty

I will have to set up a separate install to test the upstream 4.10 kernel. However, I am sceptical that it would be fixed in 4.10.

description: updated
Marcus Grenängen (grenangen) wrote :

As I suspected, the issue is still present with the 4.10 kernels. I tried it on a separate clean install of 16.10, upgrading the kernel to v4.10-rc3 mainline.

This warning is present for 4.10 as well as for older kernels on the Alienware 15 R2:
update-initramfs: Generating /boot/initrd.img-4.10.0-041000rc3-generic
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_14.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver8_7.bin for module i915
run-parts: executing /etc/kernel/postinst.d/pm-utils 4.10.0-041000rc3-generic /boot/vmlinuz-4.10.0-041000rc3-generic

uname -a
Linux starbase 4.10.0-041000rc3-generic #201701081831 SMP Sun Jan 8 23:33:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.10
Release: 16.10
Codename: yakkety

tags: added: kernel-bug-exists-upstream
Marcus Grenängen (grenangen) wrote :

I also tried enabling hibernation; the NVMe drives show the same issue with hibernation as with suspend. Skylake truly is a mess on Linux :/

Marcus Grenängen (grenangen) wrote :

Any update and/or progress on this @jsalisbury?

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu Yakkety):
status: Confirmed → Triaged
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Marcus Grenängen (grenangen) wrote :

Thanks for the reply. Since I have had some additional time on my hands, I have played around a bit with kernels and firmware.

Currently I do not think this is an upstream bug, since I can get the laptop and its NVMe devices etc. to work properly using the latest Solus Linux version, which is running kernel 4.9.6.

I did try to use the same kernel with Ubuntu 16.10 and the 17.04 ALPHA builds, but I was not able to get the suspend to work with my NVME devices and WiFi. I suspect that the linux firmware might be what is different here, but can't say for sure since I'm far from an expert on how the Linux kernel should operate with these new kinds of devices.

Hopefully this additional information might be of use for you.

Sergey Korolev (knopki) wrote :

@grenangen Just a note.
A possibly related upstream bug:
https://bugzilla.kernel.org/show_bug.cgi?id=112121

I have the same hardware and the same problem on Arch with mainline kernels 4.4-4.9, and later on Fedora 25 with (Fedora) kernels 4.8.x-4.9.6. So these have something in common with Ubuntu.

I will test Solus tomorrow and try to find differences.

Marcus Grenängen (grenangen) wrote :

@knopki, did you get around to testing Solus? Did you find anything that might help us understand what the difference is? :)

Sergey Korolev (knopki) wrote :

Just a note:
I found this thread describing the same bug on Mint 18 on an Alienware 13 R2: https://forums.linuxmint.com/viewtopic.php?t=234368

Suspend and resume work on the Solus LiveCD with kernel 4.8.15.

Differing startup messages (related to ACPI and PCI) in the journal on Solus:
  kernel: ACPI: Core revision 20160422
  kernel: PCI: Using configuration type 1 for base access
  kernel: ACPI: Enabled 7 GPEs in block 00 to 7F
  kernel: ACPI : EC: EC stopped
  kernel: ACPI : EC: GPE = 0x14, I/O: command/status = 0x66, data = 0x62
  kernel: ACPI : EC: EC started

Differing startup messages in the journal on Fedora 25 with kernel 4.9.7-201.fc25.x86_64:
  kernel: ACPI: Core revision 20160831
  kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
  kernel: PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
  kernel: PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in E820
  kernel: PCI: Using configuration type 1 for base access
  kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: Used as first EC
  kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: GPE=0x14, EC_CMD/EC_SC=0x66, EC_DATA=0x62
  kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: Used as boot DSDT EC to handle transactions
  kernel: acpiphp: Slot [1] registered
  kernel: ACPI: Enabled 7 GPEs in block 00 to 7F
  kernel: ACPI : EC: event unblocked
  kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: GPE=0x14, EC_CMD/EC_SC=0x66, EC_DATA=0x62
  kernel: ACPI: \_SB_.PCI0.LPCB.EC0_: Used as boot DSDT EC to handle transactions and events

So the older kernel has an older ACPI revision. I don't know what the rest of the messages mean. But Solus uses only the pciehp module (PCI Express Hot Plug Controller Driver), while the newer kernel uses the pciehp module AND acpiphp (ACPI PCI Hot Plug Controller Driver). Both deal with PCI hotplug, and our bug involves PCI hotplug.
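To repeat this comparison on another machine, the kernel log can be filtered for the two hotplug driver names. The following is a rough sketch; the helper name `hotplug_drivers` is made up for illustration, and it assumes GNU grep and that each driver prefixes its log lines with its own name, as in the excerpts above.

```shell
# Sketch: list which PCI hotplug drivers show up in a kernel log read from stdin.
# hotplug_drivers is a hypothetical helper name.
hotplug_drivers() {
    # -o prints only the matched name, -w requires a whole-word match;
    # "|| true" keeps the pipeline quiet when nothing matches.
    { grep -owE 'acpiphp|pciehp' || true; } | sort -u
}
```

It could be fed from the current boot's kernel messages, for example with `journalctl -k -b | hotplug_drivers` or `dmesg | hotplug_drivers`.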

AND NOW THE GREAT NEWS

Disabling acpiphp with "acpiphp.disable=1" kernel boot parameter solves our problem. Now I can suspend and I can resume.

I don't know what else this workaround might break.
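To make the boot parameter persistent on Ubuntu, it is normally added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot. The helper below is a minimal sketch of that edit; the function name `add_kernel_param` is an assumption for illustration, and it only handles the common case of a double-quoted value and GNU sed.

```shell
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
# add_kernel_param is a hypothetical helper; back up the file before using it on a real system.
add_kernel_param() {
    grub_file="$1"
    param="$2"
    # Do nothing if the parameter already appears on the line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote of the value.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$grub_file"
}
```

A typical invocation would be `add_kernel_param /etc/default/grub acpiphp.disable=1` from a root shell, then `update-grub`.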

Also, I think this is an upstream bug. Can anyone report it? Because I'm too shy.

Marcus Grenängen (grenangen) wrote :

Great find @knopki, will re-install Ubuntu and use the boot parameter and see what else might break during my normal use :)

@jsalisbury does this help you guys to track down the issue and find/patch it for the Ubuntu 17.04 release?

Marcus Grenängen (grenangen) wrote :

So, I have been running 16.10 with the boot parameter "acpiphp.disable=1", and yes, suspend/resume works just fine for the NVMe devices: no more loss of the root partition etc.

One thing that has some issues, but is fixable, is WiFi. Then again, WiFi in Ubuntu has been quite broken since 16.04. I have a few workarounds that make WiFi and nm-applet work properly, and they apply here as well; for details see http://grenangen.se/node/86. The same applies to 16.10.

What doesn't work: sound output is completely broken after a couple of suspend/resume cycles. The only way I have found to get sound working again is to reboot the machine.

So all in all I can finally use my really powerful laptop with Ubuntu. It would be nice to get proper patch(es) for the acpiphp issue, as well as for WiFi being wonky in general, but it is usable for me now.

Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, ie yakkety. The bug task representing the yakkety nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Yakkety):
status: Triaged → Won't Fix
WinEunuchs2Unix (ricklee518) wrote :

I can confirm that this bug occurs on Ubuntu 16.04 with an Alienware 17 R3 (i7-6700HQ, HM170 chipset). The workaround of adding `acpiphp.disable=1` to the kernel command line fixes the suspend/resume cycle.
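After rebooting, it is worth checking that the parameter actually reached the running kernel, since `/proc/cmdline` shows the command line the kernel was booted with. A minimal sketch, with `has_kernel_param` as a hypothetical helper name and an optional file argument added only so it can be tried on sample input:

```shell
# Sketch: succeed if the given parameter is on the kernel command line.
# has_kernel_param is a hypothetical helper name.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then require an exact, literal match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, `has_kernel_param acpiphp.disable=1 && echo "workaround active"`.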

FireBurn (fireburn) wrote :

A patch (https://patchwork.kernel.org/patch/10212201/) is going to fix this issue properly. It should be backported to older kernels too, so hopefully your NVMe drives won't require any workarounds going forward. It should also fix USB-C detection.

Marcus Grenängen (grenangen) wrote :

Great news and thank you for posting the information @fireburn :)

Kai-Heng Feng (kaihengfeng) wrote :

The patch has "Cc: <email address hidden>", hence Xenial's v4.4-based kernel will automatically pick up this patch.

We only need to backport it to Artful's v4.13-based kernel.

Marcus Grenängen (grenangen) wrote :

Any update on this for 18.04?

Kai-Heng Feng (kaihengfeng) wrote :
Marcus Grenängen (grenangen) wrote :

On a vanilla Ubuntu/Kubuntu install this still hasn't been fixed.

Linux alien 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Kai-Heng Feng (kaihengfeng) wrote :