Suspend stopped working from 4.4.0-157 onwards

Bug #1844021 reported by uBo on 2019-09-14
This bug affects 3 people
Affects          Importance  Assigned to
linux (Ubuntu)   Undecided   Unassigned
Bionic           Undecided   Unassigned
Disco            Undecided   Unassigned

Bug Description

Since upgrading the kernel past 4.4.0-154 (starting with 4.4.0-157), suspend no longer works for me. I never had such an issue in the previous 6 years.
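To check whether a given kernel release string falls at or past the boundary reported above, a minimal Python sketch follows. It assumes only what this report states (suspend worked up to 4.4.0-154 and failed from 4.4.0-157) and knows nothing about later releases that carry a fix. `parse_kernel` and `in_reported_range` are hypothetical helper names.

```python
import re

def parse_kernel(release):
    """Turn a release string such as '4.4.0-157-generic' into (4, 4, 0, 157)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: " + release)
    return tuple(int(part) for part in match.groups())

def in_reported_range(release, first_bad="4.4.0-157"):
    """True if release is in the same series as first_bad and not older.

    The default boundary is taken from this bug report; it is an
    assumption, not an authoritative list of affected kernels.
    """
    version = parse_kernel(release)
    boundary = parse_kernel(first_bad)
    # Compare numerically: a plain string comparison would sort "99" after "157".
    return version[:3] == boundary[:3] and version[3] >= boundary[3]

for candidate in ("4.4.0-154-generic", "4.4.0-157-generic", "4.4.0-163-generic"):
    print(candidate, in_reported_range(candidate))
```

On a live system the release string would typically come from `platform.release()`.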

This is the relevant output when suspend fails:
[ 328.288885] PM: Suspending system (mem)
[ 328.288902] Suspending console(s) (use no_console_suspend to debug)
[ 328.289118] wlp1s0: deauthenticating from xx:xx:xx:xx:xx:xx by local choice (Reason: 3=DEAUTH_LEAVING)
[ 328.289209] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[ 328.289570] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 328.289600] sd 0:0:0:0: [sda] Stopping disk
[ 328.291700] sd 1:0:0:0: [sdb] Stopping disk
[ 328.312413] xhci_hcd 0000:03:00.0: WARN: xHC save state timeout
[ 328.312456] suspend_common(): xhci_pci_suspend+0x0/0x70 returns -110
[ 328.312461] pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -110
[ 328.312465] dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -110
[ 328.312483] PM: Device 0000:03:00.0 failed to suspend async: error -110
[ 328.848109] PM: Some devices failed to suspend, or early wake event detected
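The -110 in the lines above is a negated errno value. On Linux, errno 110 is ETIMEDOUT, which matches the "xHC save state timeout" warning. A minimal Python sketch for pulling the failing device and decoding the error out of such a log follows. `failed_devices` is a hypothetical helper, the regular expression is an assumption based only on the message format shown here, and the errno names depend on the platform the script runs on.

```python
import errno
import os
import re

# Matches lines such as:
#   PM: Device 0000:03:00.0 failed to suspend async: error -110
FAILED = re.compile(r"PM: Device (\S+) failed to suspend(?: async)?: error (-?\d+)")

def failed_devices(log_text):
    """Return (device_address, errno_name, message) for each failed device."""
    results = []
    for match in FAILED.finditer(log_text):
        code = abs(int(match.group(2)))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group(1), name, os.strerror(code)))
    return results

# Two lines copied from the log in this report.
sample = (
    "[ 328.312465] dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -110\n"
    "[ 328.312483] PM: Device 0000:03:00.0 failed to suspend async: error -110\n"
)

for address, name, message in failed_devices(sample):
    print(address, name, message)
```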

This is the relevant part of my lspci -v output:

03:00.0 USB controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB Host Controller (prog-if 30 [XHCI])
 Subsystem: Samsung Electronics Co Ltd ASM1042 SuperSpeed USB Host Controller
 Flags: bus master, fast devsel, latency 0, IRQ 16
 Memory at f0500000 (64-bit, non-prefetchable) [size=32K]
 Capabilities: <access denied>
 Kernel driver in use: xhci_hcd

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1844021

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Kai-Heng Feng (kaihengfeng) wrote :

Can you please test 4.4.0-163.191 from xenial-proposed?

uBo (ubo) wrote :

Just tested it with 4.4.0-163.191; unfortunately it is not fixed:
[ 116.625729] sd 1:0:0:0: [sdb] Stopping disk
[ 116.647127] xhci_hcd 0000:03:00.0: WARN: xHC save state timeout
[ 116.647160] suspend_common(): xhci_pci_suspend+0x0/0x70 returns -110
[ 116.647170] pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -110
[ 116.647178] dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -110
[ 116.647187] PM: Device 0000:03:00.0 failed to suspend async: error -110
[ 117.182310] PM: Some devices failed to suspend, or early wake event detected
[ 117.183570] rtc_cmos 00:01: System wakeup disabled by ACPI

Kai-Heng Feng (kaihengfeng) wrote :

The most suspicious commit is "xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()", but before doing any further tests, can you please confirm whether the latest mainline kernel works?
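For context, the function named in that commit title waits for a hardware register to reach an expected value and gives up after a time limit. The general poll-with-timeout pattern can be sketched in Python as below. This illustrates the idea only and is not the kernel code: `poll_until`, `make_register`, the bit value 0x100 and the step-counted clock are all hypothetical.

```python
import errno

def poll_until(read_register, mask, expected, timeout_us, delay_us=1):
    """Poll read_register() until (value & mask) == expected.

    Returns 0 on success, or -errno.ETIMEDOUT once the simulated time
    budget is used up. Time is counted in loop steps, not wall-clock time.
    """
    elapsed_us = 0
    while True:
        if (read_register() & mask) == expected:
            return 0
        if elapsed_us >= timeout_us:
            return -errno.ETIMEDOUT
        elapsed_us += delay_us

def make_register(ready_after_reads, busy_bit=0x100):
    """Hypothetical device whose busy bit clears after a number of reads."""
    state = {"reads": 0}

    def read_register():
        state["reads"] += 1
        return 0 if state["reads"] > ready_after_reads else busy_bit

    return read_register

# A device that finishes within the budget, and one that needs longer.
fast = poll_until(make_register(5), 0x100, 0, timeout_us=10)
slow = poll_until(make_register(50), 0x100, 0, timeout_us=10)
print(fast, slow)
```

A device that needs longer than the allowed budget produces the negative ETIMEDOUT value, which is the shape of the -110 seen in the suspend log.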
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3/

uBo (ubo) wrote :

Just tested. Same issue with the latest mainline kernel:

[ 100.978395] xhci_hcd 0000:03:00.0: WARN: xHC save state timeout
[ 100.978435] PM: suspend_common(): xhci_pci_suspend+0x0/0xd0 returns -110
[ 100.978443] PM: pci_pm_suspend(): hcd_pci_suspend+0x0/0x30 returns -110
[ 100.978453] PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -110
[ 100.978463] PM: Device 0000:03:00.0 failed to suspend async: error -110
[ 101.027735] PM: Some devices failed to suspend, or early wake event detected

Kai-Heng Feng (kaihengfeng) wrote :

Can you please test v5.2-rc2 and v5.2-rc3? The commit was introduced in v5.2-rc3.

uBo (ubo) wrote :

Just tested both:
Suspend works successfully in v5.2-rc2
It does not work (with same error as above) in v5.2-rc3
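Testing one known-good and one known-bad release narrows the regression to the changes between them. Repeating that good/bad split is a binary search, which is what `git bisect` automates. A small Python sketch of the idea follows. The commit labels and the "culprit" are made up for the example and do not come from this report.

```python
def first_bad(candidates, is_bad):
    """Return the first item for which is_bad() is true.

    Assumes candidates are ordered oldest to newest, the last one is bad,
    and every item after the first bad one is also bad.
    """
    low, high = 0, len(candidates) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(candidates[mid]):
            high = mid          # the first bad item is at mid or earlier
        else:
            low = mid + 1       # the first bad item is after mid
    return candidates[low]

# Hypothetical labels standing in for commits between the two tags.
commits = ["v5.2-rc2", "c1", "c2", "c3", "c4", "v5.2-rc3"]
culprit = "c3"  # invented for the example

def suspend_fails(commit):
    """Stand-in for building and testing a kernel at this commit."""
    return commits.index(commit) >= commits.index(culprit)

print(first_bad(commits, suspend_fails))
```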

Thank you!

uBo (ubo) on 2019-09-19
description: updated
uBo (ubo) wrote :

I tried, but I get 403 Forbidden on the links...

Kai-Heng Feng (kaihengfeng) wrote :

Fixed, please try again.

uBo (ubo) wrote :

Also, resume only returns to a black screen.

Kai-Heng Feng (kaihengfeng) wrote :

The linux-modules-extra package was missing.

uBo (ubo) wrote :

After installing the missing package, everything works as expected.

Suspend and resume work fine. Thank you once again!

diddly (dflogeras2) wrote :

I am seeing this issue on Linux Mint with anything later than the 4.15.0-58-generic #64-16.04.1-Ubuntu kernel.

Will this be backported to the 4.15 series as well? Or do I need to file a separate bug?

Kai-Heng Feng (kaihengfeng) wrote :

diddly, it's very likely a different issue since it's a different kernel. Please file a separate bug report.

Kai-Heng Feng (kaihengfeng) wrote :

diddly, I think you may be right; the same patch has been included since Ubuntu-4.15.0-59.66.

diddly (dflogeras2) wrote :

Shoot, I missed these comments because I forgot to subscribe. kaihengfeng, do you require anything from me? Can I help test? It looks like you changed your mind; do you still want me to track this in a separate bug, or should I wait for a testing kernel?

diddly (dflogeras2) wrote :

Thanks, but when I went to test it, I couldn't get it installed. Is the deb file possibly corrupted?

When I tried the linux-modules deb, I got this:

(Reading database ... 221473 files and directories currently installed.)
Preparing to unpack linux-modules-4.15.0-66-generic_4.15.0-66.75~lp1844021_amd64.deb ...
Unpacking linux-modules-4.15.0-66-generic (4.15.0-66.75~lp1844021) ...
dpkg-deb (subprocess): cannot copy archive member from 'linux-modules-4.15.0-66-generic_4.15.0-66.75~lp1844021_amd64.deb' to decompressor pipe: unexpected end of file or stream
dpkg-deb (subprocess): decompressing archive member: lzma error: unexpected end of input
dpkg-deb: error: subprocess <decompress> returned error exit status 2
dpkg: error processing archive linux-modules-4.15.0-66-generic_4.15.0-66.75~lp1844021_amd64.deb (--install):
 cannot copy extracted data for './lib/modules/4.15.0-66-generic/kernel/drivers/scsi/qla2xxx/qla2xxx.ko' to '/lib/modules/4.15.0-66-generic/kernel/drivers/scsi/qla2xxx/qla2xxx.ko.dpkg-new': unexpected end of file or stream
Errors were encountered while processing:
 linux-modules-4.15.0-66-generic_4.15.0-66.75~lp1844021_amd64.deb
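The "unexpected end of file or stream" errors above are consistent with a truncated file. A .deb is an ar archive, so one way to check a download before installing it is to walk the ar member headers and compare each declared size with the bytes actually present. A minimal Python sketch follows. `check_deb_members` and `ar_member` are hypothetical helpers, and the demo archive is built in memory rather than taken from the package in this report.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, end 2

def check_deb_members(data):
    """Walk the ar members of a .deb and report whether each is complete.

    Returns a list of (member_name, declared_size, complete) tuples.
    Raises ValueError if the data does not start with the ar magic.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset < len(data):
        header = data[offset:offset + HEADER_SIZE]
        if len(header) < HEADER_SIZE:
            members.append(("<truncated header>", 0, False))
            break
        name = header[0:16].decode("ascii").strip()
        size = int(header[48:58].decode("ascii").strip())
        start = offset + HEADER_SIZE
        members.append((name, size, start + size <= len(data)))
        offset = start + size + (size % 2)  # member data is 2-byte aligned
    return members

def ar_member(name, payload):
    """Build one ar member for the demo (fixed dummy metadata)."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(payload))
    padding = b"\n" if len(payload) % 2 else b""
    return header.encode("ascii") + payload + padding

archive = (AR_MAGIC
           + ar_member("debian-binary", b"2.0\n")
           + ar_member("data.tar.xz", b"x" * 100))

print(check_deb_members(archive))        # intact archive
print(check_deb_members(archive[:-40]))  # simulated partial download
```

For a real file, the bytes would be read with `open(path, "rb").read()`. A member reported as incomplete means the download should be repeated.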

Kai-Heng Feng (kaihengfeng) wrote :

Please try again.

diddly (dflogeras2) wrote :

I won't have access to that particular machine until Fri, but I did verify that I could update another install in a VM. The only weirdness I saw was that after installing with dpkg -i *.deb, aptitude wanted to remove the headers since they depend on libssl1.1 > 1.1.0, which I guess Mint does not have. I figured it would not affect what we're testing here.

I'll report back on Fri when I verify on real hardware. And thanks!

diddly (dflogeras2) wrote :

kaihengfeng, thank you, the new packages you made worked for me and solved the suspend issue on the affected laptop.

uBo (ubo) wrote :

Can we expect this commit to make it into future official kernel releases?

Kai-Heng Feng (kaihengfeng) wrote :

Yes, it should be automatically picked up by later kernel releases.

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
uBo (ubo) on 2019-10-22
tags: added: verification-done-xenial
removed: verification-needed-xenial
diddly (dflogeras2) wrote :

Does this mean this patch will only be picked up for Xenial and the 4.4.0 kernel? I don't want to monkey with the tags, but is there a way to add one for Bionic (I'm assuming that's where Mint gets its 4.15.0 kernels)?

If not, do I need to open a new bug and cross-reference this one?

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
uBo (ubo) on 2019-10-24
tags: added: verification-done-eoan
removed: verification-needed-eoan
diddly (dflogeras2) on 2019-10-25
tags: added: verification-needed-bionic
tags: removed: verification-needed-bionic

All autopkgtests for the newly accepted linux-gcp-5.3 (5.3.0-1008.9~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

linux-gcp-5.3/unknown (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-gcp-5.3

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-22.24

---------------
linux (5.3.0-22.24) eoan; urgency=medium

  * [REGRESSION] md/raid0: cannot assemble multi-zone RAID0 with default_layout
    setting (LP: #1849682)
    - Revert "md/raid0: avoid RAID0 data corruption due to layout confusion."

  * refcount underflow and type confusion in shiftfs (LP: #1850867) // CVE-2019-15793
    - SAUCE: shiftfs: Correct id translation for lower fs operations
    - SAUCE: shiftfs: prevent type confusion
    - SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling

  * CVE-2018-12207
    - kvm: x86, powerpc: do not allow clearing largepages debugfs entry
    - SAUCE: KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is
      active
    - SAUCE: x86: Add ITLB_MULTIHIT bug infrastructure
    - SAUCE: kvm: mmu: ITLB_MULTIHIT mitigation
    - SAUCE: kvm: Add helper function for creating VM worker threads
    - SAUCE: kvm: x86: mmu: Recovery of shattered NX large pages
    - SAUCE: cpu/speculation: Uninline and export CPU mitigations helpers
    - SAUCE: kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

  * CVE-2019-11135
    - x86/msr: Add the IA32_TSX_CTRL MSR
    - x86/cpu: Add a helper function x86_read_arch_cap_msr()
    - x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default
    - x86/speculation/taa: Add mitigation for TSX Async Abort
    - x86/speculation/taa: Add sysfs reporting for TSX Async Abort
    - kvm/x86: Export MDS_NO=0 to guests when TSX is enabled
    - x86/tsx: Add "auto" option to the tsx= cmdline parameter
    - x86/speculation/taa: Add documentation for TSX Async Abort
    - x86/tsx: Add config options to set tsx=on|off|auto
    - [Config] Disable TSX by default when possible

  * CVE-2019-0154
    - SAUCE: drm/i915: Lower RM timeout to avoid DSI hard hangs
    - SAUCE: drm/i915/gen8+: Add RC6 CTX corruption WA

  * CVE-2019-0155
    - SAUCE: drm/i915: Rename gen7 cmdparser tables
    - SAUCE: drm/i915: Disable Secure Batches for gen6+
    - SAUCE: drm/i915: Remove Master tables from cmdparser
    - SAUCE: drm/i915: Add support for mandatory cmdparsing
    - SAUCE: drm/i915: Support ro ppgtt mapped cmdparser shadow buffers
    - SAUCE: drm/i915: Allow parsing of unsized batches
    - SAUCE: drm/i915: Add gen9 BCS cmdparsing
    - SAUCE: drm/i915/cmdparser: Use explicit goto for error paths
    - SAUCE: drm/i915/cmdparser: Add support for backward jumps
    - SAUCE: drm/i915/cmdparser: Ignore Length operands during command matching

linux (5.3.0-21.22) eoan; urgency=medium

  * eoan/linux: 5.3.0-21.22 -proposed tracker (LP: #1850486)

  * Fix signing of staging modules in eoan (LP: #1850234)
    - [Packaging] Leave unsigned modules unsigned after adding .gnu_debuglink

linux (5.3.0-20.21) eoan; urgency=medium

  * eoan/linux: 5.3.0-20.21 -proposed tracker (LP: #1849064)

  * eoan: alsa/sof: Enable SOF_HDA link and codec (LP: #1848490)
    - [Config] Enable SOF_HDA link and codec

  * Eoan update: 5.3.7 upstream stable release (LP: #1848750)
    - panic: ensure preemption is disabled during panic()
    - [Config] updateconfigs for USB_RIO500
    - USB: rio500: Remove Rio 500 kernel driver
   ...

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Connor Kuehl (connork) on 2019-11-14
Changed in linux (Ubuntu Disco):
status: New → Fix Committed
Connor Kuehl (connork) on 2019-11-14
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
tags: added: verification-needed-bionic

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

diddly (dflogeras2) on 2019-11-15
tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.0.0-37.40

---------------
linux (5.0.0-37.40) disco; urgency=medium

  * disco/linux: 5.0.0-37.40 -proposed tracker (LP: #1852253)

  * System hangs at early boot (LP: #1851216)
    - x86/timer: Skip PIT initialization on modern chipsets

  * drm/i915: Add support for another CMP-H PCH (LP: #1848491)
    - drm/i915/cml: Add second PCH ID for CMP

  * Some EFI systems fail to boot in efi_init() when booted via maas
    (LP: #1851810)
    - efi: efi_get_memory_map -- increase map headroom

  * seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test (LP: #1849281)
    - SAUCE: seccomp: avoid overflow in implicit constant conversion
    - SAUCE: seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
    - SAUCE: seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test

  * dkms artifacts may expire from the pool (LP: #1850958)
    - [Packaging] dkms -- try launchpad librarian for pool downloads
    - [Packaging] dkms -- dkms-build quieten wget verbiage

  * update ENA driver to version 2.1.0 (LP: #1850175)
    - net: ena: fix swapped parameters when calling
      ena_com_indirect_table_fill_entry
    - net: ena: fix: Free napi resources when ena_up() fails
    - net: ena: fix incorrect test of supported hash function
    - net: ena: fix return value of ena_com_config_llq_info()
    - net: ena: improve latency by disabling adaptive interrupt moderation by
      default
    - net: ena: fix ena_com_fill_hash_function() implementation
    - net: ena: add handling of llq max tx burst size
    - net: ena: ethtool: add extra properties retrieval via get_priv_flags
    - net: ena: replace free_tx/rx_ids union with single free_ids field in
      ena_ring
    - net: ena: arrange ena_probe() function variables in reverse christmas tree
    - net: ena: add newline at the end of pr_err prints
    - net: ena: documentation: update ena.txt
    - net: ena: allow automatic fallback to polling mode
    - net: ena: add support for changing max_header_size in LLQ mode
    - net: ena: optimise calculations for CQ doorbell
    - net: ena: add good checksum counter
    - net: ena: use dev_info_once instead of static variable
    - net: ena: add MAX_QUEUES_EXT get feature admin command
    - net: ena: enable negotiating larger Rx ring size
    - net: ena: make ethtool show correct current and max queue sizes
    - net: ena: allow queue allocation backoff when low on memory
    - net: ena: add ethtool function for changing io queue sizes
    - net: ena: remove inline keyword from functions in *.c
    - net: ena: update driver version from 2.0.3 to 2.1.0
    - net: ena: Fix bug where ring allocation backoff stopped too late
    - Revert "net: ena: ethtool: add extra properties retrieval via
      get_priv_flags"
    - net: ena: don't wake up tx queue when down
    - net: ena: clean up indentation issue

  * Add Intel Comet Lake ethernet support (LP: #1848555)
    - SAUCE: e1000e: Add support for Comet Lake

  * Intel Wireless AC 3168 on Eoan complaints FW error in SYNC CMD
    GEO_TX_POWER_LIMIT (LP: #1846016)
    - iwlwifi: exclude GEO SAR support for 3168

  * tsc marked unstable after entered PC10 on Intel CoffeeLake (LP: #1840239...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-72.81

---------------
linux (4.15.0-72.81) bionic; urgency=medium

  * bionic/linux: 4.15.0-72.81 -proposed tracker (LP: #1854027)

  * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX
    (LP: #1853326)
    - Revert "arm64: Use firmware to detect CPUs that are not affected by
      Spectre-v2"
    - Revert "arm64: Get rid of __smccc_workaround_1_hvc_*"

  * [Regression] Bionic kernel 4.15.0-71.80 can not boot on ThunderX2 and
    Kunpeng920 (LP: #1852723)
    - SAUCE: arm64: capabilities: Move setup_boot_cpu_capabilities() call to
      correct place

linux (4.15.0-71.80) bionic; urgency=medium

  * bionic/linux: 4.15.0-71.80 -proposed tracker (LP: #1852289)

  * Bionic update: upstream stable patchset 2019-10-29 (LP: #1850541)
    - panic: ensure preemption is disabled during panic()
    - f2fs: use EINVAL for superblock with invalid magic
    - [Config] updateconfigs for USB_RIO500
    - USB: rio500: Remove Rio 500 kernel driver
    - USB: yurex: Don't retry on unexpected errors
    - USB: yurex: fix NULL-derefs on disconnect
    - USB: usb-skeleton: fix runtime PM after driver unbind
    - USB: usb-skeleton: fix NULL-deref on disconnect
    - xhci: Fix false warning message about wrong bounce buffer write length
    - xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
    - xhci: Check all endpoints for LPM timeout
    - usb: xhci: wait for CNR controller not ready bit in xhci resume
    - USB: adutux: fix use-after-free on disconnect
    - USB: adutux: fix NULL-derefs on disconnect
    - USB: adutux: fix use-after-free on release
    - USB: iowarrior: fix use-after-free on disconnect
    - USB: iowarrior: fix use-after-free on release
    - USB: iowarrior: fix use-after-free after driver unbind
    - USB: usblp: fix runtime PM after driver unbind
    - USB: chaoskey: fix use-after-free on release
    - USB: ldusb: fix NULL-derefs on driver unbind
    - serial: uartlite: fix exit path null pointer
    - USB: serial: keyspan: fix NULL-derefs on open() and write()
    - USB: serial: ftdi_sio: add device IDs for Sienna and Echelon PL-20
    - USB: serial: option: add Telit FN980 compositions
    - USB: serial: option: add support for Cinterion CLS8 devices
    - USB: serial: fix runtime PM after driver unbind
    - USB: usblcd: fix I/O after disconnect
    - USB: microtek: fix info-leak at probe
    - USB: dummy-hcd: fix power budget for SuperSpeed mode
    - usb: renesas_usbhs: gadget: Do not discard queues in
      usb_ep_set_{halt,wedge}()
    - usb: renesas_usbhs: gadget: Fix usb_ep_set_{halt,wedge}() behavior
    - USB: legousbtower: fix slab info leak at probe
    - USB: legousbtower: fix deadlock on disconnect
    - USB: legousbtower: fix potential NULL-deref on disconnect
    - USB: legousbtower: fix open after failed reset request
    - USB: legousbtower: fix use-after-free on release
    - staging: vt6655: Fix memory leak in vt6655_probe
    - iio: adc: ad799x: fix probe error handling
    - iio: adc: axp288: Override TS pin bias current for some models
    - iio: light: opt3001: fix mutex unlock race
    - efivar/ssdt: Don't iterate over EFI va...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released