hibernation has stopped working in 16.10 Yakkety Yak

Bug #1641919 reported by crysman on 2016-11-15
This bug affects 4 people
Affects                      Importance   Assigned to
nouveau-firmware (Ubuntu)    Undecided    Unassigned
pm-utils (Ubuntu)            Undecided    Unassigned

Bug Description

After upgrading to 16.10 (from 16.04) hibernation has stopped working. I've been using hibernation for years already without problems (with some initial tuning).

I am talking about using "pm-hibernate"

I have described a workaround that makes it work here:
http://askubuntu.com/a/849679/208566

Also, there are some warnings while updating initramfs, which had not been present before:
<code>
sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-4.8.0-27-generic
W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_14.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver8_7.bin for module i915
</code>

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: ubuntu-release-upgrader-core 1:16.10.8
ProcVersionSignature: Ubuntu 4.8.0-27.29-generic 4.8.1
Uname: Linux 4.8.0-27-generic x86_64
ApportVersion: 2.20.3-0ubuntu8
Architecture: amd64
CrashDB: ubuntu
CurrentDesktop: XFCE
Date: Tue Nov 15 12:19:19 2016
EcryptfsInUse: Yes
InstallationDate: Installed on 2015-11-06 (374 days ago)
InstallationMedia: Xubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
PackageArchitecture: all
SourcePackage: ubuntu-release-upgrader
Symptom: dist-upgrade
UpgradeStatus: Upgraded to yakkety on 2016-11-12 (3 days ago)
VarLogDistupgradeTermlog:

crysman (crysman) wrote :
tags: added: xenial2yakkety
no longer affects: ubuntu-release-upgrader (Ubuntu)
crysman (crysman) wrote :

Unfortunately, in my case the workaround described works only for the first hibernation after a fresh boot :/ Any subsequent hibernation fails and ends up at a login screen, as if I had just locked the desktop.

In dmesg I see some odd messages regarding nouveau; see e.g. the entry at timestamp "8109.997421" in the attached dmesg.txt.

crysman (crysman) wrote :

This is especially weird:

[ 8071.341105] Suspending console(s) (use no_console_suspend to debug)
[ 8071.341993] nouveau 0000:01:00.0: DRM: suspending console...
[ 8071.342004] nouveau 0000:01:00.0: DRM: suspending display...
[ 8071.342086] nouveau 0000:01:00.0: DRM: evicting buffers...
[ 8071.342087] nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
[ 8071.342164] nouveau 0000:01:00.0: fifo: read fault at a700000000 engine 07 [PFIFO] client 06 [PFIFO] reason 00 [PT_NOT_PRESENT] on channel 0 [003fe12000 DRM]
[ 8071.342166] nouveau 0000:01:00.0: fifo: fifo engine fault on channel 0, recovering...
[ 8086.338867] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
[ 8086.338868] nouveau 0000:01:00.0: DRM: resuming display...
[ 8086.339317] pci_pm_freeze(): nouveau_pmops_freeze+0x0/0x20 [nouveau] returns -16
[ 8086.339327] dpm_run_callback(): pci_pm_freeze+0x0/0xf0 returns -16
[ 8086.339336] PM: Device 0000:01:00.0 failed to freeze async: error -16
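
The decisive line in a log like this is the last one: the PM core reports which device refused to freeze and with what error (-16 is EBUSY, "device or resource busy"). A minimal sketch for pulling that out of a longer log, assuming the kernel keeps the "PM: Device ... failed to freeze" wording shown above; the function name find_freeze_failures is hypothetical:

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "<pci-address> <error-code>" for every device that refused to freeze.
# Assumes the message wording seen in the log above.
find_freeze_failures() {
    sed -n 's/.*PM: Device \([0-9a-f:.]*\) failed to freeze.*error \(-*[0-9][0-9]*\).*/\1 \2/p'
}

# Usage (reading the kernel ring buffer may need root on some systems):
#   dmesg | find_freeze_failures
```

Run against the excerpt above, this prints the nouveau card's address (0000:01:00.0) together with the error code, which makes it easier to compare reports from different machines.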

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nouveau-firmware (Ubuntu):
status: New → Confirmed
Changed in pm-utils (Ubuntu):
status: New → Confirmed
Melvin Hausmann (melvinh) wrote :

I got the same problem, hibernation working fine with Ubuntu 16.04 and now (still) not working after upgrade to 16.10 (running Linux version 4.8.0-32-generic (buildd@lcy01-34) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #34-Ubuntu SMP Tue Dec 13 14:30:43 UTC 2016 (Ubuntu 4.8.0-32.34-generic 4.8.11)) / Kubuntu 16.10

Even more vexing is that when I tell my system to hibernate it gets stuck somewhere, not writing anything to the disk anymore. When running sudo pm-hibernate in tty1 I get kernel messages saying that the system has lost the synchronization with my EXT4 partitions and that journaling would be disabled therefore.

Afterwards I can only turn my Laptop off via the power button and boot again.

The last lines from /var/log/pm-hibernbate.log
Running hook /etc/pm/sleep.d/novatel_3g_suspend hibernate hibernate:
/etc/pm/sleep.d/novatel_3g_suspend hibernate hibernate: success.

Mo 2. Jan 17:24:11 CET 2017: performing hibernate

[END_OF_FILE]

Syslog extract from /var/log/syslog.1

Jan 2 17:23:34 MH4Linux systemd[1]: Starting Stop ureadahead data collection...
Jan 2 17:23:34 MH4Linux systemd[1]: Started Stop ureadahead data collection.
Jan 2 17:24:11 MH4Linux kernel: [ 119.501530] bbswitch: enabling discrete graphics
Jan 2 17:28:24 MH4Linux rsyslogd: [origin software="rsyslogd" swVersion="8.16.0" x-pid="784" x-info="http://www.rsyslog.com"] start
Jan 2 17:28:24 MH4Linux rsyslogd: rsyslogd's groupid changed to 108

Melvin Hausmann (melvinh) wrote :
crysman (crysman) wrote :

I do not know what happened or when, but it seems to work now. I haven't done anything special, just regular updates in the meantime...

<code>
❱ uname -a
Linux ASUSu36SD 4.8.0-32-generic #34-Ubuntu SMP Tue Dec 13 14:30:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
</code>

Joshua Powers (powersj) wrote :

Because the original reporter says this is working now I am marking it invalid. Feel free to reopen and mark as "New" in the event that the issue reappears. Thanks for reporting!

Changed in nouveau-firmware (Ubuntu):
status: Confirmed → Invalid
Changed in pm-utils (Ubuntu):
status: Confirmed → Invalid
mprotic (mprotic) wrote :

Still persists for me

Linux laptop 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

It goes like this:

fresh boot .... ok

sudo pm-hibernate .... ok

boot from hibernate ... everything seems ok

second sudo pm-hibernate does this:

Jan 19 16:38:03 laptop kernel: [ 1337.045404] nouveau 0000:01:00.0: DRM: evicting buffers...
Jan 19 16:38:03 laptop kernel: [ 1337.045407] nouveau 0000:01:00.0: DRM: waiting for kernel channels to go idle...
Jan 19 16:38:03 laptop kernel: [ 1352.043878] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
Jan 19 16:38:03 laptop kernel: [ 1352.043917] pci_pm_freeze(): nouveau_pmops_freeze+0x0/0x20 [nouveau] returns -16
Jan 19 16:38:03 laptop kernel: [ 1352.043920] dpm_run_callback(): pci_pm_freeze+0x0/0xf0 returns -16
Jan 19 16:38:03 laptop kernel: [ 1352.043922] PM: Device 0000:01:00.0 failed to freeze async: error -16
Jan 19 16:38:03 laptop kernel: [ 1352.050293] queueing ieee80211 work while going to suspend
Jan 19 16:38:03 laptop kernel: [ 1352.110583] usb usb3: root hub lost power or was reset
Jan 19 16:38:03 laptop kernel: [ 1352.110584] usb usb4: root hub lost power or was reset
Jan 19 16:38:03 laptop kernel: [ 1352.111949] usb usb1: root hub lost power or was reset
Jan 19 16:38:03 laptop kernel: [ 1352.112086] usb usb2: root hub lost power or was reset
Jan 19 16:38:03 laptop kernel: [ 1352.114499] sd 0:0:0:0: [sda] Starting disk
Jan 19 16:38:03 laptop kernel: [ 1352.115857] ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
Jan 19 16:38:03 laptop kernel: [ 1352.115974] ehci-pci 0000:00:1d.0: cache line size of 64 is not supported
Jan 19 16:38:03 laptop kernel: [ 1352.123973] rtc_cmos 00:01: System wakeup disabled by ACPI
Jan 19 16:38:03 laptop kernel: [ 1352.124686] ------------[ cut here ]------------
Jan 19 16:38:03 laptop kernel: [ 1352.124710] WARNING: CPU: 0 PID: 7729 at /build/linux-RZGRu0/linux-4.8.0/net/mac80211/key.c:683 ieee80211_enable_keys+0x182/0x190 [mac80211]
Jan 19 16:38:03 laptop kernel: [ 1352.124734] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ccm xfrm_user xfrm_algo br_netfilter bridge stp llc aufs ip6table_filter ip6_tables xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_addrtype iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack binfmt_misc acer_wmi sparse_keymap arc4 ath9k ath9k_common ath9k_hw ath mac80211 uvcvideo videobuf2_vmalloc videobuf2_memops intel_rapl cfg80211 videobuf2_v4l2 videobuf2_core videodev x86_pkg_temp_thermal media snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel rtsx_pci_ms memstick snd_hda_codec snd_hda_core shpchp coretemp snd_hwdep snd_pcm snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq snd_seq_device snd_timer irqbypass snd input_leds soundcore joydev intel_cstate intel_rapl_perf serio_raw mei_me mei lpc_ich mac_hid parport_pc ppdev lp parport ip_tables x_tables autofs4 algif_skcipher af_alg dm_crypt rtsx_pci_sdmmc hid_generic nouveau i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel mxm_wmi ttm i2c_algo_bit aesni_intel drm_kms_helper syscopyarea aes_x86_64 sysfil...

Changed in nouveau-firmware (Ubuntu):
status: Invalid → New
Changed in pm-utils (Ubuntu):
status: Invalid → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nouveau-firmware (Ubuntu):
status: New → Confirmed
Changed in pm-utils (Ubuntu):
status: New → Confirmed
Melvin Hausmann (melvinh) wrote :

In case anyone's interested, tried again after a recent Kernel Update (Linux version 4.8.0-37-generic (buildd@lcy01-17) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #39-Ubuntu SMP Thu Jan 26 02:27:07 UTC 2017 (Ubuntu 4.8.0-37.39-generic 4.8.16))

But it is the same story on tty1. (This is not a log file; I typed it from a photo I took, and each [...] stands for a timestamp.)

melvinh@MH4Linux:~$ sudo pm-hibernate
[sudo] Passwort für melvinh
---screen goes black, system does something - no HDD activity, screen turns on again, errors appear as follows---
[...] do_IRQ: 1.226 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] ata2.00: revalidation failed (errno=-5)
[...] ata1.00: revalidation failed (errno=-5)
[...] do_IRQ: 3.178 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[...] ata2.00: revalidation failed (errno=-5)
[...] ata1.00: revalidation failed (errno=-5)
[...] do_IRQ: 3.178 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] do_IRQ: 3.178 No irq handler for vector
[...] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[...] ata2.00: revalidation failed (errno=-5)
[...] ata1.00: revalidation failed (errno=-5)
[...] do_IRQ: 3.178 No irq handler for vector
[...] blk_update_request: I/O error, dev sda, sector 939400954
[...] blk_update_request: I/O error, dev sda, sector 936002719
[...] Buffer I/O error on device sda6, logical block 4818437
melvinh@MH4Linux:~$ [...] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[...] EXT4-fs (sda6): Delayed block allocation failed for inode 788068 at logical offset 0 with max blocks 1 with error 5
[...] EXT4-fs (sda6): This should not happen!! Data will be lost
[...]
[...] blk_update_request: I/O error, dev sda, sector 926883199
[...] blk_update_request: I/O error, dev sda, sector 918700167
[...] Aborting journal on device sda6-8
[...] blk_update_request: I/O error, dev sda, sector 918697087
[...] blk_update_request: I/O error, dev sda, sector 918697087
[...] Buffer I/O error on dev sda6, logical block 2655233, lost sync page write
[...] JBD2: Error -5 detected when updating journal superblock for sda6-8
[...] blk_update_request: I/O error, dev sda, sector 897455223
[...] blk_update_request: I/O error, dev sda, sector 897455223
[...] Buffer I/O error on dev sda6, logical block 0, lost sync page write
[...] EXT4-fs error (device sda6): ext4_journal_check_start:56: Detected aborted journal
[...] EXT4-fs (sda6): Remounting filesystem readonly
[...] EXT4-fs (sda6): previous I/O error to superblock detected
[...] blk_update_request: I/O error, dev sda, sector 897455223
[...] blk_update_request: I/O error, dev sda, sector 897455223
[...] Buffer I/O error on dev sda6, logical block 0, lost sync page write

crysman (crysman) wrote :

OK, my post #8 is no longer valid. As @melvinh mentioned above, it has recently stopped working again :(

Here is some dmesg debug info I collected after the second (failed) hibernate attempt:
"""
❱ dmesg | grep -iE "err|warn|fail|fault|block|unable"
[ 0.000000] MTRR default type: uncachable
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[ 0.000033] pid_max: default: 32768 minimum: 301
[ 0.223975] core: PEBS disabled due to CPU errata, please upgrade microcode
[ 0.235978] x86/mm: Memory block size: 128MB
[ 0.275642] pmd_set_huge: Cannot satisfy [mem 0xe0000000-0xe0200000] with a huge-page mapping due to MTRR override.
[ 0.275788] core: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
[ 0.293097] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.419024] ACPI: Using IOAPIC for interrupt routing
[ 0.652383] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12)
[ 0.652437] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 12)
[ 0.652489] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 10 12)
[ 0.652541] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 *10 12)
[ 0.652593] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 12) *0, disabled.
[ 0.652644] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 12) *0, disabled.
[ 0.652695] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 10 12)
[ 0.652746] ACPI: PCI Interrupt Link [LNKH] (IRQs *3 4 5 6 7 10 12)
[ 0.652875] ACPI: Enabled 3 GPEs in block 00 to 3F
[ 0.655286] NetLabel: unlabeled traffic allowed by default
[ 0.986808] PCI: CLS 64 bytes, default 64
[ 1.667413] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248)
[ 1.667446] io scheduler deadline registered (default)
[ 2.136864] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0._DSM] (Node ffff9da2024c0938), AE_AML_PACKAGE_LIMIT (20160422/psparse-542)
[ 2.136872] ACPI: \_SB_.PCI0.GFX0: failed to evaluate _DSM (0x300b)
[ 2.136875] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[ 2.136932] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[ 2.137020] ACPI Error: Method parse/execution failed [\_SB.PCI0.GFX0._DSM] (Node ffff9da2024c0938), AE_AML_PACKAGE_LIMIT (20160422/psparse-542)
[ 2.137024] ACPI Error: Method parse/execution failed [\_SB.PCI0.PEG0.GFX0._DSM] (Node ffff9da2024d5c30), AE_AML_PACKAGE_LIMIT (20160422/psparse-542)
[ 2.137030] ACPI: \_SB_.PCI0.PEG0.GFX0: failed to evaluate _DSM (0x300b)
[ 2.137032] ACPI Warning: \_SB.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[ 2.137088] ACPI Warning: \_SB.PCI0.PEG0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20160422/nsarguments-95)
[ 2.482766] sd 0:0:0:0: [sda] 1000215216 512-byte logical blocks: (512 GB/477 GiB)
[ 2.578711] usb 1-1.1: device descriptor read/64, error ...


crysman (crysman) wrote :

By the way, when I remove "pci=nomsi" from /etc/default/grub, four "system crash" reports pop up right after every reboot... I believe the two are connected.

mprotic (mprotic) wrote :

This is a nouveau issue. Unload nouveau and hibernation starts working normally again.

Ctrl+Alt+F1
sudo service lightdm stop
sudo modprobe -r nouveau
sudo service lightdm start
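
Before hibernating again it may be worth confirming that the module really is gone, since modprobe -r refuses to unload a module that is still in use. A minimal sketch, assuming the usual /proc/modules layout in which each line starts with the module name followed by a space; the function name module_loaded is hypothetical:

```shell
# Hypothetical helper: succeed if the named module appears in a
# /proc/modules-style listing.
# $1 = module name, $2 = file to read (defaults to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Usage:
#   module_loaded nouveau && echo "nouveau is still loaded"
```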

Btw, how is it possible that the system works completely normally after unloading nouveau?
Is it safe to do apt-get remove xserver-xorg-video-nouveau?

@mprotic - be careful.
xserver-xorg-video-nouveau is the Xorg driver for it,
while the kernel module you unload comes from linux-image-extra.

Neither is safe to uninstall: removing the former will take xorg away because of dependencies, and the latter contains many more kernel modules.
If your setup does not need nouveau at all and you want to prevent it from loading, follow [1].

[1]: https://wiki.debian.org/KernelModuleBlacklisting
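
For reference, the approach on that page comes down to a one-line file under /etc/modprobe.d. A minimal sketch, not a tested fix for this bug: the helper name blacklist_module and the file name blacklist-<module>.conf are assumptions made here for illustration, and writing to /etc/modprobe.d requires root.

```shell
# Hypothetical helper: write a modprobe blacklist entry for one module.
# $1 = module name, $2 = target directory (defaults to /etc/modprobe.d,
# which needs root to write).
blacklist_module() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# Usage, as root, followed by an initramfs rebuild so the setting
# is also present during early boot:
#   blacklist_module nouveau
#   update-initramfs -u
```

A blacklist entry only stops the module from being loaded automatically; it can still be loaded by hand with modprobe, so the change is easy to undo by deleting the file and rebuilding the initramfs.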

mprotic (mprotic) wrote :

Still happening in 17.10... Should we report this to the nouveau bug list?
