Precise does not resume from S3 - kernel panic on resume

Bug #1094412 reported by Tim Hockin
This bug affects 1 person
Affects: nvidia-graphics-drivers (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

4 days ago I had Ubuntu Lucid running on this computer. Suspend and resume worked flawlessly every time.

Then I upgraded to Ubuntu Precise. Suspend seems to work, but resume fails every time. Judging by the flashing keyboard lights, I guess the kernel panicked. Resume fails both from the Live CD and from a fresh install.

Here is my debugging so far.

Run Update Manager
  Install all updates
Reboot
Try suspend = fails

sudo apt-get install synaptic
Run synaptic
  Install linux-generic-lts-quantal (3.5.0-21)
Reboot
Try suspend = fails

Run jockey-gtk
Install "experimental" 304 nVidia driver
Reboot
NVRM error in dmesg
Add video=vesa:off vga=normal to /etc/default/grub
Run update-grub2
Reboot
No more NVRM error (also no splash screen)
Try suspend = fails

echo core > /sys/power/pm_test
echo mem > /sys/power/state
system acts like it is going to sleep, and then wakes up a few seconds later
dmesg shows:
[ 1230.083404] ------------[ cut here ]------------
[ 1230.083410] WARNING: at /build/buildd/linux-lts-quantal-3.5.0/kernel/power/suspend_test.c:53 suspend_test_finish+0x86/0x90()
[ 1230.083411] Hardware name: To Be Filled By O.E.M.
[ 1230.083412] Component: resume devices, time: 14424
[ 1230.083412] Modules linked in: snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul bnep rfcomm parport_pc ppdev nvidia(PO) snd_emu10k1 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_util_mem snd_hwdep snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer coretemp snd_seq_device kvm_intel kvm snd ghash_clmulni_intel soundcore aesni_intel btusb cryptd aes_x86_64 bluetooth i7core_edac edac_core microcode mac_hid lpc_ich mxm_wmi shpchp serio_raw wmi hid_generic lp parport usbhid hid r8169 pata_marvell
[ 1230.083445] Pid: 3329, comm: bash Tainted: P O 3.5.0-21-generic #32~precise1-Ubuntu
[ 1230.083446] Call Trace:
[ 1230.083448] [<ffffffff81052c9f>] warn_slowpath_common+0x7f/0xc0
[ 1230.083452] [<ffffffff81052d96>] warn_slowpath_fmt+0x46/0x50
[ 1230.083455] [<ffffffff8109b836>] suspend_test_finish+0x86/0x90
[ 1230.083457] [<ffffffff8109b53b>] suspend_devices_and_enter+0x10b/0x200
[ 1230.083460] [<ffffffff8109b701>] enter_state+0xd1/0x100
[ 1230.083463] [<ffffffff8109b74b>] pm_suspend+0x1b/0x60
[ 1230.083465] [<ffffffff8109a7a5>] state_store+0x45/0x70
[ 1230.083467] [<ffffffff81331d2f>] kobj_attr_store+0xf/0x30
[ 1230.083471] [<ffffffff811f77ff>] sysfs_write_file+0xef/0x170
[ 1230.083476] [<ffffffff811879d3>] vfs_write+0xb3/0x180
[ 1230.083480] [<ffffffff81187cfa>] sys_write+0x4a/0x90
[ 1230.083483] [<ffffffff816a6e69>] system_call_fastpath+0x16/0x1b
[ 1230.083488] ---[ end trace 839cdd0078b3ce03 ]---

Boot with init=/bin/bash
unload all modules except USBHID
echo core > /sys/power/pm_test
echo mem > /sys/power/state
system acts like it is going to sleep, and then wakes up a few seconds later
echo none > /sys/power/pm_test
echo mem > /sys/power/state
system goes to sleep
press power to resume = fails
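
For anyone repeating the pm_test steps above, they could be scripted roughly as follows. This is only a sketch: it assumes a kernel built with PM debugging support (so that /sys/power/pm_test exists) and a root shell. The function name run_pm_tests and the POWER_DIR override are made up for illustration; the level names are the ones the kernel lists when you read /sys/power/pm_test.

```shell
#!/bin/sh
# Sketch: walk the pm_test levels from shallowest to deepest.
# POWER_DIR is a hypothetical override so the loop can be dry-run
# against a scratch directory instead of the real sysfs tree.
POWER_DIR="${POWER_DIR:-/sys/power}"

run_pm_tests() {
    for level in freezer devices platform processors core; do
        echo "pm_test level: $level"
        # Select how far the test suspend should go...
        echo "$level" > "$POWER_DIR/pm_test" || return 1
        # ...then start it; in test mode the system should come back
        # by itself after a few seconds.
        echo mem > "$POWER_DIR/state" || return 1
    done
    # Restore normal behaviour so the next suspend is a real one.
    echo none > "$POWER_DIR/pm_test"
}
```

Calling run_pm_tests as root and checking dmesg after each level should show the first level at which something complains.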

At this point I am stumped on how to debug. This is a "modern" computer with no serial ports. It worked under Lucid, so I know it is POSSIBLE.

Mobo: ASRock X58 single-socket
CPU: Westmere 6 core (12 hyperthreads) 3.2 GHz
RAM: 12 GB ECC
Disk: sda = Intel SSD, mounted on /
Disk: sdb = Intel SSD, not mounted
Disk: sdc = Seagate HDD, not mounted
Disk: sdd = Seagate HDD, not mounted
NIC = Onboard RTL8168e/8111e
Sound = EMU1212 (emu10k1, not even configured yet)
Video = nVidia GeForce 7600 GT
KB = PS2 (also tried USB)
Mouse = USB

Suspend and resume is a must-have for me. I am at my wits' end.

Tags: precise
Tim Hockin (thockin-hockin) wrote :

Running a suspend with pm_trace set, I get:

aer 0000:00:03.0:pcie02: hash matches

I don't know what magic might be needed here, though. As I said before - it worked a few days ago under Lucid.
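
The pm_trace run mentioned above could look roughly like this. It is a sketch, not the exact commands used in this report: arm_pm_trace, pm_trace_matches and the POWER_DIR override are hypothetical names. As far as the kernel's power management documentation describes it, pm_trace stores its marker in the RTC, so the hardware clock is likely to be wrong afterwards, and the log has to be read on the boot right after the failed resume.

```shell
#!/bin/sh
# POWER_DIR is a hypothetical override for dry runs; the real files
# live under /sys/power.
POWER_DIR="${POWER_DIR:-/sys/power}"

# Enable tracing and then suspend (needs root).
arm_pm_trace() {
    echo 1 > "$POWER_DIR/pm_trace" && echo mem > "$POWER_DIR/state"
}

# Filter kernel log text on stdin down to the pm_trace result lines.
pm_trace_matches() {
    grep 'hash matches'
}
```

After the forced reboot, `dmesg | pm_trace_matches` should print lines like the one quoted above.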

Tim Hockin (thockin-hockin) wrote :

Update: Booting with noapic seems to work!

As before, it worked under Lucid, without the noapic flag, so it's still a bug IMO.
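
When testing boot flags like this, it helps to confirm that the running kernel really received them. A minimal sketch, assuming the flag was added through GRUB; has_boot_param is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# On Ubuntu a flag such as noapic is typically added to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by
# update-grub and a reboot.

# has_boot_param PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command
# line. FILE defaults to /proc/cmdline; the optional argument is
# there only so the check can be tried on a sample file.
has_boot_param() {
    for word in $(cat "${2:-/proc/cmdline}"); do
        if [ "$word" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `has_boot_param noapic && echo "noapic is active"`.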

Tim Hockin (thockin-hockin) wrote :

Best guess:

Booting with 'noapic', I see the "irq 5: nobody cared" message on resume, along with 10000 IRQ5 counts in /proc/interrupts (the devices claiming that IRQ are quiescent).

Without 'noapic' that must be triggering something else to go haywire, perhaps the AER logic (though that is all MSI, so probably not). I'm flying blind on those boots.

I bet that, if I can recall how to re-enable IRQ5, I'll see it continuously asserting. Chipset or BIOS bug maybe. I don't know if I had AER enabled under Lucid, so that might be the difference.
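
One way to check whether IRQ 5 keeps firing is to total its per-CPU counts in /proc/interrupts before and after a resume and compare. A small sketch; irq_total is a hypothetical helper name.

```shell
#!/bin/sh
# irq_total IRQ
# Reads /proc/interrupts-style text on stdin and prints the sum of
# the per-CPU counters for the given IRQ number.
irq_total() {
    awk -v irq="$1" '
        $1 == (irq ":") {
            total = 0
            # The counters are the leading numeric fields; stop at the
            # first non-numeric one (the interrupt controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
            print total
        }'
}
```

Running `irq_total 5 < /proc/interrupts` twice, a few seconds apart, would show whether the count is still climbing.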

Tim Hockin (thockin-hockin) wrote :

Update: Booting with 'pci-noaer' alone does not fix the problem. I seem to need 'noapic' to boot.

Tim Hockin (thockin-hockin) wrote :

that last update should have read pci=noaer

bugbot (bugbot)
tags: added: precise