Linux 3.0.0-15 causes laptops to fail to resume from suspend (Dell XPS 1645, Sony Vaio VPCF1390)

Bug #904569 reported by Ronan Jouchet on 2011-12-15
This bug affects 15 people
Affects          Status   Importance   Assigned to   Milestone
linux (Ubuntu)            Medium       Unassigned
Oneiric                   Undecided    Tim Gardner
Precise                   Medium       Unassigned

Bug Description

My laptop never had any trouble resuming from suspend, but yesterday's kernel update (3.0.0-15-generic-pae) causes resume to fail.

Scenario:
1. Suspend, wait for laptop to enter suspend mode
2. Press power button

Expected
1. Laptop should resume and display login screen

Actual
1. Laptop appears to resume, but only displays a black screen. Alt+F1/...F12 keys and the Fn+REISUB combination are without effect

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: linux-image-3.0.0-15-generic-pae 3.0.0-15.24
ProcVersionSignature: Ubuntu 3.0.0-15.24-generic-pae 3.0.13
Uname: Linux 3.0.0-15-generic-pae i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 1.23-0ubuntu4
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ronj 2153 F.... pulseaudio
 /dev/snd/controlC0: ronj 2153 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf1000000 irq 49'
   Mixer name : 'IDT 92HD73C1X5'
   Components : 'HDA:111d7675,102802fe,00100103'
   Controls : 16
   Simple ctrls : 10
Card1.Amixer.info:
 Card hw:1 'Generic'/'HD-Audio Generic at 0xcfedc000 irq 50'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100200'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
Card2.Amixer.info:
 Card hw:2 'nanoKONTROL'/'KORG INC. nanoKONTROL at usb-0000:00:1d.0-1.3.1.3, full speed'
   Mixer name : ''
   Components : 'USB0944:010f'
   Controls : 0
   Simple ctrls : 0
Card2.Amixer.values:

Date: Wed Dec 14 22:46:14 2011
EcryptfsInUse: Yes
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release i386 (20111012)
MachineType: Dell Inc. Studio XPS 1645
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-15-generic-pae root=UUID=f3169916-c711-492a-ab77-4428bcf86ff8 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-15-generic-pae N/A
 linux-backports-modules-3.0.0-15-generic-pae N/A
 linux-firmware 1.60
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A13
dmi.board.name: 0VV228
dmi.board.vendor: Dell Inc.
dmi.board.version: A13
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A13
dmi.modalias: dmi:bvnDellInc.:bvrA13:bd04/01/2011:svnDellInc.:pnStudioXPS1645:pvrA13:rvnDellInc.:rn0VV228:rvrA13:cvnDellInc.:ct8:cvrA13:
dmi.product.name: Studio XPS 1645
dmi.product.version: A13
dmi.sys.vendor: Dell Inc.

Ronan Jouchet (ronj) wrote :
Brad Figg (brad-figg) on 2011-12-15
Changed in linux (Ubuntu):
status: New → Confirmed

Same with Sony Vaio VPCF1390. I'm attaching my report to this one.

Ronan Jouchet (ronj) on 2011-12-15
summary: - Linux 3.0.0-15-generic-pae causes my Dell XPS 1645 to fail to resume
- from suspend
+ Linux 3.0.0-15-generic-pae causes laptops to fail to resume from suspend
+ (Dell XPS 1645, Sony Vaio VPCF1390)
tags: added: regression-proposed

Hi, can you try the resume-trace procedure from https://wiki.ubuntu.com/DebuggingKernelSuspend and attach the resulting dmesg.txt here? It will likely help track down this issue.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Joseph Salisbury (jsalisbury) wrote :

@Ronan,

If you boot back into 3.0.0-14-generic-pae, does this issue go away?

Changed in linux (Ubuntu):
importance: Undecided → Medium
Ronan Jouchet (ronj) wrote :

@Joseph,
Yes:
 - booting into 3.0.0-14-generic-pae results in functional restore
 - booting into 3.0.0-15-generic (non-pae) also results in broken restore

@Herton,
I tried to follow the procedure described but the resulting attached dmesg.txt doesn't contain any "Magic number" string. Tell me if that's OK for you, else please guide me to bring what's missing (we can have an IRC chat)

Changed in linux (Ubuntu):
status: Incomplete → Opinion
status: Opinion → In Progress
status: In Progress → Confirmed
Herton R. Krzesinski (herton) wrote :

@Ronan, ok, can you try the debugging steps at https://wiki.ubuntu.com/Kernel/Reference/S3SystemTapDebug ? It requires a bit more work, but should help isolate this issue. I expect you will need to run locatehang after rebooting from the failed resume. Feel free to join #ubuntu-kernel on Freenode to ask for directions or if you have any doubts.

Greg Michalec (greg-primate) wrote :

I can confirm this bug. I tested by booting into recovery mode, remounting drive r/w, and running pm-suspend from console. I tested the -12, -13, -14, and -15 kernels. All resumed correctly, except for the -15 kernel, which merely gives the black screen as described above. I'll try to follow the debugging instructions above.

Greg Michalec (greg-primate) wrote :

Here's the output of running locatehang (per instructions from https://wiki.ubuntu.com/Kernel/Reference/S3SystemTapDebug ).
dameat@lappy686:~/Code/pmdebug/locatehang$ ./locatehang
Looking for function that matches hash from the Magic Number from the kernel log.
  Magic: 0:523:889 maps to hash: d88480
  Hash matches: acpi_disable_wakeup_devices() (address: 0)

Thanks to Herton for helping me get that running.
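
For readers wondering how the "Magic" triple and the hash in this output relate: the three fields look like they are packed the same way the kernel's pm_trace code packs its values into the RTC. The following is a minimal sketch under that assumption. The moduli 16 and 997 are recalled from the kernel's drivers/base/power/trace.c and are not stated anywhere in this thread, and the function name is made up for illustration.

```python
# Sketch: rebuild the hash printed by locatehang from the magic number.
# Assumption: the fields are user:file:device, packed as
#   user + USERHASH * (file + FILEHASH * device)
# USERHASH and FILEHASH are recalled from drivers/base/power/trace.c,
# not taken from this bug report.

USERHASH = 16
FILEHASH = 997


def magic_to_hash(magic: str) -> str:
    """Return the packed value of a 'user:file:device' string as hex."""
    user, file_hash, device = (int(part) for part in magic.split(":"))
    packed = user + USERHASH * (file_hash + FILEHASH * device)
    return format(packed, "x")


print(magic_to_hash("0:523:889"))  # d88480
```

With these constants the triple 0:523:889 packs to d88480, the same hash that locatehang reports above, which suggests the assumption holds for this tool.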

Ronan Jouchet (ronj) wrote :

@Herton,
I can't complete step 1.4 of https://wiki.ubuntu.com/Kernel/Reference/S3SystemTapDebug (that is, sudo apt-get install linux-image-$(uname -r)-dbgsym ) because, as confirmed by a quick lookup in synaptic, there is no such package as linux-image-3.0.0-15-generic-pae-dbgsym available. I confirm I did steps 1.1, 1.2, 1.3, 1.4, but the only -dbgsym available is 3.0.0-12, which, I guess, wouldn't help us isolate my issue on 3.0.0-15, right?
What can I do to still provide the information you request?

Greg Michalec (greg-primate) wrote :

@Ronan,
I had the same issue - I ended up downloading the deb directly from here: http://ddebs.ubuntu.com/pool/main/l/linux/ (make sure to get the correct one for your kernel). Also, be warned that the deb is ~600 megs, and unpacks to over 2GB. I had to symlink /usr/lib/debug/lib/ to a directory on my /home partition to have room for it.
It seems unlikely that you'll get different results than I did, but probably worth testing!

NikoC (n-celis) wrote :

Confirmed on a dell E6510 with Kubuntu 11.10 64bit and kernel 3.0.0-15! Suggested solutions didn't work for me, so installed and locked 3.0.0-14 kernel which works fine with suspend and resuming!

Ronan Jouchet (ronj) wrote :

After following Greg's advice, I confirm I have the same results:

ronj@blob:~/pmdebug/locatehang$ ./locatehang
Looking for function that matches hash from the Magic Number from the kernel log.
  Magic: 0:523:889 maps to hash: d88480
  Hash matches: acpi_disable_wakeup_devices() (address: 0)

Feel free to ask for more testing. Thanks for working on this bug :)

tags: added: kernel-da-key kernel-key regression-update
Joseph Salisbury (jsalisbury) wrote :

Hello,

I would like to perform a bisect to identify the change that introduced this regression. First I would like to see if the mainline kernel exhibits this bug. If possible, it would be great if folks hitting this bug could test the following two kernels, and report back if the bug exists in either or both of them:

v3.0.12:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.0.12-oneiric/

v3.0.13:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.0.13-oneiric/

Ronan Jouchet (ronj) wrote :

Joseph, I have great news for you (well, I hope :D ) : on my machine,

3.0.12-030012-generic_3.0.12-030012.201111281835_i386: successfully resumes
3.0.12-030012-generic-pae_3.0.12-030012.201111281835_i386: couldn't test, kernel not available in the repo

3.0.13-030013-generic_3.0.13-030013.201112091235_i386: fails to resume
3.0.13-030013-generic-pae_3.0.13-030013.201112091235_i386: fails to resume
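
These results are what give the bisect its two endpoints. As a rough sketch (the function names are hypothetical, and only the non-PAE results from this comment are used), the last good and first bad versions can be derived like this:

```python
# Sketch: derive bisect endpoints from per-kernel resume results.
# True means the kernel resumed successfully, False means it failed.
results = {
    "3.0.12": True,
    "3.0.13": False,
}


def version_key(version: str) -> tuple:
    """Turn '3.0.12' into (3, 0, 12) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results: dict) -> tuple:
    """Return (last good version, first bad version after it)."""
    good = [v for v in results if results[v]]
    last_good = max(good, key=version_key)
    later_bad = [
        v for v in results
        if not results[v] and version_key(v) > version_key(last_good)
    ]
    first_bad = min(later_bad, key=version_key)
    return last_good, first_bad


print(regression_window(results))  # ('3.0.12', '3.0.13')
```

Sorting on numeric tuples rather than raw strings matters here, since a plain string sort would place "3.0.9" after "3.0.12".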

Joseph Salisbury (jsalisbury) wrote :

@Ronan

Thanks for testing. That now gives me a starting point to bisect. I'm going to build another test kernel, with a commit halfway between v3.0.12 and v3.0.13. I'll post a link to its location. It would be great if you can test that kernel when it's available.

Thanks again for your help!

Joseph Salisbury (jsalisbury) wrote :

@Ronan

Just one question. The 3.0.0-15-generic-pae kernel is still in the -proposed repository. I just wanted to confirm you are aware your system is configured to update from -proposed?

Ronan Jouchet (ronj) wrote :

@Joseph,

Yup, I'm using -proposed because I'm willing to do this kind of testing :)
I'll test your halfway .12/.13 mainline build once it's ready, waiting for your notice.

Joseph Salisbury (jsalisbury) wrote :

@Ronan,

Thanks again for your help. I have a test kernel available at:
http://people.canonical.com/~jsalisbury/lp904569/

It would be great if you could test and let me know if the issue still happens. Based on the results, I'll build another test kernel halving the commits again. According to the bisect, there should be a max of 7 test kernels required.
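
The "max of 7 test kernels" figure follows from halving the commit range at each step. Below is a minimal sketch of that arithmetic. The commit count of 100 is a placeholder chosen for illustration (the real number of commits between v3.0.12 and v3.0.13 is not given in this thread), and the function names are hypothetical.

```python
import math


def worst_case_tests(candidates: int) -> int:
    """Tests needed to pin down the first bad commit among `candidates`
    commits when the last one is already known to be bad."""
    return math.ceil(math.log2(candidates)) if candidates > 1 else 0


def bisect_first_bad(commits, is_bad):
    """Binary search for the first commit where is_bad() is True.
    Assumes every commit after the first bad one is also bad."""
    low, high = 0, len(commits) - 1  # commits[high] is known bad
    tests = 0
    while low < high:
        mid = (low + high) // 2
        tests += 1  # one test kernel built and booted per iteration
        if is_bad(commits[mid]):
            high = mid
        else:
            low = mid + 1
    return commits[low], tests


commits = list(range(100))  # placeholder commit range, not the real one
culprit, tests = bisect_first_bad(commits, lambda c: c >= 87)
print(worst_case_tests(100), culprit, tests <= 7)  # 7 87 True
```

Any range of 65 to 128 candidate commits gives the same upper bound of 7 tests, so the estimate is not sensitive to the exact commit count.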

Kano (master-kanotix) wrote :

I tested the same kernel and newer ones and had the same problem on one of my systems. Did this bisect:

git bisect start
# good: [ac6766564c0305ca020fe747dfd7dbdf0881369d] Linux 3.0.12
git bisect good ac6766564c0305ca020fe747dfd7dbdf0881369d
# bad: [d986a8dbfd7358bfbda116650c4caf8a3b90d865] Linux 3.0.13
git bisect bad d986a8dbfd7358bfbda116650c4caf8a3b90d865
# good: [08d618b2080d8b3afac6db1a361c54d827b8d044] drm/radeon/kms: add some new pci ids
git bisect good 08d618b2080d8b3afac6db1a361c54d827b8d044
# good: [c060a3d5e9bba4271331b69b9e2c53105999b97f] x86: Fix "Acer Aspire 1" reboot hang
git bisect good c060a3d5e9bba4271331b69b9e2c53105999b97f
# good: [edb9a31845c5ba0ff325daa58f17f881d60d1559] xfs: force buffer writeback before blocking on the ilock in inode reclaim
git bisect good edb9a31845c5ba0ff325daa58f17f881d60d1559
# good: [d80dee54533aa4bfe29def921edb31715fdba214] tick-broadcast: Stop active broadcast device when replacing it
git bisect good d80dee54533aa4bfe29def921edb31715fdba214
# good: [0bbf5c70251286fbc3b7aac5e7961b4568115bfd] oprofile: Fix crash when unloading module (hr timer mode)
git bisect good 0bbf5c70251286fbc3b7aac5e7961b4568115bfd
# bad: [b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c] clockevents: Set noop handler in clockevents_exchange_device()
git bisect bad b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c
# good: [4078977c46f627f553ed2d8ea047b9bf25dee48d] clocksource: Fix bug with max_deferment margin calculation
git bisect good 4078977c46f627f553ed2d8ea047b9bf25dee48d

Result:

b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c is the first bad commit
commit b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c
Author: Thomas Gleixner <email address hidden>
Date: Fri Dec 2 16:02:45 2011 +0100

    clockevents: Set noop handler in clockevents_exchange_device()

    commit de28f25e8244c7353abed8de0c7792f5f883588c upstream.

    If a device is shutdown, then there might be a pending interrupt,
    which will be processed after we reenable interrupts, which causes the
    original handler to be run. If the old handler is the (broadcast)
    periodic handler the shutdown state might hang the kernel completely.

    Signed-off-by: Thomas Gleixner <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>

:040000 040000 b5dba6238e4accc62febbb9bd67c89c27eeb077b 67fa1fc6a94b92a1fa3cc14018e447a0d387451a M kernel

A revert fixed the issue, also tested with 3.0.14.
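
For anyone who wants to check a bisect log like the one above, here is a small sketch that reads the "# good:" / "# bad:" comment lines and picks out the culprit. It relies on the observation that, once a bisect over linear history has converged, the most recently marked bad commit is the first bad one. The sample is a shortened copy of the log above, and the helper names are hypothetical.

```python
import re

# Shortened copy of the bisect log quoted above.
BISECT_LOG = """\
# good: [ac6766564c0305ca020fe747dfd7dbdf0881369d] Linux 3.0.12
git bisect good ac6766564c0305ca020fe747dfd7dbdf0881369d
# bad: [d986a8dbfd7358bfbda116650c4caf8a3b90d865] Linux 3.0.13
git bisect bad d986a8dbfd7358bfbda116650c4caf8a3b90d865
# bad: [b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c] clockevents: Set noop handler in clockevents_exchange_device()
git bisect bad b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c
# good: [4078977c46f627f553ed2d8ea047b9bf25dee48d] clocksource: Fix bug with max_deferment margin calculation
git bisect good 4078977c46f627f553ed2d8ea047b9bf25dee48d
"""

MARK = re.compile(r"^# (good|bad): \[([0-9a-f]{7,40})\] (.*)$")


def parse_bisect_log(text):
    """Return (verdict, sha, subject) tuples from the comment lines."""
    entries = []
    for line in text.splitlines():
        match = MARK.match(line)
        if match:
            entries.append(match.groups())
    return entries


def last_bad(entries):
    """Return (sha, subject) of the most recently marked bad commit."""
    bads = [(sha, subject) for verdict, sha, subject in entries if verdict == "bad"]
    return bads[-1]


entries = parse_bisect_log(BISECT_LOG)
print(last_bad(entries)[0])  # b01b383bbd04e9dcf7d9fe6ca3751b77ccdc533c
```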

Joseph Salisbury (jsalisbury) wrote :

@Ronan

Kano did some great work and found the commit that causes the issue. I will build a test kernel with the commit reverted.

It would be good to know if this bug exists in the latest upstream kernel. While I build a new test kernel, would it be possible for you to test the latest upstream kernel, which is available at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc6-precise/

summary: - Linux 3.0.0-15-generic-pae causes laptops to fail to resume from suspend
- (Dell XPS 1645, Sony Vaio VPCF1390)
+ Linux 3.0.0-15 causes laptops to fail to resume from suspend (Dell XPS
+ 1645, Sony Vaio VPCF1390)
Joseph Salisbury (jsalisbury) wrote :

@Ronan,

I posted a test kernel for this bug. The test kernel has the commit reverted that may have caused this regression. Can you please test this kernel and report back if it resolves the issue? The kernel is available at:

http://people.canonical.com/~jsalisbury/lp904569/

There is currently only a 32bit kernel there, but I'll have a 64bit kernel uploaded shortly.

If there are any other folks affected by this bug, it would be great if you could also try the test kernel and report back if it resolves your issue.

Ronan Jouchet (ronj) wrote :

@Joseph,
linux-image-3.0.0-15-generic_3.0.0-15.24~lp904569vReverted_i386 resolves the issue. Note I only tested the 32bit version, tell me if you need the 64bit one tested too.

Joseph Salisbury (jsalisbury) wrote :

@Ronan

That is great news. Thanks again for testing. There are two more tests I'd like to request.

First, please test the latest upstream kernel, v3.2-rc6, which can be found at:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc6-precise/

I expect the bug to exist there. If it does, I will provide a version of that kernel with the commit reverted, just to confirm in fact that is the cause of this bug.

Thanks again for your help!

Ronan Jouchet (ronj) wrote :

@Joseph
Hello again! Indeed, the i386 and i386-pae versions present at http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-rc6-precise/ both fail to resume.
Waiting for your v3.2-rc6-precise~lp904569vReverted to finalize testing. Cheers :)

Joe Barnett (thejoe) wrote :

confirmed that 3.0.0-15.24~lp904569vRebased fixes the problem on amd64 on my hp envy 15 as well.

Joseph Salisbury (jsalisbury) wrote :

@Ronan

I have the latest mainline kernel building, I'll post it shortly.

Joseph Salisbury (jsalisbury) wrote :

@Ronan

The latest upstream kernel with that commit reverted is available at:
http://people.canonical.com/~jsalisbury/lp904569/mainline/

It would be great if you could test and report back that the suspend/resume issue does not exist.

Thanks again!

Ronan Jouchet (ronj) wrote :

@Joseph
3.2.0-030200rc6.201112231610 does the job! Glad I helped.
You'll do a better job than I would explaining the cause and fix of the issue; I'll let you report the bug upstream.
Merry Christmas!

Joseph Salisbury (jsalisbury) wrote :

Thanks so much for all your testing, Ronan. And thank you Kano for performing all of the bisect work. The work you both did has really helped the progress of this bug!

Eric Hartmann (hartmann-eric) wrote :

@Joseph,

I experienced the same trouble on Sony VAIO VPCF1 and the rollback in 3.2.0-030200rc6-generic fixes the issue for me.

Thanks

Tim Gardner (timg-tpi) on 2012-01-01
Changed in linux (Ubuntu Oneiric):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: Stable commit "clockevents: Set noop handler in clockevents_exchange_device()" caused a regression and has been reverted upstream.

Patch description: Revert "clockevents: Set noop handler in clockevents_exchange_device()"

Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Lucid):
status: New → In Progress
assignee: nobody → Herton R. Krzesinski (herton)
Changed in linux (Ubuntu Oneiric):
status: In Progress → Fix Committed
no longer affects: linux (Ubuntu Lucid)
Beanow (beanow) on 2012-01-03
Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2012-01-03
Changed in linux (Ubuntu Oneiric):
status: Fix Released → Fix Committed
tags: removed: kernel-da-key kernel-key
teh603 (darth-giles) wrote :

So for AMD64 on Oneiric, which one should I use, the 3.0.15 one (even if I'm running 3.0.14) or the "mainline" one which calls for 3.2 ?

teh603 (darth-giles) wrote :

And, is there any way these patches can be uploaded to a PPA? I can't seem to get them to install in Kubuntu Oneiric.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-15.25

---------------
linux (3.0.0-15.25) oneiric-proposed; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #910894

  [ Upstream Kernel Changes ]

  * Revert "clockevents: Set noop handler in clockevents_exchange_device()"
    - LP: #904569

linux (3.0.0-15.24) oneiric-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #903188

  [ Alex Bligh ]

  * (config) Change Xen paravirt drivers to be built-in
    - LP: #886521

  [ Chase Douglas ]

  * Revert "SAUCE: HID: hid-ntrig: add support for 1b96:0006 model"
    - LP: #724831
  * Revert "SAUCE: hid: ntrig: Remove unused device ids"
    - LP: #724831

  [ Seth Forshee ]

  * SAUCE: dell-wmi: Demote unknown WMI event message to pr_debug
    - LP: #581312

  [ Upstream Kernel Changes ]

  * Revert "leds: save the delay values after a successful call to
    blink_set()"
    - LP: #893741
  * xfs: Fix possible memory corruption in xfs_readlink, CVE-2011-4077
    - LP: #887298
    - CVE-2011-4077
  * drm/i915: fix IVB cursor support
    - LP: #893222
  * drm/i915: always set FDI composite sync bit
    - LP: #893222
  * jbd/jbd2: validate sb->s_first in journal_get_superblock()
    - LP: #893148
    - CVE-2011-4132
  * ALSA: hda - Don't add elements of other codecs to vmaster slave
    - LP: #893741
  * virtio-pci: fix use after free
    - LP: #893741
  * ASoC: Don't use wm8994->control_data in wm8994_readable_register()
    - LP: #893741
  * sh: Fix cached/uncaced address calculation in 29bit mode
    - LP: #893741
  * drm/i915: Fix object refcount leak on mmappable size limit error path.
    - LP: #893741
  * drm/nouveau: initialize chan->fence.lock before use
    - LP: #893741
  * drm/radeon/kms: make an aux failure debug only
    - LP: #893741
  * ALSA: usb-audio - Check the dB-range validity in the later read, too
    - LP: #893741
  * ALSA: usb-audio - Fix the missing volume quirks at delayed init
    - LP: #893741
  * KEYS: Fix a NULL pointer deref in the user-defined key type
    - LP: #893741
  * hfs: add sanity check for file name length
    - LP: #893741
  * drm/radeon: add some missing FireMV pci ids
    - LP: #893741
  * sfi: table irq 0xFF means 'no interrupt'
    - LP: #893741
  * x86, mrst: use a temporary variable for SFI irq
    - LP: #893741
  * b43: refuse to load unsupported firmware
    - LP: #893741
  * md/raid5: abort any pending parity operations when array fails.
    - LP: #893741
  * mfd: Fix twl4030 dependencies for audio codec
    - LP: #893741
  * xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
    - LP: #893741
  * xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
    - LP: #893741
  * xen-gntalloc: signedness bug in add_grefs()
    - LP: #893741
  * powerpc/ps3: Fix lost SMP IPIs
    - LP: #893741
  * powerpc: Copy down exception vectors after feature fixups
    - LP: #893741
  * backing-dev: ensure wakeup_timer is deleted
    - LP: #893741
  * block: Always check length of all iov entries in blk_rq_map_user_iov()
    - LP: #893741
  * Linux 3.0.10
    - LP: #893741
  * drm/i915: add multi-threaded forcewake support
    - LP: #891270
  * (pre-sta...

Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released