ehci-hcd causes failed suspends on kernel 2.6.24

Bug #212660 reported by vlowther
Affects           Status         Importance   Assigned to   Milestone
linux (Ubuntu)    Fix Released   Undecided    Tim Gardner
Jaunty            Fix Released   Undecided    Tim Gardner

Bug Description

After upgrading to the Hardy release, suspend/resume works once, then the system fails to suspend until it is rebooted. Debugging shows that g-p-m, hal, and pm-utils are doing the right thing, and dmesg is filled with process information (failing dmesg attached).

System is a Dell Latitude D820.

If I reboot back into 2.6.24-12 using the same userspace (Hardy release in this case) suspend and resume works normally.

Revision history for this message
vlowther (victor-lowther) wrote :
Revision history for this message
vlowther (victor-lowther) wrote :

2.6.14-12 functions normally.

Revision history for this message
vlowther (victor-lowther) wrote :

er, make that 2.6.24-12.

Revision history for this message
Michael Losonsky (michl) wrote :

Can confirm this on a Dell C610. Was excited that suspend/resume
worked in Hardy, both the lid and the suspend button, but the
latest kernel update broke it. I can suspend, but not resume.
Closing and opening the lid twice (which was a workaround in
Gutsy) also did not work. Had to force the computer off and
reboot.

Changed in linux:
status: New → Confirmed
Revision history for this message
vlowther (victor-lowther) wrote :

Same issue also happens with 2.6.24-16

Revision history for this message
vlowther (victor-lowther) wrote : Re: kernel 2.6.24-16 fails suspending
description: updated
Revision history for this message
vlowther (victor-lowther) wrote :

Added kernel team to get some visibility. What do you need for debugging?

Changed in linux:
assignee: nobody → canonical-kernel-team
Revision history for this message
vlowther (victor-lowther) wrote :

Surprise! Also fails on 2.6.24-17!

Revision history for this message
vlowther (victor-lowther) wrote :

git-bisect on the ubuntu-hardy git repo shows that the commit that breaks things is 978a8bed296d7f5d76c -- adding a separate IAA watchdog timer.

After I configured pm-utils to unload the ehci-hcd module, suspend/resume started to function normally on 2.6.24-17.
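For readers unfamiliar with the git-bisect step mentioned above, here is a minimal sketch of how an automated bisect narrows a regression down to one commit. It runs in a throwaway repository, not the ubuntu-hardy tree. The function name `bisect_demo`, the marker file `regression`, and the eight toy commits are all made up for illustration. In the real case the good/bad check was a manual suspend/resume test on each kernel build.

```shell
# Toy bisect in a scratch repository (hypothetical example, not the kernel tree).
bisect_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q . >/dev/null 2>&1
        git config user.email "demo@example.invalid"
        git config user.name "bisect demo"
        i=1
        while [ "$i" -le 8 ]; do
            echo "$i" > version
            # Commit 6 introduces the simulated regression (a marker file).
            if [ "$i" -eq 6 ]; then echo broken > regression; fi
            git add -A
            git commit -q -m "commit $i" >/dev/null 2>&1
            i=$((i + 1))
        done
        good=$(git rev-list --max-parents=0 HEAD)
        # Arguments to "bisect start" are the bad ref first, then the good ref.
        git bisect start HEAD "$good" >/dev/null 2>&1
        # The test command exits 0 for a good commit and 1 for a bad one.
        git bisect run sh -c 'test ! -e regression' >/dev/null 2>&1
        # After the run, refs/bisect/bad names the first bad commit.
        git log -1 --format=%s refs/bisect/bad
        git bisect reset >/dev/null 2>&1
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

Calling `bisect_demo` prints the subject line of the first commit in which the marker file appears. Against a kernel tree, each step would instead involve building, booting, and attempting two suspend/resume cycles before marking the commit good or bad.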

Revision history for this message
vlowther (victor-lowther) wrote :
Revision history for this message
Justin Dugger (jldugger) wrote :

I don't think Canonical Kernel Team is the appropriate assignment here

Changed in linux:
assignee: canonical-kernel-team → kernel-team
Changed in linux:
assignee: kernel-team → timg-tpi
Changed in dell:
assignee: nobody → timg-tpi
Revision history for this message
Tim Gardner (timg-tpi) wrote :

The simplest workaround for ehci-hcd issues for the time being is to have the module removed during suspend, e.g.:

echo 'SUSPEND_MODULES="ehci-hcd"' > /tmp/unload_modules
chmod +x /tmp/unload_modules
sudo mv /tmp/unload_modules /etc/pm/config.d/unload_modules

The commit that vlowther isolated in https://bugs.edge.launchpad.net/dell/+bug/212660/comments/12 is sufficiently complicated that I'm not willing to mess with it as an SRU until someone can isolate the specific code in the patch that causes the suspend regression.

Revision history for this message
vlowther (victor-lowther) wrote :

Based on an experiment I ran before I started the git-bisect and found the triggering commit, I suspect the interaction between the new IAA watchdog timer handling in ehci-hcd and the way khubd interacts with the freezer. Manually disabling processor 1 after the first failed suspend on my box caused the sched-debug output to include khubd as a runnable process along with pm-suspend.

But that is just a guess, and I don't have enough knowledge in that area to test it effectively.
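As a rough userspace approximation of that check (not the sched-debug dump itself), one could list tasks that are runnable or stuck in uninterruptible sleep. The helper name `busy_tasks` is hypothetical. It reads two-column "STAT COMMAND" lines such as those produced by `ps -eo stat=,comm=`, and only the first word of each command name is printed.

```shell
# Print the command name of every task whose state starts with
# R (runnable) or D (uninterruptible sleep).
# Input: "STAT COMMAND" lines on stdin, e.g. from: ps -eo stat=,comm=
busy_tasks() {
    awk '$1 ~ /^[RD]/ { print $2 }'
}
```

Typical use would be `ps -eo stat=,comm= | busy_tasks`, run right after a failed suspend attempt, to see whether khubd or pm-suspend shows up.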

Revision history for this message
Stefan Bader (smb) wrote :

The patch introduced a new watchdog for handling missing interrupts, which can cause problems if it does not get stopped before a suspend (for example). I found two commits in the upstream sources that potentially fix this behaviour. Maybe you could find out which one (or whether both) helps in this case:

1. cdc647a9b75741659bfc6acc44a6b3a646ad53bf (USB: another ehci_iaa watchdog fix)
2. 21da84a89312dd8d014ca3352d1ab5c2279ec548 (USB: ehci shutdown refactored)

Best to try 1 first, since it is the less intrusive change and would have a better chance of being accepted for SRU.

Revision history for this message
Mario Limonciello (superm1) wrote :

Stefan:

I just attempted locally on my system with the first commit that you mentioned: cdc647a9b75741659bfc6acc44a6b3a646ad53bf
and it appears to resolve the direct issue, but introduces another.

The commit doesn't apply cleanly as is, but it is small enough that it's easy to see that only its line numbers got thrown off. After the suspend, the mouse is really jumpy. It seems like a lot of interrupts must be fired off the USB bus or something. Removing the high speed device from the bus and plugging it back in resolves the issue.

Revision history for this message
Tim Gardner (timg-tpi) wrote :

Has anyone tried the -19 Hardy kernel? You'll have to be subscribed to -updates. There was a fairly serious regression that was found with -16, but didn't get fixed until -19.

Revision history for this message
Yann (lostec) wrote :

@ Tim:

With -19, suspend+resume seems to work fine (except that sound never recovers on my Toshiba P100, but that has been the case since Dapper+powersaved, so...), but only on the first try.

A second suspend/resume cycle does not resume. Nothing interesting in pm-suspend.log, because the disk sync happens well before the end of suspend and nothing is logged after that.

It was working with -16.

Changed in dell:
status: New → Confirmed
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Revision history for this message
Mario Limonciello (superm1) wrote :

This appears to be resolved in 2.6.26+

Changed in linux:
status: Confirmed → Fix Released
Changed in dell:
status: Confirmed → Fix Released
Revision history for this message
Chow Loong Jin (hyperair) wrote :

This affects 2.6.27-7-generic on Intrepid. I'm currently using this: SUSPEND_MODULES="ehci_hcd uhci_hcd usbcore" in /etc/pm/config.d/usb_suspend_workaround.

If I comment that line out, suspend works, but resume fails. /var/log/pm-suspend.log shows no sign of resuming.
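When several modules are listed, the double quotes have to survive into the config file, otherwise the shell that sources it treats the extra names as a command to run. A minimal sketch of a helper that writes such a fragment follows. The function name `write_suspend_modules` is hypothetical, and the file name simply reuses the one from the comment above.

```shell
# Hypothetical helper: write a pm-utils config fragment listing modules to unload.
# $1 is the target directory (e.g. /etc/pm/config.d); the rest are module names.
write_suspend_modules() {
    dir=$1
    shift
    # "$*" joins the module names with spaces; the single-quoted printf
    # format keeps the literal double quotes in the written file.
    printf 'SUSPEND_MODULES="%s"\n' "$*" > "$dir/usb_suspend_workaround"
}
```

For example, `write_suspend_modules /etc/pm/config.d ehci_hcd uhci_hcd usbcore` (run as root) would recreate the line quoted above. Trying it against a scratch directory first is a safe way to inspect the output.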

Changed in linux:
status: Fix Released → New
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Jaunty):
status: New → Fix Released
Revision history for this message
Felipe Figueiredo (philsf) wrote :

I confirm that this issue no longer affects me in up-to-date Jaunty, without the config in /etc/pm/config.d.

My laptop is a Lenovo 3000 v100, and hardware details are attached to duplicate bug #243967. Thanks for the fix!

Changed in somerville:
assignee: nobody → Tim Gardner (timg-tpi)
status: New → Fix Released
no longer affects: dell
Revision history for this message
Timothy R. Chavez (timrchavez) wrote :

The bug task for the somerville project has been removed by an automated script. This bug has been cloned on that project and is available here: https://bugs.launchpad.net/bugs/1305671

no longer affects: somerville