Bluetooth modules cause random S4 hangs

Bug #715643 reported by Keng-Yu Lin
This bug affects 2 people
Affects            Status        Importance  Assigned to  Milestone
pm-utils (Debian)  Fix Released  Unknown
pm-utils (Ubuntu)  Invalid       Medium      Keng-Yu Lin
  Maverick         Fix Released  Medium      Keng-Yu Lin

Bug Description

Binary package hint: pm-utils

The bluetooth modules with USB VID/PIDs [0a5c:219c] and [0a5c:21bc] have been found to cause random S4 hangs on various laptops with the SandyBridge chipset.

The hangs can be observed by repeating hibernate/resume for 250 iterations with fwts. The number of successful cycles before a hang is usually in the tens, though it varies between laptop models.

A major soak test was performed across several SandyBridge laptops running Ubuntu Maverick.

In testing, the hang can be worked around by stopping the bluetooth service before hibernate and restarting it on resume.

This bug is an attempt to put the workaround into better shape and make it suitable for an SRU in Maverick.

The plan is to add quirk support for bluetooth (with the USB IDs above), which is currently lacking in pm-utils.

Natty has not been soak-tested yet. However, if the bug is later found there as well, the fix should also go there or upstream.
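
As an illustration of the quirk idea only, and not the code in any patch attached to this bug, the check "is one of the listed bluetooth modules present?" could be sketched in Python as below. The function names and the use of `lsusb` text as input are assumptions made for the example; only the two USB IDs come from the description above.

```python
import re

# USB VID:PID pairs named in the bug description; a real quirk list may differ.
QUIRKED_BLUETOOTH_IDS = {"0a5c:219c", "0a5c:21bc"}

# Matches the "ID vvvv:pppp" field of an lsusb output line.
_LSUSB_ID = re.compile(r"\bID\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def usb_ids(lsusb_output):
    """Return the set of lowercase 'vid:pid' strings found in lsusb output."""
    return {
        f"{m.group(1)}:{m.group(2)}".lower()
        for m in _LSUSB_ID.finditer(lsusb_output)
    }


def needs_bluetooth_quirk(lsusb_output, quirked=QUIRKED_BLUETOOTH_IDS):
    """Return True if any attached USB device is on the quirk list."""
    return bool(usb_ids(lsusb_output) & set(quirked))
```

A caller would pass the text printed by `lsusb` and stop the bluetooth service before hibernate only when `needs_bluetooth_quirk` returns True.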

Keng-Yu Lin (lexical) wrote :

`lsusb` of the problematic bluetooth device.

Changed in pm-utils (Ubuntu):
assignee: nobody → Keng-Yü Lin (lexical)
status: New → In Progress
importance: Undecided → Medium
Kent Baxley (kentb) wrote :

Just to be clear, we have two usb bluetooth devices that we know are problematic and would benefit from the workaround:

Bus 002 Device 003: ID 413c:8187 Dell Computer Corp. DW375 Bluetooth Module

and

DW1701 with the USB IDs mentioned in the Bug Description.

Chris Van Hoof (vanhoof)
description: updated
description: updated
tags: added: hwe-blocker
Kent Baxley (kentb) wrote :

Hi Keng-Yu,

Any updates on this? Thanks!

Keng-Yu Lin (lexical) wrote :

The debdiff contains two quilt patches.

95-bluetooth-quirk.patch adds the "--bluetooth-service-off" argument for stopping/restarting the bluetooth service.

96-bluetooth-known-quirks.patch adds a simple mechanism for quirking the bluetooth USB VID/PIDs (currently [0a5c:219c] [0a5c:21bc] [413c:8187]).

The quirking mechanism in 96-bluetooth-known-quirks.patch is simplified and I have no intent to upstream it. It is only for this SRU.

IMO, a better way is to use the same quirking mechanism as video-quirk, and also make video and bluetooth share the same quirking function. But that change would be too big to fit an SRU.

However, 95-bluetooth-quirk.patch should be in good shape for upstreaming.
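
For readers unfamiliar with pm-utils hooks: a hook is called with the sleep action as its argument (`suspend`, `hibernate`, `resume` or `thaw`). The real hook in the debdiff is a shell script; the Python sketch below only illustrates the stop-on-hibernate, restart-on-thaw logic described in this bug. The function names, the `service_off_enabled` parameter, and the `service bluetooth ...` command are assumptions made for the example, and whether the actual patch also acts on suspend is not stated here.

```python
def bluetooth_service_action(hook_arg, service_off_enabled):
    """Map a pm-utils hook argument to 'stop', 'start', or None.

    service_off_enabled stands in for the quirk being requested
    (for example via --bluetooth-service-off); hypothetical name.
    """
    if not service_off_enabled:
        return None
    if hook_arg == "hibernate":
        return "stop"   # stop the service before entering S4
    if hook_arg == "thaw":
        return "start"  # restart it after resuming from S4
    return None         # assumption: other actions are left alone


def service_command(action):
    """Build the command line for the action (assumed service name)."""
    return ["service", "bluetooth", action]
```

A caller would run `service_command(action)` only when `bluetooth_service_action` returns something other than None.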

Keng-Yu Lin (lexical) wrote :

looking for package update sponsoring.

Keng-Yu Lin (lexical) wrote :

s/update/upload/

Changed in pm-utils (Ubuntu Maverick):
importance: Undecided → Medium
status: New → In Progress
Martin Pitt (pitti) wrote :

A similar report was recently filed and fixed in bug 698331 in a generic fashion; the attached patch looks like it does something similar in a more complicated way. Any chance you could try whether the natty package fixes this? It will work fine on maverick. If it does, I'd rather SRU the natty/lucid fix into maverick.

Changed in pm-utils (Ubuntu):
status: In Progress → Incomplete
Keng-Yu Lin (lexical)
Changed in pm-utils (Ubuntu Maverick):
assignee: nobody → Keng-Yü Lin (lexical)
Keng-Yu Lin (lexical) wrote :

I checked 13-49bluetooth-sync.patch in the Natty pm-utils source package and bug 698331. I think it is not the same as this bug.

Bug 698331 and 49bluetooth are specific to Thinkpads. This bug deals with hardware other than Thinkpads.

49bluetooth checks for the presence of a proc node created by the thinkpad-acpi kernel module; if the node is absent, the script exits on the first line.

And the fix in Bug 698331 polls /sys/module/btusb/refcnt. This is only relevant when SUSPEND_MODULES with btusb is used.

Here it was found (through many experiments) that stopping bluetoothd alone, without unloading and reloading the btusb kernel module, can work around the hibernate hangs.
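
For contrast with stopping bluetoothd, the refcnt polling approach mentioned above can be sketched as follows. This is not the code from 13-49bluetooth-sync.patch; it is a minimal Python illustration of "wait until the module reference count drops to 0 or give up", and the function name, timeout and polling interval are assumptions.

```python
import time


def wait_for_refcnt_zero(path="/sys/module/btusb/refcnt",
                         timeout=5.0, interval=0.1):
    """Poll a module refcount file until it reads 0 or the timeout expires.

    Returns True if the count reached 0, and False on timeout or if the
    file cannot be read (for example, when the module is not loaded).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as f:
                if int(f.read().strip()) == 0:
                    return True
        except (OSError, ValueError):
            return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If the count never reaches 0, a loop like this only delays the sleep transition until the timeout, which is consistent with the behaviour reported in the following comment.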

Keng-Yu Lin (lexical) wrote :

I also tested pm-utils from Natty (1.4.1-5). As mentioned in the last comment, 49bluetooth is not actually executed because I was not testing on a Thinkpad.

I removed the thinkpad-related part, left only 13-49bluetooth-sync.patch, and added "/sys/module/btusb/refcnt". The value does not reach 0. However, this should make the CPU wait a while before the timeout. But the hang is still observed in my testing.

(A gentle reminder: the goal is 250 successful S4 iterations.)

Keng-Yu Lin (lexical) wrote :

In the comment above, I meant that I added "echo /sys/module/btusb/refcnt" in 49bluetooth.

Changed in pm-utils (Ubuntu):
status: Incomplete → Triaged
assignee: Keng-Yü Lin (lexical) → Martin Pitt (pitti)
Changed in pm-utils (Ubuntu Maverick):
status: In Progress → Triaged
Martin Pitt (pitti) wrote :

I originally misread the patch: it renames 49bluetooth to 49bluetooth-ibm, and I took that for new code. Any chance you could clean this up (revert the renaming), merge the two patches into one, and open an upstream bug with some details and the patch? Then I can sponsor this to Natty, and once it has had a little testing we can also SRU it.

Thanks!

martin

Keng-Yu Lin (lexical)
Changed in pm-utils (Ubuntu):
assignee: Martin Pitt (pitti) → Keng-Yü Lin (lexical)
status: Triaged → In Progress
Keng-Yu Lin (lexical) wrote :

@Martin

  I reported it as http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=614902.

  I reverted the renaming of 49bluetooth. I also renamed my 45bluetooth to 45bluetooth-service. I think this makes it easier to identify.

  The new patch is in the BTS. I have also attached it here.

Changed in pm-utils (Ubuntu):
status: In Progress → Triaged
assignee: Keng-Yü Lin (lexical) → Martin Pitt (pitti)
Changed in pm-utils (Ubuntu Maverick):
assignee: Keng-Yü Lin (lexical) → Martin Pitt (pitti)
Changed in pm-utils (Debian):
status: Unknown → New
Martin Pitt (pitti) wrote :

Thanks Lin, that already looks a lot better. It still needs some improvement; I sent a suggested piece of code to the Debian bug.

Keng-Yu Lin (lexical) wrote :

Thanks for contributing the match code utilising udevadm. I attached the modified patch here and in BTS.

Martin Pitt (pitti) wrote :

Lin,

thanks! This looks good enough for maverick.

The question now is what to do about natty onwards. Is this also a problem on current natty? It should be possible to install the natty kernel on current maverick to test this. If it still fails, this needs to be reported against linux, as this pm-utils hook is just a workaround hack, not a proper long-term solution. If it works with the natty kernel, we'll apply the pm-utils fix to maverick only and invalidate the natty task. If it's broken with the natty kernel as well, I'll talk with Michael Biebl to apply it to sid/natty, but this requires a kernel bug report.

Thanks,

Martin

Martin Pitt (pitti) wrote :

Sponsored. Assigning back to you for the natty investigation.

Changed in pm-utils (Ubuntu):
assignee: Martin Pitt (pitti) → Keng-Yü Lin (lexical)
Changed in pm-utils (Ubuntu Maverick):
assignee: Martin Pitt (pitti) → Keng-Yü Lin (lexical)
Martin Pitt (pitti) wrote : Please test proposed package

Accepted pm-utils into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in pm-utils (Ubuntu Maverick):
status: Triaged → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote :

(Note that this can go to -proposed now, but not to -updates until we have figured out the natty solution, as per SRU policy.)

Kent Baxley (kentb) wrote :

Tested pm-utils with a laptop containing a DW375 Bluetooth device. I ran a small, quick test of 20 consecutive S4 cycles with a 30 second delay between S4 iterations.

I was able to successfully complete all 20 without a hang in S4.

Without this 'fix', or some other workaround that disabled / enabled bluetooth during hibernate / thaw, the laptop would hang randomly and it rarely made it past 10 S4 cycles before hanging. Thanks, guys!

Martin Pitt (pitti) wrote :

Thanks for testing! Now we need someone to do the same test procedure under maverick with the natty kernel, but the non-proposed pm-utils.

tags: added: verification-done
removed: verification-needed
Keng-Yu Lin (lexical)
Changed in pm-utils (Ubuntu):
status: Triaged → In Progress
Keng-Yu Lin (lexical) wrote :

I've tried the Natty kernel package linux-image-2.6.38-5-generic, version 2.6.38-5.32. I am seeing a reboot during resume from hibernation. It usually happens after just 1 or 2 successful S4 cycles.

Since this is not like the hang we have with the Maverick kernel, I also tested the mainline build, package linux-image-2.6.38-020638rc7-generic, version 2.6.38-020638rc7.201103020909.

With that kernel, the hang can be observed after several S4 cycles (fewer than 5). The reboot on resume is not observed.

Since my hardware is really crappy (PT, and it has been disassembled many times to replace the BT module for testing), Kent, can you also perform what Martin requested in #21?

Kent Baxley (kentb) wrote :

@Keng-Yu,

Was your machine a Sugar Bay system by any chance? If it was, then there are other issues with S4 that haven't been ironed out yet (even in Natty). We're working closely with Intel to get those issues fixed, and the symptoms you described look similar. I'm not sure whether those reboots / hangs are related to bluetooth (especially the reboot).

I will, however, make a note to test a Sugar Bay system using Maverick and a Natty kernel with no pm-utils fixes and let you guys know what I run into.

Martin Pitt (pitti) wrote : Re: [Bug 715643] Re: Bluetooth modules cause random S4 hangs

Hello Lin,

Keng-Yü Lin [2011-03-03 5:36 -0000]:
> I've tried Natty kernel package linux-image-2.6.38-5-generic version
> 2.6.38-5.32. I am facing reboot in the resume from hibernation. It
> usually happens just after 1 or 2 successful S4.

Thanks for testing. Does the pm-utils bluetooth workaround also fix
this with the natty kernel?

--
Martin Pitt | http://www.piware.de
Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org)

Kent Baxley (kentb) wrote :

With linux-image-2.6.38-5-generic and NO pm-utils from -proposed, I'm seeing the same odd behavior as Keng-Yu in that I'm seeing a weird reboot in the resume from hibernate after 1 or 2 successful S4s. On my third try, I managed to get 13 out of 30 S4s to complete before seeing the reboot again.

At no time, however, with the Natty kernel, did I see the "btusb" messages on the console while entering hibernate, which usually precede a hang during S4 somewhere down the road.

I will now try running a few S4 runs again with the pm-utils from -proposed along with the Natty kernel to see if it makes a difference.

Kent Baxley (kentb) wrote :

With the pm-utils from -proposed that contains the bluetooth workarounds and the 2.6.38-5 Natty kernel, I got 27 successful S4's out of 30 before the mysterious reboot occurred.

Keng-Yu Lin (lexical) wrote :

@Kent
  Thanks for the testing. I am testing on a SandyBridge-based system.

@Martin
  Based on the test results from Kent and me, I think we are seeing a different type of hang with the 2.6.38 kernel, and this prevents further verification of the pm-utils workaround.

Changed in pm-utils (Ubuntu):
status: In Progress → New
Martin Pitt (pitti) wrote :

Thanks. So I propose we only apply that workaround for maverick, and not carry it forward to natty?

Changed in pm-utils (Ubuntu):
status: New → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pm-utils - 1.4.1-3ubuntu1

---------------
pm-utils (1.4.1-3ubuntu1) maverick-proposed; urgency=low

  * Add 12-bluetooth-service-restart.patch: Add the quirk support for
    stoping/restarting bluetooth service. It is found that the bluetooth
    modules with USB VID/PIDs [0a5c:219c] [0a5c:21bc] and [413c:8187] are
    causing random S4 hangs on various laptops with SandyBridge chipset.
    (LP: #715643, Debian #614902)
 -- Keng-Yu Lin <email address hidden> Fri, 25 Feb 2011 15:59:43 +0800

Changed in pm-utils (Ubuntu Maverick):
status: Fix Committed → Fix Released
Changed in pm-utils (Debian):
status: New → Fix Released
Keng-Yu Lin (lexical) wrote :

@Martin
  That makes sense. Thanks for the help.

RPHegde (rphegde) wrote :

 Causes random hangs and freezes during suspend.

I got this package pushed (I did not have proposed enabled, so I am not sure how I got this package on March 11).
Since then, the computer would lock up randomly during suspend. I reverted to the old package and I am fine now.

system 76 pangolin 6 laptop - 64 bit

Keng-Yu Lin (lexical) wrote :

@RPHegde,

  Can you post the USB VID/PID of the laptop?

Keng-Yu Lin (lexical) wrote :

@RPHegde,

  I mean the USB VID/PID of the bluetooth adaptor on your laptop.

RPHegde (rphegde) wrote :

from lsusb..

Bus 007 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
