Ubuntu

Toshiba Tecra A11 - System doesn't resume after suspend

Reported by Daniel Manrique on 2011-02-15
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   High         Unassigned
Maverick         Medium       Unassigned
Natty            High         Unassigned

Bug Description

On this system, with the 2.6.38-3.30 kernel as installed by default with the Natty image from 20110214, upon attempting to resume after suspend, the system is unresponsive (black screen, no keyboard response, no network/ping from other systems). Poweroff is the only option available.

I tried the other, older available kernel (2.6.38-2.29); with that kernel, suspend works fine and without issues.

Steps to reproduce:
1- Install Natty (20110214) on the Toshiba Tecra A11
2- Upon bootup, select "suspend" from the panel
3- Wait until power light is blinking indicating suspend mode, then press power button

Expected result:
- System resumes correctly and continues working

Actual result:
- As indicated above

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-3-generic 2.6.38-3.30
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.38-3.30-generic 2.6.38-rc4
Uname: Linux 2.6.38-3-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC268 Analog [ALC268 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1275 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xd0820000 irq 43'
   Mixer name : 'Intel IbexPeak HDMI'
   Components : 'HDA:10ec0268,11790616,00100003 HDA:11c11040,11790001,00100200 HDA:80862804,11790001,00100000'
   Controls : 17
   Simple ctrls : 9
Date: Tue Feb 15 16:33:07 2011
HibernationDevice: RESUME=UUID=9f243e3e-5f07-4ba9-92d0-4dfb46bcaf88
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Alpha i386 (20110214)
MachineType: TOSHIBA TECRA A11
ProcEnviron:
 LANGUAGE=en_US:en
 LANG=en_US
 LC_MESSAGES=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-3-generic root=UUID=02b1f8a0-defe-4533-83db-1deded7850c4 ro rootdelay=60 quiet splash initcall_debug vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-3-generic N/A
 linux-backports-modules-2.6.38-3-generic N/A
 linux-firmware 1.47
SourcePackage: linux
dmi.bios.date: 12/13/2009
dmi.bios.vendor: TOSHIBA
dmi.bios.version: Version 1.40
dmi.board.asset.tag: 0000000000
dmi.board.name: Portable PC
dmi.board.vendor: TOSHIBA
dmi.board.version: Version A0
dmi.chassis.asset.tag: 0000000000
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: Version 1.0
dmi.modalias: dmi:bvnTOSHIBA:bvrVersion1.40:bd12/13/2009:svnTOSHIBA:pnTECRAA11:pvrPTSE0C-00N00N:rvnTOSHIBA:rnPortablePC:rvrVersionA0:cvnTOSHIBA:ct10:cvrVersion1.0:
dmi.product.name: TECRA A11
dmi.product.version: PTSE0C-00N00N
dmi.sys.vendor: TOSHIBA

Daniel Manrique (roadmr) wrote :

Tried this mainline kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2011-02-14-natty/linux-image-2.6.38-999-generic_2.6.38-999.201102140905_i386.deb

With this, when asking the system to suspend, the screen switches modes for a moment, then comes back as it was, the system doesn't suspend, and this appears in kern.log:

Feb 15 16:46:29 ubuntu kernel: [ 272.025261] PM: Syncing filesystems ... done.
Feb 15 16:46:29 ubuntu kernel: [ 272.026524] PM: Preparing system for mem sleep
Feb 15 16:46:31 ubuntu kernel: [ 272.219749] Freezing user space processes ... (elapsed 0.01 seconds) done.
Feb 15 16:46:31 ubuntu kernel: [ 272.235567] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Feb 15 16:46:31 ubuntu kernel: [ 272.251542] PM: Entering mem sleep
Feb 15 16:46:31 ubuntu kernel: [ 272.251636] Suspending console(s) (use no_console_suspend to debug)
Feb 15 16:46:31 ubuntu kernel: [ 272.251991] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Feb 15 16:46:31 ubuntu kernel: [ 272.252261] sd 0:0:0:0: [sda] Stopping disk
Feb 15 16:46:31 ubuntu kernel: [ 272.272553] serial 00:09: disabled
Feb 15 16:46:31 ubuntu kernel: [ 272.319364] tpm_tis 00:03: Operation Timed out
Feb 15 16:46:31 ubuntu kernel: [ 272.319378] legacy_suspend(): pnp_bus_suspend+0x0/0x70 returns -62
Feb 15 16:46:31 ubuntu kernel: [ 272.319383] PM: Device 00:03 failed to suspend: error -62
Feb 15 16:46:31 ubuntu kernel: [ 272.687849] PM: Some devices failed to suspend
Feb 15 16:46:31 ubuntu kernel: [ 272.687922] hub 1-0:1.0: activate --> -22
Feb 15 16:46:31 ubuntu kernel: [ 272.687957] hub 2-0:1.0: activate --> -22
Feb 15 16:46:31 ubuntu kernel: [ 272.687997] sd 0:0:0:0: [sda] Starting disk
Feb 15 16:46:31 ubuntu kernel: [ 272.688484] serial 00:09: activated
Feb 15 16:46:31 ubuntu kernel: [ 272.688832] hub 1-1:1.0: activate --> -22
Feb 15 16:46:31 ubuntu kernel: [ 273.363784] PM: resume of devices complete after 677.095 msecs
Feb 15 16:46:31 ubuntu kernel: [ 273.363875] PM: resume devices took 0.676 seconds
Feb 15 16:46:31 ubuntu kernel: [ 273.363895] PM: Finishing wakeup.
Feb 15 16:46:31 ubuntu kernel: [ 273.363896] Restarting tasks ... done.
Feb 15 16:46:31 ubuntu kernel: [ 273.364781] video LNXVIDEO:01: Restoring backlight state
Feb 15 16:46:31 ubuntu kernel: [ 273.489613] e1000e 0000:00:19.0: irq 40 for MSI/MSI-X
Feb 15 16:46:31 ubuntu kernel: [ 273.545330] e1000e 0000:00:19.0: irq 40 for MSI/MSI-X
Feb 15 16:46:31 ubuntu kernel: [ 273.545582] ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 15 16:46:31 ubuntu kernel: [ 273.578363] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Feb 15 16:46:32 ubuntu kernel: [ 274.159126] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,commit=0
Feb 15 16:46:34 ubuntu kernel: [ 276.142784] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Feb 15 16:46:34 ubuntu kernel: [ 276.143107] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 15 16:46:39 ubuntu kernel: [ 281.851438] wlan0: authenticate with 00:22:90:50:1a:50 (try 1)
Feb 15 16:46:39 ubuntu kernel: [ 281.853120] wlan0: authenticated
Feb 15 16:46:39 ubuntu kernel: [ 281.853149] wlan0: associate with 00:22:90:50:1a:50 (tr...


Changed in linux (Ubuntu):
status: New → Triaged
Herton R. Krzesinski (herton) wrote :

On the mainline kernel, what made suspend fail is tpm/tpm_tis. Extracting from the log:

[ 272.319364] tpm_tis 00:03: Operation Timed out
[ 272.319378] legacy_suspend(): pnp_bus_suspend+0x0/0x70 returns -62
[ 272.319383] PM: Device 00:03 failed to suspend: error -62
[ 272.687849] PM: Some devices failed to suspend

At first it looks like commit 9b29050 ("tpm_tis: Use timeouts returned from TPM") is what should prevent it, but this mainline build of Feb 14 should already have this fix, so probably this is a new problem.
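For anyone wanting to pull the same information out of a longer kern.log, here is a minimal sketch. It assumes the "PM: Device ... failed to suspend" message format shown above; the function name `find_suspend_failures` is made up for illustration. On Linux, error 62 is ETIME, which matches the "Operation Timed out" line from tpm_tis.

```python
import errno
import re

# Matches lines such as:
#   "PM: Device 00:03 failed to suspend: error -62"
FAILED_DEVICE = re.compile(r"PM: Device (\S+) failed to suspend: error (-?\d+)")


def find_suspend_failures(log_text):
    """Return (device, errno_name, code) for each device that refused to suspend."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_DEVICE.search(line)
        if match:
            device = match.group(1)
            code = abs(int(match.group(2)))
            # errno.errorcode is platform-dependent, so fall back to "unknown".
            failures.append((device, errno.errorcode.get(code, "unknown"), code))
    return failures
```

The device ID it reports (00:03 here) can then be matched against the driver named on the preceding log line.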

Daniel Manrique (roadmr) wrote :

Herton,

Thanks for looking into this. Just to be absolutely sure, I tried today's mainline kernel:

http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2011-02-16-natty/linux-image-2.6.38-999-generic_2.6.38-999.201102161129_i386.deb

I have the exact same behavior when trying to sleep (tpm_tis operation timed out). So it appears to be something other than what was fixed in that commit you mention.

Herton R. Krzesinski (herton) wrote :

Daniel, in fact, according to the thread at https://lkml.org/lkml/2011/2/17/218, the same commit I mentioned (9b29050) probably introduced the regression for you too. So reverting the commit could work for your case as well. I'll keep following up on that.

Herton R. Krzesinski (herton) wrote :

I provided a test kernel here: http://people.canonical.com/~herton/lp719620/

It contains the commit reverted, please install and test it to confirm it's indeed the case.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
assignee: nobody → Herton R. Krzesinski (herton)
Daniel Manrique (roadmr) wrote :

Hi Herton,

Thanks for the updated kernel. This is the kernel's version string:

2.6.38-4-generic (herton@tangerine) #31+1lp719620 SMP Fri Feb 18 14:56:43 UTC 2011

I tested this kernel on the Tecra A11. The system boots up correctly, I am able to enter suspend mode, but when trying to resume, the screen is blank (no backlight even). The system is "alive", the keyboard responds, I can ssh in from another system so the network is also active, but the display shows nothing, I haven't found a way to re-enable it. So it's better than 2.6.38-3 and the mainline kernel from 20110216 but still doesn't work correctly like 2.6.38-2 did.

Interestingly, I asked the "blind" system to hibernate, which it did correctly, and upon restoring from hibernation the display came back up just fine. Hope this bit of information is helpful.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

Ok, thanks for the testing.

So we have two problems here:
- tpm_tis breaking suspend.
- when suspend works (tpm_tis problem out of the way), video is blank on resume.

I suspect the video problem may be fixed by a commit Andy (apw) cherry-picked today. To see if it's the case, please test the kernel I uploaded at the same previous location:
http://people.canonical.com/~herton/lp719620/

This new kernel has the tpm_tis commit reverted and a possible fix for the video issue.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Herton,

No luck with the new (2.6.38-5) kernel. I have the exact same behavior as with -4 (video is blank on resume). Keyboard responds, network is up, I can even blindly hibernate and the system comes back up fine. But no video after suspend.

Linux ubuntu 2.6.38-5-generic #32~1lp719620 SMP Fri Feb 18 20:10:23 UTC 2011 i686 i686 i386 GNU/Linux

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

We still have then the video blank on resume issue. I cherry-picked some more fixes from drm-intel, and built a new test kernel, can you install it and check? (it's on same url as before)

It's a shot in the dark, but the fixes may help here. Also, can you get dmesg over the network (e.g. by ssh'ing into it) from the system after it resumes and attach it here? Maybe there is something in it which can point at what's happening.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Hi Herton,

Tried the latest kernel you uploaded, and I'm seeing the same behavior (blank video on resume).

I tried the following: with the system in that state, I asked it to suspend (not hibernate). On resuming from the *second* suspend, the backlight turns on fine and the system works normally. I went back to the -4 kernel we tested the other day, and I confirmed it has this behavior too, i.e. the first suspend attempt fails to restore screen backlight upon resuming, but the *second* attempt succeeds. This information will hopefully be helpful. This however is behavior that will baffle users, so I don't think we can even call this a valid workaround, the problem still remains.

I'm attaching a dmesg file containing the two suspend/resume series of events. Also if there's any other procedure or information that might be helpful with this problem, please don't hesitate to ask; this system is used mainly for testing Ubuntu.

Herton R. Krzesinski (herton) wrote :

The dmesg doesn't have anything unusual so far. As another thing to try, I pulled changes from drm-intel-next and built a new kernel, uploaded to the same URL; please test that. Maybe a fix is already there.

If the situation doesn't change, I'm out of new things to try. In that case, if you can, the next thing to try would be a git bisect between the working kernel (2.6.38-rc3) and the failing kernel (2.6.38-rc4). If you don't know how to do it, I can provide test kernels for each bisect step; you will need to test ~3 kernels, going by the commit count on i915 between rc3 and rc4.
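As a rough guide to where an estimate like "~3 kernels" comes from: a bisect over a linear range of N candidate commits needs about ceil(log2(N)) tests. The sketch below only illustrates that arithmetic; the function name is made up, and the actual i915 commit count is not stated in this report.

```python
def estimated_bisect_steps(commit_count):
    """Worst-case number of kernels to test for a linear range of commits."""
    if commit_count < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (commit_count - 1).bit_length()
```

For example, up to 8 candidate commits need at most 3 tests. Real kernel history contains merges, so the number of steps git actually asks for can be higher than this estimate.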

Daniel Manrique (roadmr) wrote :

Hi Herton,

I tested the latest kernel, uname -a is as follows:

Linux ubuntu 2.6.38-5-generic #32~3lp719620 SMP Mon Feb 21 20:24:25 UTC 2011 i686 i686 i386 GNU/Linux

Sadly this has the exact same behavior where the backlight does not come on on the first restore attempt.

If you have the time I'd appreciate your help with the git bisect process, I've never done it and I believe with your setup it might go a bit faster. I'm ready to test and report on the required kernels.

Thanks for all your help, hopefully we'll figure this out soon.

Herton R. Krzesinski (herton) wrote :

Please try first bisect step and report if it works or not, 2.6.38-2.29+lp719620b1 kernel at http://people.canonical.com/~herton/lp719620/

Daniel Manrique (roadmr) wrote :

Hi Herton,

Thanks for your help with the kernels, I know it's a time consuming process and really appreciate you taking the time to help with this issue.

I tried the first one:

Linux ubuntu 2.6.38-2-generic #29+lp719620b1 SMP Wed Feb 23 15:15:05 UTC 2011 i686 i686 i386 GNU/Linux

It's to be marked as bad, it exhibits the no-display-after-first-resume behavior. At least it *does* enter the suspend state correctly.

I'm ready for the next one.

Herton R. Krzesinski (herton) wrote :

Ok, I uploaded the next step, please test that (2.6.38-2.29+lp719620b2)

Daniel Manrique (roadmr) wrote :

Hi Herton,

step 2 is bad too, suspends correctly but on resume backlight is off. As usual, a second suspend/resume set of events brings things back to normal.

Linux ubuntu 2.6.38-2-generic #29+lp719620b2 SMP Wed Feb 23 20:18:25 UTC 2011 i686 i686 i386 GNU/Linux

Thanks again, I'm ready to test the next kernel.

Herton R. Krzesinski (herton) wrote :

Hmm strange, this being bad, the commit which is reported to be broken is "drm/i915/sdvo: If at first we don't succeed in reading the response, wait", which doesn't make much sense... also I looked and before this until 2.6.38-2.29 there are no drm/acpi changes which could explain this.

Well, to confirm please try new kernel I uploaded now, 2.6.38-2.29+lp719620b3

Daniel Manrique (roadmr) wrote :

Hi Herton,

It took me a while to report back. I checked the latest kernel you sent (b3) only to find that it's bad as well (no video after resume). Since you mentioned it seemed odd, I went back and retested our original known-good kernel (2.6.38-2) and found, to my dismay, that it is also bad; this means my original testing was flawed. I apologize for this, I'm really sorry I made this mistake.

Since we installed Lucid on the A11 for some testing, I checked the kernel included and found that one to work correctly (i.e. everything comes back fine after suspend/resume). That being a 2.6.32 kernel, it seemed quite far back in time, but at least it's a known good starting point.

I went back to the mainline kernel archive and did some poor man's bisecting with the compiled kernels, and found the last known good kernel (on which suspend/resume works fine on the A11) is 2.6.34rc1, from 9-Mar-2010. The next one, 2.6.34.1 from Maverick, dated 6-Jul-2010, exhibits the problem we've been tracking here. During the process I also tested a number of 2.6.35 kernels, all of which have the same problem. So it looks like the interval between 2.6.34rc1 and 2.6.34-1 is where the problem was introduced.

How do these mainline kernels map to Ubuntu ones? is there anywhere I can get Ubuntu kernels to download and test like I did the mainlines? I also git cloned ubuntu-natty and ubuntu-maverick kernel trees to see if I could do the bisect process but I'm not sure, based on the known good and bad mainline kernels, how to pick the initial good and bad commits.

I hope this is useful, and again, my sincerest apologies for screwing up on the initial good/bad kernel assessment. I was more thorough this time.

Thanks for all your help!

Herton R. Krzesinski (herton) wrote :

No problem.

To see which Ubuntu versions map to mainline kernels, I think the only way is to look at the git history or the changelog.

To bisect in this case it's better to take the mainline kernel and push into it just the first Ubuntu commit which adds the Debian package structure, as all Ubuntu changes are rebased between each version, and your testing shows that the problem lies in the mainline kernels anyway.

I think the next step then is to see at which point in 2.6.34-rcX it broke. I'll try to provide you some test kernels later, one build for each rc at first.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

Please check the kernels I uploaded to the same place again, and report back here at which release it stops working. I built the vanilla 2.6.34 series from rc2 to 2.6.34 final; see which one broke the video output on resume. Once we find out, I hope we'll have a small set to bisect again.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Hi Herton,

Thanks so much for the test kernels. I tested them and found that rc3 works but rc4 doesn't. As part of my "binary search", rc2 was found to be good as well, and rc5, as well as -1 (linux-image-2.6.34-1) are bad as well.

So the problem appeared between rc3 and rc4.
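The "binary search" over prebuilt kernels can be sketched as below. This is only an illustration: it assumes the builds are ordered oldest to newest and that the fault never disappears once introduced. The build list contains only the releases whose results are reported in this thread, and in practice the `is_bad` check is a manual suspend/resume test.

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true.

    Assumes builds are ordered oldest to newest, the first one is known
    good, and the last one is known bad.
    """
    low, high = 0, len(builds) - 1  # builds[low] is good, builds[high] is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(builds[mid]):
            high = mid
        else:
            low = mid
    return builds[high]


# Results reported in this thread for the 2.6.34 series.
reported_bad = {"rc4", "rc5", "2.6.34-1"}
builds = ["rc1", "rc2", "rc3", "rc4", "rc5", "2.6.34-1"]
print(first_bad(builds, lambda build: build in reported_bad))  # prints: rc4
```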

Hopefully the set of changes between those two releases will be manageable.

Thanks!

Herton R. Krzesinski (herton) wrote :

There were many changes between rc3 and rc4, surprisingly none in i915/drm which could be related, so I think this may be related to acpi then. But I'm not sure, so let's do a full bisect between them. It'll be around 9 kernels for you to test, so we have a long way to go...

I uploaded the first bisect step (2.6.34-0.b1), please check that.

Andy Whitcroft (apw) on 2011-03-01
Changed in linux (Ubuntu):
status: Incomplete → In Progress
status: In Progress → Incomplete
Daniel Manrique (roadmr) wrote :

Hi Herton,

I'm OK with testing all those kernels, I appreciate you taking the time to help us with this.

I tested the first one and it's good, resume from suspend works fine. I powered off and restarted twice to confirm the behavior.

Version string:
Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b1 SMP Tue Mar 1 12:48:34 UTC 2011

Thanks and I'm ready for the next one.

Herton R. Krzesinski (herton) wrote :

thanks, next bisect step uploaded, please check it: 2.6.34-0.b2

Daniel Manrique (roadmr) wrote :

Herton,

Step 2 is bad, upon resuming I have no video, can't switch to virtual console, though the system is responsive, just no display, i.e. usual "bad" behavior.

Version string:

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b2 SMP Tue Mar 1 16:00:49 UTC 2011

Thanks, I'm ready for the next one.

Herton R. Krzesinski (herton) wrote :

next one uploaded, 2.6.34-0.b3

Daniel Manrique (roadmr) wrote :

Hi Herton,

step 3 is good, resumes from suspend without issues.

Version is:

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b3 SMP Tue Mar 1 17:28:14 UTC 2011

Ready for the next one!

Herton R. Krzesinski (herton) wrote :

step 4 uploaded, feel free to check it (2.6.34-0.b4)

Daniel Manrique (roadmr) wrote :

Hi Herton,

step 4 is bad, no video after resume.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b4 SMP Tue Mar 1 20:59:33 UTC 2011

Thanks! Ready for the next kernel, though I'll probably test that one tomorrow morning.

Herton R. Krzesinski (herton) wrote :

ok, when you can a new kernel is available, 2.6.34-0.b5

Daniel Manrique (roadmr) wrote :

Thanks Herton,

step 5 is good, video after resume works without problems.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b5 SMP Tue Mar 1 22:41:45 UTC 2011

Ready to continue testing.

Herton R. Krzesinski (herton) wrote :

bisect is narrowing down to some acpi changes, which makes sense.

Next kernel available, 2.6.34-0.b6, please try it.

Daniel Manrique (roadmr) wrote :

Hey, glad to know the process is leading us somewhere this time!

Tested step 6, it's bad, no video after resume.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b6 SMP Wed Mar 2 16:12:46 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

2.6.34-0.b7 ready, you can grab it for testing

Daniel Manrique (roadmr) wrote :

Thanks Herton,

step 7 is good, video resumes correctly after suspend.

Version tested was:

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b7 SMP Wed Mar 2 17:44:54 UTC 2011

Herton R. Krzesinski (herton) wrote :

next one available, please check it: 2.6.34-0.b8

Daniel Manrique (roadmr) wrote :

Hi Herton,

step 8 is good as well, video resumes without issues.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b8 SMP Wed Mar 2 19:37:57 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

almost there, step 9 now uploaded: 2.6.34-0.b9 (after this should be ~2 more)

Daniel Manrique (roadmr) wrote :

Herton, this kernel for step 9 tested good, resumes immediately and without issues.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b9 SMP Wed Mar 2 20:50:17 UTC 2011

Herton R. Krzesinski (herton) wrote :

ok, next build available: 2.6.34-0.b10

Daniel Manrique (roadmr) wrote :

Hi,

Step 10 tests good, resumes without any video problems.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b10 SMP Wed Mar 2 21:39:18 UTC 2011

Thanks Herton!

Herton R. Krzesinski (herton) wrote :

hmm, still 2 more to go probably as git is checking a merge, uploaded the next step now: 2.6.34-0.b11

Daniel Manrique (roadmr) wrote :

Good morning,

Step 11 tests good, resume works fine, video is restored without problems.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b11 SMP Wed Mar 2 22:33:48 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

Morning!, 2.6.34-0.b12 is ready for consumption

Daniel Manrique (roadmr) wrote :

Hi Herton,

Step 12 tested good, resumes video without issues.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b12 SMP Thu Mar 3 16:19:46 UTC 2011

Herton R. Krzesinski (herton) wrote :

Step 13 now available: 2.6.34-0.b13

Daniel Manrique (roadmr) wrote :

Herton,

step 13 checked good, resumes alright.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b13 SMP Thu Mar 3 17:37:25 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

hopefully after this just one more to test, seems git settled again about merges, please check new release 2.6.34-0.b14

Daniel Manrique (roadmr) wrote :

I'm a bit concerned because the last several tests have all been good, but I'm confident I've checked each kernel at least a couple of times.

Anyway, step 14 tested good too, video resumed fine. Hope the changes being pointed to by git make sense!

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b14 SMP Thu Mar 3 20:21:39 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

Don't worry, it's just that git decided to try almost commit by commit on the remaining set (maybe because of merges in the middle), but the result is still looking good.

I uploaded the next release now: 2.6.34-0.b15

Daniel Manrique (roadmr) wrote :

Hi,

Step 15 is good as well (9th in a row!), with good suspend/resume. The kernel itself is a bit unstable, I had a few unsuccessful boot tries, but for the problem that concerns us here, once booted, it suspends/resumes correctly.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b15 SMP Thu Mar 3 21:25:30 UTC 2011

Herton R. Krzesinski (herton) wrote :

ok, I think after this one more to go (really!), please check 2.6.34-0.b16 which is available now

Daniel Manrique (roadmr) wrote :

Hi Herton,

step 16 tested good (10th in a row!), video resumes correctly.

Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu1) ) #b16 SMP Thu Mar 3 22:31:42 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

Almost there, 2.6.34-0.b17 uploaded, should be the last one, please check

Daniel Manrique (roadmr) wrote :

Hi,

Breaking the streak, step 17 was bad, no video after resume (blank screen).

Version tested was:
Linux version 2.6.34-0-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-3ubuntu3) ) #b17 SMP Fri Mar 4 15:06:12 UTC 2011

Thanks!

Herton R. Krzesinski (herton) wrote :

And the oscar goes to:

commit ac7729da880e742613129ee6dea0045328670d2d
ACPI / PM: Move ACPI video resume to a PM notifier

Ok, we found the bad commit. Thanks for your patience and testing on this one :), it took almost double the steps I assumed.

It makes sense that this commit breaks things. I'll check in detail what it does, and try a fix or report upstream. Meanwhile, if it reverts cleanly on the latest kernel (2.6.38-rc), I'll try to build a new kernel with it reverted for you just to double check.

Daniel Manrique (roadmr) wrote :

Hi Herton,

On the contrary, thanks so much for your help with this problem, all I did was run the test kernels, all the hard work came from you. It did take a while but I'm glad it helped pinpoint a problem in the kernel.

Do let me know if you want me to test a kernel with that commit reverted, and hopefully this will help produce a more permanent fix for the upstream kernel.

Herton R. Krzesinski (herton) wrote :

I uploaded on the same place the latest master-next natty kernel with the commit above reverted (just had to fix a minor conflict in the revert), can you check? (2.6.38-6.33~lp719620r1)

Daniel Manrique (roadmr) wrote :

Hi Herton,

OK, so I dist-upgraded my Natty to have the latest and greatest of all, this comes with kernel 2.6.38-5 which is, as expected, bad as it still has the faulty commit.

I then tested the one you sent, version string as follows:

Linux version 2.6.38-6-generic (root@tangerine) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu5) ) #33~lp719620r1 SMP Fri Mar 4 18:20:29 UTC 2011

This one suspends/resumes correctly, which, as pertains to this bug, is the expected/desired behavior with the faulty commit reverted. So I can confirm that removing that commit makes the Tecra A11 resume correctly from suspend.

Now, this kernel has some weird interaction with the rest of the system, because when X comes up, all I see is the desktop background and one folder, but there's no panels, no Unity and no window manager. After a bit, a window pops up about a compiz crash. I managed to suspend the system by pushing the power button and selecting "suspend", after which it resumes and correctly brings up the backlight.

I suspect our faulty commit has nothing to do with this, but I thought you should know about that weird behavior.

Thanks!

Herton R. Krzesinski (herton) wrote :

Ok, so the commit is indeed at fault, and the situation didn't change with the latest Natty kernel. This X/Unity behaviour should be another issue, no problem about that.

I have already looked at the commit the bisect found to cause the regression and at the code, and so far I haven't found anything. I will have to make a debugging patch and a new kernel build with it for you to check; when it is ready I'll ask for more testing.

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

Hi,

I prepared a new kernel with debugging for you to test, 2.6.38-6.33~lp719620r4

To test it, please add log_buf_len=1M kernel parameter to the kernel command line in grub, then install the kernel (for example edit /etc/default/grub and append log_buf_len=1M to GRUB_CMDLINE_LINUX_DEFAULT). As there will be extra debug output we have to increase the log size, 1M I think will be sufficient.
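The edit to /etc/default/grub can be sketched as a small text transformation. This is only an illustration, assuming the usual double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line; the helper name `add_kernel_parameter` is made up for this example.

```python
import re


def add_kernel_parameter(grub_text, parameter):
    """Append parameter to GRUB_CMDLINE_LINUX_DEFAULT unless already present."""
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def append(match):
        current = match.group(2).split()
        if parameter not in current:
            current.append(parameter)
        return match.group(1) + " ".join(current) + match.group(3)

    new_text, count = pattern.subn(append, grub_text)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT not found")
    return new_text
```

After writing the result back to /etc/default/grub (as root), `sudo update-grub` regenerates the boot configuration on Ubuntu. Once booted, /proc/cmdline should show the new parameter.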

Then boot the kernel, perform the suspend & resume 2 times as you did before, and attach here the dmesg output.

After this, turn off the machine to restart the test, and do the same test, but before you do the 2 suspend & resumes, please enable the old_resume parameter I created for some extra debug:
sudo sh -c "echo Y > /sys/module/video/parameters/old_resume"
And do the test, and attach this second dmesg output.

I hope this extra debug output shows more exactly what's happening when the bug occurs.

For a third test, and before you start the test, please enable the ssh server on the machine. Then when the machine resumes and the screen is black, please try logging into the machine with ssh and changing the brightness by writing to /sys/class/backlight/acpi_video0/brightness (see max_brightness in the same directory to check the max value supported). Please tell me whether the screen remains black or not with this third test.
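The brightness write in the third test can be sketched as follows. It is a minimal illustration that uses the sysfs paths named above; the function name `set_brightness` and its `backlight_dir` parameter are made up for this example, and writing to the real sysfs file requires root.

```python
from pathlib import Path


def set_brightness(level, backlight_dir="/sys/class/backlight/acpi_video0"):
    """Write level to the brightness file after checking max_brightness."""
    directory = Path(backlight_dir)
    maximum = int((directory / "max_brightness").read_text().strip())
    if not 0 <= level <= maximum:
        raise ValueError(f"level must be between 0 and {maximum}")
    # On real hardware this write needs root privileges.
    (directory / "brightness").write_text(f"{level}\n")
    return maximum
```

Run over ssh on the resumed machine, for example as `set_brightness(6)` from a root Python session; any level from 0 up to the value in max_brightness is accepted.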

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Hi!

Here are the results of the tests with the debugging kernel:

>Then boot the kernel, perform the suspend & resume 2 times as you did
>before, and attach here the dmesg output.

Done, on first suspend there's no video back, on the second attempt video comes back fine, as we have been observing. dmesg is attached as sr-1.txt.

>After this, turn off the machine to restart the test, and do same test, but before you do the 2 >suspend & resumes, please >enable old_resume parameter I create for some extra debug:
>sudo sh -c "echo Y > /sys/module/video/parameters/old_resume"
>And do the test, and attach this second dmesg output.

With old_resume enabled, the kernel restores video just fine. I did two suspend/resume cycles just to be sure it's the same test, both resumed video without problems. dmesg attached as sr-2.txt, it was larger than 1MB so I enlarged the log buffer accordingly to capture everything starting from system boot.

>For a third test, and before you start the test, please enable ssh
>server on the machine. Then when the machine resumes and the screen is
>black, please try login into the machine with ssh and change the
>brightness writing to /sys/class/backlight/acpi_video0/brightness (see
>max_brightness on same directory to check what is the max value
>supported). Please tell if the screen remains black or not with this
>third test.

It remains black, no backlight.

dmesg shows this as last output:

[ 41.163312] acpi_video_device_lcd_set_level: set brightness to 100
[ 41.256933] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 41.598813] usb 1-1.6: new full speed USB device using ehci_hcd and address 6
[ 41.871745] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,commit=0
[ 42.850451] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[ 151.430915] acpi_video_device_lcd_set_level: set brightness to 73

The backlight brightness at 41.163312 is from the wakeup attempt, the one at 151.43 is when I tried to write to /sys. I set the value to 6, max_brightness was 7.

Thanks!

Herton R. Krzesinski (herton) wrote :

Thanks for the results. I looked at the debug output, and probably this is some bug in the firmware/bios we are triggering by the order change of when the resume callback of acpi video is called, so far by the debug output there is no difference besides the expected change of order the acpi video resume function is called. But I still have a theory and want to check it, I uploaded a new kernel package with different debugging enabled (2.6.38-6.34~lp719620r1), can you do the 2 suspend & resume tests on it and attach the dmesg output here?

Daniel Manrique (roadmr) wrote :

Hi Herton,

the kernel log from this one was smaller than the previous two. I used this kernel:

Linux version 2.6.38-6-generic (root@tangerine) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-5ubuntu2) ) #34~lp719620r1 SMP Thu Mar 10 09:30:46 UTC 2011 (Ubuntu 2.6.38-6.34~lp719620r1-generic 2.6.38-rc7)

I booted, then suspended; on the first resume there was no video (black screen). I suspended and resumed again, and this time video came back just fine.

I also noticed, with this kernel, Compiz doesn't crash (as I'd observed in previous test kernels). Just thought I'd let you know.

Thanks!

Herton R. Krzesinski (herton) wrote :

Can you do the following test: please remove toshiba_bluetooth module before suspending the first time with the faulty kernel, and see if screen still goes black on first resume.

Daniel Manrique (roadmr) wrote :

Hi Herton,

Sorry, the problem persists even when removing the toshiba_bluetooth module. I even blacklisted it and powered off/rebooted so it didn't even load, but screen is still blank after first resume.

Here's the output of lsmod:

Module Size Used by
parport_pc 32111 1
ppdev 12849 0
tpm_infineon 17324 0
binfmt_misc 13213 1
joydev 17322 0
arc4 12473 2
snd_hda_codec_hdmi 27479 1
snd_hda_codec_realtek 255782 1
snd_hda_intel 24140 2
snd_hda_codec 90901 3 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel
snd_hwdep 13274 1 snd_hda_codec
snd_pcm 80244 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
snd_seq_midi 13132 0
snd_rawmidi 25269 1 snd_seq_midi
ath9k 103633 0
mac80211 257001 1 ath9k
i915 446562 8
snd_seq_midi_event 14475 1 snd_seq_midi
snd_seq 51291 2 snd_seq_midi,snd_seq_midi_event
uvcvideo 66851 0
ath9k_common 13611 1 ath9k
ath9k_hw 300328 2 ath9k,ath9k_common
psmouse 73312 0
ath 19141 2 ath9k,ath9k_hw
tpm_tis 13993 0
drm_kms_helper 40745 1 i915
videodev 75143 1 uvcvideo
snd_timer 28659 2 snd_pcm,snd_seq
snd_seq_device 14110 3 snd_seq_midi,snd_rawmidi,snd_seq
cfg80211 156212 3 ath9k,mac80211,ath
intel_ips 17769 0
tpm 21251 2 tpm_infineon,tpm_tis
serio_raw 12990 0
snd 55295 14 snd_hda_codec_hdmi,snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
drm 180083 4 i915,drm_kms_helper
soundcore 12600 1 snd
snd_page_alloc 14073 2 snd_hda_intel,snd_pcm
toshiba_acpi 13796 0
i2c_algo_bit 13184 1 i915
sparse_keymap 13666 1 toshiba_acpi
tpm_bios 13460 1 tpm
video 18951 1 i915
lp 13349 0
parport 36746 3 parport_pc,ppdev,lp
ahci 21591 2
libahci 25548 1 ahci
sdhci_pci 13623 0
e1000e 138627 0
sdhci 22720 1 sdhci_pci

I'm also attaching the dmesg for this attempt, including the usual two suspend/resume sets.
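
A quick way to confirm from a listing like the one above that a module is really absent, and to see what depends on a module before removing it, is to parse the lsmod output. The sketch below is illustrative only; `parse_lsmod` is a hypothetical helper, and it assumes the usual "name size use-count [users]" column layout shown in this comment.

```python
def parse_lsmod(text):
    """Parse `lsmod` output into {module: (size, use_count, [users])}."""
    modules = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the "Module Size Used by" header and any malformed line.
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        name, size, count = fields[0], int(fields[1]), int(fields[2])
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[name] = (size, count, users)
    return modules


# A few rows copied from the listing above.
sample = """\
Module Size Used by
toshiba_acpi 13796 0
sparse_keymap 13666 1 toshiba_acpi
video 18951 1 i915
intel_ips 17769 0
"""
loaded = parse_lsmod(sample)
print("toshiba_bluetooth" in loaded)  # False: the module is not loaded
print(loaded["video"][2])             # ['i915']
```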

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

Can you now test removing the toshiba_acpi module instead? It's a shot in the dark and I doubt it will change anything, but as it registers a backlight device too, it's worth a quick check that there isn't any conflict (from what I saw it shouldn't cause problems anyway).

Also, with toshiba_acpi loaded, I think you have two backlight devices inside /sys/class/backlight/: one acpi_video0 and another toshiba. As another test, when the system resumes from the first suspend, try writing a brightness value inside /sys/class/backlight/toshiba and see if it changes anything.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Hi Herton,

The system has this kernel, I reinstalled Natty to do some tests, please let me know if I should be testing using another kernel:

Linux 201002-5345 2.6.38-6-generic #34-Ubuntu SMP Tue Mar 8 14:09:10 UTC 2011 i686 i686 i386 GNU/Linux

I removed the toshiba_acpi module like you suggested, but behavior didn't change, still no screen/backlight after resume.

Again, with toshiba_acpi loaded, I did the test and tried to echo a value to /sys/class/backlight/toshiba/brightness, still no change from the backlight.

As usual, a second suspend/resume set brings things back to normal.

Thanks for your help!

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Herton R. Krzesinski (herton) wrote :

Thanks, I have one more test to ask for: please try removing the intel_ips module now and see if things change. You can test with the latest natty kernel.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Daniel Manrique (roadmr) wrote :

Hi Herton,

I tried with latest Natty kernel:

Linux 201002-5345 2.6.38-7-generic #35-Ubuntu SMP Tue Mar 15 21:31:40 UTC 2011 i686 i686 i386 GNU/Linux

Here's what I did:

1- Start system up
2- rmmod intel_ips
3- sudo pm-suspend

Upon resuming, screen is blank, i.e. usual faulty behavior :( so it looks like it's still not helping.

I found another machine that's affected by this problem, I'll be attaching a description and relevant files in a bit.

Thanks!

Daniel Manrique (roadmr) wrote :

The other system I found that exhibits this problem is a Dell Latitude E6410 with Intel graphics.

Running kernel:

Linux ubuntu 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010 i686 GNU/Linux

I also tested with a Lucid (10.04) LiveCD, and, as we've been observing, it resumes successfully with that version (kernel 2.6.32-21.32).

I'm attaching lspci, dmidecode and lshw information, please let me know if anything else is needed. I hope this information is useful.

Herton R. Krzesinski (herton) wrote :

Ok, I thought intel_ips could affect things here as it calls i915 functions randomly on resume from its kthread. Unfortunately I'm out of ideas for now...

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Daniel Manrique (roadmr) wrote :

Hi Herton,

Well, if it's a BIOS bug maybe updating it could help things, I'll try that next week and let you know how it goes. Who knows, maybe updating BIOS will be a valid solution!

Thanks!

Ara Pulido (apulido) wrote :

Targeting Maverick as well, since this also happens in 10.10 and we could try to fix it with an SRU.

tags: added: blocks-hwcert
Ara Pulido (apulido) on 2011-04-07
Changed in linux (Ubuntu Natty):
importance: Undecided → High
Ara Pulido (apulido) wrote :

Any updates on this bug?

Herton R. Krzesinski (herton) wrote :

I think Daniel was going to see if there is a new bios, and if yes check if it changes the situation here.

I also tried raising this issue upstream (https://lkml.org/lkml/2011/3/18/430), but so far it hasn't moved forward. The only thing we know is that the commit changing the resume order of acpi video caused the issue, but it's still a mystery why, since it's the same set of instructions as before (a restore of backlight levels) and the same acpi call to restore the backlight, just now run after all devices have resumed.

The strange thing about this is that the problem happens only on the first suspend/resume cycle; if you suspend/resume again the backlight comes back, according to Daniel's description. So I think the bios has some race or problem. The debugging I asked for was to see if we could discover the reason for this behaviour, but it didn't clarify things much, and I'm out of ideas for now for remote debugging. Perhaps the manufacturer (toshiba) could be contacted about this.

Daniel Manrique (roadmr) wrote :

Hi,

It turns out there's a new BIOS version available from Toshiba, and it's quite a version bump (we have 1.40, the new one is 2.90) so I'm guessing it contains quite a few changes.

However, I'm stuck here because Toshiba's BIOS updater is for Windows only :(

Seth Forshee (sforshee) wrote :

@Daniel: Often times the Windows-only BIOS updates are self-extracting zip files. You can try running unzip on the file to see if that is the case. If it is you can inspect the contents to see if there's something useful in there. Sometimes you'll find a DOS executable that you can run on a FreeDOS USB stick to update the BIOS. For one Toshiba machine I found an iso image that could be burned to a CD for updating the BIOS. If you don't have an optical drive you might be able to copy the contents of the iso image to a USB stick and use syslinux to make the stick bootable (I think this is what I ultimately ended up doing).
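
The first check suggested above, whether the Windows updater is really a self-extracting zip, can also be done without unzip. Here is a minimal Python sketch using the standard `zipfile` module; the helper name `list_if_zip` is hypothetical and the file name in the usage note is a placeholder, not the actual Toshiba download.

```python
import zipfile


def list_if_zip(path):
    """Return the member names if `path` is a zip archive, else None.

    zipfile looks for the zip directory at the end of the file, so an
    executable stub placed in front of the archive does not prevent
    detection.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()
```

Calling `list_if_zip("bios_update.exe")` would list the embedded files if the updater is a self-extracting zip, and return None if it uses some other packaging.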

Changed in linux (Ubuntu Maverick):
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu Natty):
milestone: none → natty-updates
Victor Tuson Palau (vtuson) wrote :

Daniel, have you been able to update the BIOS?

Daniel Manrique (roadmr) wrote :

Hi,

I have had no success updating the BIOS on this machine :( I tried Seth's suggestions, but it seems that the update file is stored in some weird way inside the executable and I can't unzip it. I was also unable to find an alternate (i.e. for DOS or standalone) BIOS update tool from Toshiba.

I'll keep trying some more possibilities and report back here once I manage to get the BIOS update working.

viktor (lfraisse) wrote :

Happens on my system too (5 years old, no BIOS update available for almost 4 years now) and it's not a Toshiba. Should I file a new bug or is there a more generic bug reported for this issue?

There's a message about updating BIOS during the messy boot (but it was ok with natty beta 2).
Everything has been working fine with the same system on my Lucid partition (and since Feisty actually) so I guess kernel 2.6.38 doesn't like older systems.

Herton R. Krzesinski (herton) wrote :

viktor, yes, please open a new bug report for your issue (and post the link here), especially since the regression may be a different one, unless reverting commit ac7729da880e742613129ee6dea0045328670d2d also works for you.

If you are not sure, just open the new report, and test the mainline kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/ . Pick the 2.6.32 versions and go up, since you say lucid works (it's based on 2.6.32), find the two kernel versions between which things stopped working, and report that on the new bug.
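
Going up one version at a time works; when there are many builds to try, the search can be shortened with a bisection instead. The sketch below is an assumption-laden illustration, not the procedure requested above: `last_good_first_bad` and `is_good` are hypothetical names, `is_good` stands for the manual step of booting a kernel and testing suspend/resume, and the approach assumes there is a single point where things changed from working to broken.

```python
def last_good_first_bad(versions, is_good):
    """Bisect an ordered list of kernel versions.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    there is exactly one good -> bad transition in between.
    Returns the (last good, first bad) pair.
    """
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # breakage happened later
        else:
            hi = mid  # breakage happened here or earlier
    return versions[lo], versions[hi]
```

With seven candidate versions this needs three manual tests rather than up to five when stepping through them one by one.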

Daniel Manrique (roadmr) wrote :

Herton, Seth:

I finally managed to apply the latest BIOS update for this Toshiba machine. So first I'd like to offer a huge apology for taking so long to perform this update.

The versions the machine shipped with were:
System BIOS Version: 1.40 (dated 12/13/09)
EC Version: 1.30

I downloaded an update that was published on May 25th, 2011. This bumps the BIOS to the following versions, as can be verified in the setup screen accessible by hitting F2:

System BIOS Version: 3.10
EC Version: 1.90

With this new BIOS, I'm happy to report that the system suspends and restores without issue, video comes back up just fine. So it looks like the assessment that this was a buggy BIOS was spot-on.

So far I've only tried this with Natty (kernel 2.6.38-8), I'll update in a bit with results from Maverick (2.6.35).

Please let me know if you'd like me to perform other tests to validate this. I won't touch the bug's status as I'm not sure how to proceed when a bug gets "fixed" by a BIOS update.

Thanks!

Ara Pulido (apulido) wrote :

I have removed the blocks-hwcert tag, based on Daniel's update

tags: removed: blocks-hwcert
Daniel Manrique (roadmr) wrote :

Removing blocks-hwcert as per new behavior with the updated BIOS.

Gumuiyul (gumuiyul) wrote :

Hi, I'm using the latest Thinkpad x220 (4287 A28) laptop.
The Thinkpad x220 (2011) has the same problem on both 11.04 and 11.10 alpha 2 and 3; the kernel is 2.6.38.10.
In most cases it works fine, but if it stays suspended for a long time it never resumes properly: black screen with a movable mouse cursor.

Herton R. Krzesinski (herton) wrote :

@Lee: this bug is just for Toshiba Tecra A11, you should open a new bug in case you still have a problem and there isn't one for the Thinkpad x220

Since the bios update solved the problem, I think we can close this one, and probably there is no feasible workaround to be done in the kernel, unless someone knows the bios code/behaviour or more details about the implementation of the problematic bios revision. For now, I'm setting this to incomplete to let it expire, please set status back if there are more details still to be discussed or anything pending.

Changed in linux (Ubuntu):
assignee: Herton R. Krzesinski (herton) → nobody
Changed in linux (Ubuntu Natty):
assignee: Herton R. Krzesinski (herton) → nobody
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in linux (Ubuntu Maverick):
status: Triaged → Incomplete
Changed in linux (Ubuntu Natty):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Maverick) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Maverick):
status: Incomplete → Expired
Daniel Manrique (roadmr) wrote :

This has been awaiting expiration for over a year, I'm marking as Won't Fix based on Herton's comment #93.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Changed in linux (Ubuntu Natty):
status: Incomplete → Won't Fix
Daniel Manrique (roadmr) on 2013-01-29
Changed in linux (Ubuntu Maverick):
status: Expired → Won't Fix