Hibernate does not power off the machine

Bug #595822 reported by Earl Malmrose
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Medium       Unassigned

Bug Description

"/var/log/pm-suspend.log" ends with "performing hibernate", yet the machine does not power off. Turning the machine off manually, then back on, does not resume. Suspend does not work either. Even a regular shutdown doesn't always work.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-22-generic 2.6.32-22.36
Regression: No
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC662 rev1 Analog [ALC662 rev1 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: oem 1223 F.... pulseaudio
 /dev/snd/controlC1: oem 1223 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0x40600000 irq 22'
   Mixer name : 'Realtek ALC662 rev1'
   Components : 'HDA:10ec0662,10ec0662,00100101'
   Controls : 16
   Simple ctrls : 9
Card1.Amixer.info:
 Card hw:1 'Camera'/'Vimicro Corp. Venus USB2.0 Camera at usb-0000:00:1d.7-7, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB0ac8:3420'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
   Capture channels: Mono
   Limits: Capture 0 - 48
   Mono: Capture 0 [0%] [0.00dB] [on]
Date: Fri Jun 18 00:13:51 2010
HibernationDevice: RESUME=UUID=86227052-8d09-44cb-94e8-46504f47ed35
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100429)
MachineType: Soltech Corporation PT20
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-generic root=UUID=b68ed63c-be61-41d6-877c-a334744cc583 ro
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
dmi.bios.date: 02/03/2010
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: PT20_B1.1.189_E1.1.351_P0
dmi.board.asset.tag: Not Applicable
dmi.board.name: PT20
dmi.board.vendor: Soltech Corporation
dmi.board.version: MP
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: Soltech Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvrPT20_B1.1.189_E1.1.351_P0:bd02/03/2010:svnSoltechCorporation:pnPT20:pvrMP:rvnSoltechCorporation:rnPT20:rvrMP:cvnSoltechCorporation:ct1:cvrN/A:
dmi.product.name: PT20
dmi.product.version: MP
dmi.sys.vendor: Soltech Corporation

Earl Malmrose (earl) wrote :
Jeremy Foshee (jeremyfoshee) wrote :

Hi Earl,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kernel-hibernate
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Colin Ian King (colin-king) wrote :

Hi Earl,

I've got an experimental tool that can probe the firmware on your machine to see if there are any obvious BIOS issues that may be causing these problems. Can you run the following on the machine:

sudo add-apt-repository ppa:firmware-testing-team/ppa-firmware-test-suite
sudo apt-get update
sudo apt-get install fwts

and then once installed, run the tool as follows:

sudo fwts --no-s3 --no-s4

this will generate a log file "results.log" - please attach that to the bug report. Thanks!

Pete Graner (pgraner)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: nobody → Colin King (colin-king)
Earl Malmrose (earl) wrote :
Colin Ian King (colin-king) wrote :

Hi Earl, it appears that the BIOS is definitely not great. The _WAK method in particular looks a bit screwy:

"Reserved method must return a value (_WAK)"

Can you run:

sudo acpidump > acpidump.dat

and attach acpidump.dat to the bug. I will disassemble it and see what the _WAK method is doing.
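
For readers who want to look at the method themselves, the following is a minimal sketch of one way to do it, assuming the ACPICA tools (acpixtract, iasl) are installed. The output file names are an assumption and vary between tool versions, and list_method_defs is a hypothetical helper name used only for illustration.

```shell
# Assumed workflow with the ACPICA tools (file names vary by version):
#   acpixtract -a acpidump.dat    # split the dump into one file per table
#   iasl -d dsdt.dat              # disassemble the DSDT into dsdt.dsl

# list_method_defs NAME FILE
# Print "line-number:text" for each definition of control method NAME
# in a disassembled table FILE. Hypothetical helper, not part of ACPICA.
list_method_defs() {
    grep -nF "Method ($1," "$2"
}
```

For example, `list_method_defs _WAK dsdt.dsl` would print the line where _WAK is defined, which is a convenient starting point for reading what the method returns.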

Earl Malmrose (earl) wrote :
Colin Ian King (colin-king) wrote :

The ACPI _WAK method looks like a possible cause of the suspend wakeup problem. According to the ACPI Spec, section 7.3.7 \_WAK (System Wake), it should return a Package of two Integers: a status value and the power supply S-state.

Also, if reboot or shutdown does not work, please disable ACPI using the following kernel option:

acpi=off

If this allows you to shut down, then we know it's an ACPI issue: basically, the machine cannot transition to state S5. This could be because the _PTS (Prepare to Sleep) method is broken. If so, it could explain why suspend and hibernate are broken too.

Earl Malmrose (earl) wrote :

First attempt with "acpi=off" renders the machine unbootable.

Earl Malmrose (earl) wrote :

I tried a couple versions:

'2.6.32-22-generic' doesn't boot; the last line displayed is "NET: Registered protocol family 1", then no further activity.

'2.6.35-4-generic' exact same failure.

Colin Ian King (colin-king) wrote :

Just to add, the ACPI DSDT is not the best code I've seen:

1. The _OSC method spins around in a while loop and creates the CAPB object many times, hence the warnings:

[ 0.246336] ACPI Error (dsfield-0143): [CAPB] Namespace lookup failure, AE_ALREADY_EXISTS
and
[ 0.246537] ACPI Error (psparse-0537): Method parse execution failed [\_SB_.PCI0._OSC] (Node f7010de0), AE_ALREADY_EXISTS

2. Method _BQC does not return the queried brightness level if the LEqual() test is false:

                    Method (_BQC, 0, NotSerialized)
                    {
                        Divide (BRTL, 0x0C, Local0, Local1)
                        If (LEqual (Local0, 0x00))
                        {
                            Return (BRTL)
                        }
                    }

..this may affect the screen brightness querying.

3. Fields IDEP, IDES and IDE1 seem to extend out of the allowed region spaces. Not good.
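
The kernel warnings quoted in item 1 can be pulled out of a boot log and grouped by exception code. This is only a sketch, assuming a POSIX shell with a grep that supports -o; summarize_acpi_errors is a hypothetical helper name, not an existing tool.

```shell
# summarize_acpi_errors
# Read a kernel log on stdin and count "ACPI Error" lines per ACPICA
# exception code (the AE_* token that appears on each line).
summarize_acpi_errors() {
    grep -F 'ACPI Error' | grep -o 'AE_[A-Z_]*' | sort | uniq -c
}

# Typical use on a live system (reading dmesg may need root):
#   dmesg | summarize_acpi_errors
```

A large count for a single code such as AE_ALREADY_EXISTS would be consistent with a method that creates the same named object on every pass through a loop.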

Colin Ian King (colin-king) wrote :

I need to think about how to proceed. I will get back to you tomorrow.

Earl Malmrose (earl) wrote :

Brightness hotkeys do affect the LCD brightness, but do not trigger the on-screen display.

Colin Ian King (colin-king) wrote :

Earl, is there any chance of getting the BIOS re-worked if I find any more issues with it?

Colin Ian King (colin-king) wrote :

Earl,

My current hypothesis is that the _PTS (prepare to sleep) ACPI Method in your BIOS is not doing what it should.

However, I'm interested to see if you see any kernel messages suggesting a kernel issue before the machine shuts down or goes into an S3 suspend state. So..

1) Boot the machine with the following kernel parameter: no_console_suspend
2) Switch to a console (e.g. Console 1, by Ctrl-Alt-F1)
3) Login at the console
4) Run: sudo pm-suspend

And note if you see any kernel errors/warnings on the console.

Repeat this, but with step 4 as: sudo shutdown -h now

And again note any kernel errors/warnings.
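
Before running either test it can be worth confirming that the kernel really booted with the parameter from step 1. A minimal sketch, assuming a POSIX shell; has_kernel_param is a hypothetical helper name, and the command line in the example is made up for illustration.

```shell
# has_kernel_param PARAM [CMDLINE]
# Succeed if PARAM appears on the kernel command line, either bare
# (no_console_suspend) or as a key (acpi matches acpi=off).
# CMDLINE defaults to the running kernel's /proc/cmdline.
has_kernel_param() {
    hkp_param=$1
    hkp_cmdline=${2-$(cat /proc/cmdline)}
    for hkp_word in $hkp_cmdline; do
        case $hkp_word in
            "$hkp_param"|"$hkp_param"=*) return 0 ;;
        esac
    done
    return 1
}

# Example with an explicit (made-up) command line:
if has_kernel_param no_console_suspend "root=/dev/sda1 ro no_console_suspend"; then
    echo "no_console_suspend is set"
fi
```

Called with one argument, as in `has_kernel_param no_console_suspend`, it checks the running system instead.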

Colin Ian King (colin-king) wrote :

Also, going back to the acpi=off option, can you:

Boot with the kernel parameter: "acpi=off" and remove the "quiet splash" kernel parameters so we can see any more kernel error messages on the console which may explain why acpi=off fails.

Thanks

Earl Malmrose (earl) wrote :

I'll see if we can encourage some BIOS fixes.

Running pm-suspend doesn't display any messages, but after a few seconds the screen goes blank with the backlight still on.

Booting with acpi=off was already tested without quiet and splash. No additional messages.

Colin Ian King (colin-king) wrote :

@Earl, got the machine now. Definitely is an ACPI _PTS bug. I will dig into why it's failing, but this may take some effort.

Colin Ian King (colin-king) wrote :

The ACPI _PTS method (Prepare To Sleep) is proving to be the fault. For example, during a shutdown the kernel calls acpi_enter_sleep_state_prep(), which executes the ACPI _PTS method; that method hangs and never returns. Section 7.3.2 of the ACPI specification states:

The _PTS control method is executed by the OS during the sleep transition process for S1, S2, S3, S4, and
for orderly S5 shutdown. The sleeping state value (For example, 1, 2, 3, 4 or 5 for the S5 soft-off state) is
passed to the _PTS control method. This method is called after OSPM has notified native device drivers of
the sleep state transition and before the OSPM has had a chance to fully prepare the system for a sleep state
transition. Thus, this control method can be executed a relatively long time before actually entering the
desired sleeping state

_PTS must return control to the OS. The machine hangs at this point, which points to a bug in the ACPI AML code and/or in the way it interacts with the embedded controller.

For a shutdown scenario, _PTS should return so that the OS can execute the ACPI _SST method and continue the shutdown.

I've disassembled the ACPI DSDT table and the AML code that implements _PTS is as follows:

    Method (_PTS, 1, NotSerialized)
    {
        Store (Arg0, P80H)
        \_SB.PCI0.LPC0.H_EC.ECSV (0x01)
        \_SB.PCI0.LPC0.H_EC.ECSV (0x03)
    }

This looks dubious to me. The Method stores its S-state (Arg0) into the debug port (port 0x80) and then executes the ECSV method. I'm pretty sure it should be passing the S-state down the chain rather than just stuffing it into a debug port.

As it is, this is the root cause of why S3, S4 and shutdown don't work. The _PTS method is broken and the only way to fix the machine is to fix the ACPI code.

Sorry to bring bad news.

Colin

Colin Ian King (colin-king) wrote :

I've enabled full ACPI opcode tracing and observed that the code executes the \_SB.PCI0.LPC0.H_EC.ECSV (0x01) which executes the following Method:

                    Method (ECSV, 1, Serialized)
                    {
                        Store (Arg0, PRM0)
                        TRAP (0x14)
                        P8XH (0x00, Arg0)
                    }

and then this calls TRAP(0x14):

    Method (TRAP, 1, Serialized)
    {
        Store (Arg0, SMIF)
        Store (0x00, TRP0)
        Return (SMIF)
    }

..and the hang occurs when storing 0x14 into SMIF, which is defined as an 8-bit field at a 2-byte offset into the region named GNVS. The GNVS region is the 256-byte system I/O region that the ACPI code maps to physical address 0x3F5E0D7C.
(It could in fact be that the system hangs on the store of 0x00 to TRP0 and that the debug output from the ACPI driver has not yet been flushed.)

Interestingly, the smaller netbook works correctly, and its ACPI DSDT is slightly different from that of the larger netbook: its GNVS region is mapped at 0x7F5E0D7C.
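
The addresses quoted above can be worked through with a little shell arithmetic. This sketch uses only the two GNVS base addresses and the 2-byte SMIF offset given in this comment; the variable names are made up for illustration.

```shell
# Values quoted in the comment above
gnvs_large=0x3F5E0D7C   # GNVS base on the larger netbook (the one that hangs)
gnvs_small=0x7F5E0D7C   # GNVS base on the smaller netbook (works)
smif_offset=2           # SMIF sits 2 bytes into GNVS

# Physical address of SMIF on the larger netbook
smif_addr=$(printf '0x%X' $(( $gnvs_large + $smif_offset )))

# Distance between the two GNVS mappings
base_delta=$(printf '0x%X' $(( $gnvs_small - $gnvs_large )))

echo "SMIF physical address (larger netbook): $smif_addr"    # 0x3F5E0D7E
echo "Difference between GNVS bases:          $base_delta"   # 0x40000000
```

The two mappings differ by exactly 0x40000000 (1 GiB) and are otherwise identical in their low bits. What that says about the two machines is not established in this report.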

I suspect that the TRAP Method is triggering an SMI (System Management Interrupt) that executes the _PTS transition inside the BIOS and that's where the hang occurs.

So, that's as far as I can take it. It does not appear to be a kernel bug, but a problem in the BIOS.

Colin Ian King (colin-king) wrote :

@Earl, of the two machines you sent me, I could not get any suspend/resume/power off failures with the smaller netbook. Was that expected?

Earl Malmrose (earl) wrote :

An update from the BIOS vendor fixed this bug.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
assignee: Colin King (colin-king) → nobody