[Hardy] HAL breaks pm-utils quirks and resuming

Bug #198808 reported by Nikolaus Filus on 2008-03-05
Affects            Importance  Assigned to
hal (Ubuntu)       High        Martin Pitt
pm-utils (Ubuntu)  Undecided   Martin Pitt

Bug Description

Binary package hint: gnome-power-manager

Summary: After upgrading from gutsy to hardy (alpha5) resume with g-p-m no longer works. It worked previously with the old infrastructure.

from gnome-terminal as root:
   pm-suspend
      resume ends with freeze and black screen
   pm-suspend --quirk-vbestate-restore
      resume works!

implemented quirk in /usr/share/hal/fdi/information/10freedesktop/20-video-quirk-pm-samsung.fdi
root@mobile:~# lshal | grep quirk
  power_management.quirk.vbestate_restore = true (bool)

g-p-m -> suspend / logout -> suspend
   resume works, the console shows kernel messages for a second, then some magic is done (switching to X?)
   and it ends with freeze and black screen (as without the quirk).

/var/log/pm-suspend.log suggests that pm-utils ARE used, but there are only entries from suspend, not a single one from resume

Questions:
  What is the difference between g-p-m suspend and manual pm-suspend?
  How to debug the process?

Pedro Villavicencio (pedro) wrote :

Thanks for taking the time to report this bug and helping to make Ubuntu better. Could you please attach the resulting log file of: gnome-power-bugreport.sh &> gpm.log to the report? You might also want to take a look at the debugging instructions at https://wiki.ubuntu.com/DebuggingGNOMEPowerManager for submitting any other logs related to your problem. Thanks in advance.

Changed in gnome-power-manager:
status: New → Incomplete
Nikolaus Filus (nfilus) wrote :

Real root cause found in ubuntu specific hal patch

Changed in gnome-power-manager:
status: Incomplete → Invalid
Nikolaus Filus (nfilus) wrote :

After digging deeper into hal and its scripts I found the root cause for the problem in

hal (0.5.10-5ubuntu2) hardy; urgency=low

  * debian/patches/88_change_pm_quirk_policy.patch: Change default pm
    quirk policy to match previous Ubuntu behaviour.

 -- Matthew Garrett <email address hidden> Sun, 30 Dec 2007 19:57:28 +0000

The above patch breaks the logic of the QUIRK checking code and defaults to applying several quirks. This is the opposite of the upstream logic and
will make a lot of upstream submitted quirks (fdi files) unusable.

In my case I defined a quirk for my notebook for
  --quirk-vbestate-restore
The patch adds
  --quirk-dpms-on
  --quirk-vbemode-restore
  --quirk-vga-mode3
  --quirk-vbe-post
  --quirk-reset-brightness
if NOT explicitly set to false. I wasn't able to find a correlation between these default quirks and the old mechanisms in /etc/default/acpi-support for disabling them.
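
As a rough illustration of the difference described above (this is not the text of 88_change_pm_quirk_policy.patch), the two policies can be sketched as follows. The function names are hypothetical, and the variable simply follows the HAL_PROP_POWER_MANAGEMENT_QUIRK_* naming that appears later in this report.

```shell
#!/bin/sh
# Sketch only: contrasts the two quirk policies described in this report.
# quirk_upstream and quirk_patched are hypothetical helper names.

# Upstream policy: pass a quirk only when the HAL property is "true".
# $1 = property value ("true", "false" or empty), $2 = pm-suspend flag
quirk_upstream() {
    if [ "$1" = "true" ]; then printf '%s\n' "$2"; fi
}

# Patched policy: pass a quirk unless the property is explicitly "false",
# so an unset property also enables the quirk.
quirk_patched() {
    if [ "$1" != "false" ]; then printf '%s\n' "$2"; fi
}

# A machine whose FDI file only sets vbestate_restore: dpms_on stays unset.
HAL_PROP_POWER_MANAGEMENT_QUIRK_DPMS_ON=""
echo "upstream: $(quirk_upstream "$HAL_PROP_POWER_MANAGEMENT_QUIRK_DPMS_ON" --quirk-dpms-on)"
echo "patched:  $(quirk_patched "$HAL_PROP_POWER_MANAGEMENT_QUIRK_DPMS_ON" --quirk-dpms-on)"
```

Under the patched policy the unset property still yields --quirk-dpms-on, which is why an FDI file that defines a single quirk ends up with several extra ones.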

citing: http://lists.freedesktop.org/archives/hal/2008-March/011159.html

>> Please note also https://bugs.launchpad.net/bugs/198808
>
>I saw the patch (88_change_pm_quirk_policy.patch) from the last comment
>already some days ago while check the ubuntu HAL package and was also
>wondering about. No idea why this change should make any sense, the current
>HAL code is IMO correct, no fix needed (except when you try to prevent
>suspend on nearly any machine).
>
>Danny

vlowther (victor-lowther) wrote :

Confirmed while doing development on pm-utils:

On my system:

lshal |grep quirk -> power_management.quirk.vbestate_restore = true (bool)

However, pm-suspend is called with the following parameters:
 --quirk-dpms-on --quirk-vbestate-restore --quirk-vbemode-restore --quirk-vga-mode3 --quirk-vbe-post --quirk-reset-brightness

It should only be called with --quirk-vbestate-restore.
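
To make this comparison repeatable, a small filter can turn the lshal output into the flags one would expect to see. This is a sketch under one assumption: that a property power_management.quirk.<name> maps to --quirk-<name> with underscores replaced by hyphens, as the single vbestate_restore example in this report suggests. Individual quirks may not follow that rule, so the result should be checked against the HAL suspend script. The function name quirks_to_flags is hypothetical.

```shell
#!/bin/sh
# Sketch: derive expected pm-suspend flags from lshal-style text on stdin.
# ASSUMPTION: property name -> flag is a plain underscore-to-hyphen mapping;
# this matches vbestate_restore but is not verified for every quirk.
quirks_to_flags() {
    sed -n 's/^[[:space:]]*power_management\.quirk\.\([a-z0-9_]*\) = true .*$/--quirk-\1/p' \
        | tr '_' '-'
}

# Usage on a real system (requires hal):
#   lshal | quirks_to_flags
```

Any flag that pm-suspend receives but that this filter does not print was added by something other than the FDI rules.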

Changed in gnome-power-manager:
status: Invalid → Confirmed
Changed in hal:
status: New → Confirmed
Nikolaus Filus (nfilus) wrote :

It's still invalid for g-p-m. The bug is in the hal package.

Changed in gnome-power-manager:
status: Confirmed → Invalid
Martin Pitt (pitti) wrote :

I'm currently discussing this with Matthew Garrett via email, trying to understand what's necessary to drop that patch and maintain behaviour compatibility with previous releases.

It's quite clear to me that we can't and shouldn't break upstream behaviour eternally. We should rather add the necessary quirks to hal-info (the ones which acpi-support did in earlier releases), since this is a fixed and diminishing target.

Changed in hal:
assignee: nobody → pitti
importance: Undecided → High
milestone: none → ubuntu-8.04
status: Confirmed → In Progress
Martin Pitt (pitti) wrote :

We just discussed this in the desktop team meeting. It is a "damned if you do, damned if you don't" situation, but we decided to drop this patch for the following reasons:

 * Upstream FDI rules should be quite good nowadays; other distributions reportedly aren't much worse wrt. suspend/resume than ours.
 * We have a lot of reports that the patch breaks current hardware (like Dell Latitudes)
 * upstream FDIs will get better over time, while the old acpi-support behaviour gets more and more obsolete
 * Maintaining the patch (or rather the consequences) is fighting against upstream and thus producing pointless maintenance overhead and bugs
 * We can always update FDIs for specific models, even post-release (OTOH we cannot revert this patch after Hardy is released, since that will cause undefined regressions).

Martin Pitt (pitti) wrote :

After a more in-depth discussion we clarified the situation now. The problem is that many of the machines which do not have any FDI rules at all need some of the quirks to circumvent some kernel problems (the quirks mentioned in the affected patch).

So, this is what should happen:

 (1) laptop model has no matching FDI rule -> use the default quirks in the current patch
 (2) laptop model has matching FDI rule -> use them as they are, and do not add quirks

So the current patch provides (1), but breaks (2). To fix this, I propose that the script checks whether any of $HAL_PROP_POWER_MANAGEMENT_QUIRK_* is set: if so, it uses the upstream behaviour; otherwise it enables the kernel-related quirks mentioned in the patch.
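
A minimal sketch of that proposal, assuming the quirk properties reach the script as exported HAL_PROP_POWER_MANAGEMENT_QUIRK_* environment variables. The function names are hypothetical, and the default list is taken from the five quirks named earlier in this report, not from the final patch.

```shell
#!/bin/sh
# Sketch of the proposed decision, not the actual hal patch.
# ASSUMPTION: the defaults are the five quirks listed earlier in this report.
DEFAULT_QUIRKS="--quirk-dpms-on --quirk-vbemode-restore --quirk-vga-mode3 --quirk-vbe-post --quirk-reset-brightness"

# True if at least one HAL quirk property is present in the environment.
any_quirk_defined() {
    env | grep -q '^HAL_PROP_POWER_MANAGEMENT_QUIRK_'
}

# $1 = quirks derived from the FDI rules the upstream way (may be empty).
choose_quirks() {
    if any_quirk_defined; then
        # Case (2): FDI rules exist, use them unchanged.
        printf '%s\n' "$1"
    else
        # Case (1): nothing known about this model, fall back to defaults.
        printf '%s\n' "$DEFAULT_QUIRKS"
    fi
}
```

The point of the design is that the presence of any quirk property, even one set to false, counts as "this model is known", so the defaults never mix with model-specific rules.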

In addition, we need another case:

 (3) the proprietary nvidia and fglrx drivers, and intel >= i915 [1] know how to reset the video hardware on resume and must not use any video quirk in /usr/lib/pm-utils/sleep.d/99video. resume_video() should immediately return in those cases.

This is particularly important since FDI rules only match hardware models, not device drivers. E. g. the quirks are necessary if you are using the nv driver, but detrimental if you use nvidia.

This could be checked with:

 * nvidia: lsmod | grep -qw nvidia
 * fglrx: lsmod | grep -qw fglrx
 * intel: use "lspci -n | grep -w 0300:" to find the graphics card, and then either
    * cut out the same line from "lspci", search for ([0-9]+)G and compare $1 for >= 915, or
    * cut out the product ID from lspci -n and compare it against >= 2592 (the product ID of the 915GM), since they seem to be ordered chronologically

The intel one is quite a hack, though, I'd appreciate other suggestions. But it's certainly better than what we have now.
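
The product ID variant of the Intel check could look roughly like this. It is only a sketch: it assumes the usual `lspci -n` line layout (slot, class followed by a colon, then vendor:product), treats the IDs as hexadecimal, and relies on the chronological ordering mentioned above. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: reads `lspci -n` output on stdin and succeeds if an Intel
# (vendor 8086) VGA controller (class 0300) has a product ID >= 0x2592,
# the ID given above for the 915GM.
# ASSUMPTION: lines look like "00:02.0 0300: 8086:2592 (rev 03)".
intel_is_915_or_newer() {
    awk '$2 == "0300:" { print $3 }' | while IFS=: read -r vendor product; do
        if [ "$vendor" = "8086" ] && [ "$((0x$product))" -ge "$((0x2592))" ]; then
            echo yes
        fi
    done | grep -q yes
}

# Usage on a real system:
#   lspci -n | intel_is_915_or_newer && echo "skip video quirks"
```

The comparison is done on the numeric value of the hex ID, so it is only as reliable as the ordering assumption itself.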

[1] http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commitdiff;h=37bf83ea3a1841ec63d2d9b54b485bb90386ce5b

Martin Pitt (pitti) wrote :

Step (3) needs to be done in pm-utils, since this dynamic driver check cannot be expressed in FDI rules.

Changed in gnome-power-manager:
assignee: nobody → pitti
status: Invalid → In Progress
vlowther (victor-lowther) wrote :

To do (#3) with the current ubuntu pm-utils, you should arrange for the appropriate driver package to drop a file in /etc/pm/config.d. Bug# 180378 has the solution I used to use for my system.

Mark Baas (mark-baas123) wrote :

I don't know for sure whether I am also a victim of this bug. However, I was always able to suspend until now, with Hardy alpha 6. I have a Packard Bell MZ057 laptop; in other words, an unusual model.
The command pm-suspend --quirk-vbestate-restore didn't work either.
I tried to suspend with just s2ram (0.8), no luck. I just can't resume. Nothing in the logs either.
I do use fglrx, but even without it loaded I cannot suspend, and on gutsy I had no problems.

What exactly could be going wrong? I think my problem belongs in this bug, right? I suppose I have a laptop model without an FDI rule.
What can I try/provide?

vlowther (victor-lowther) wrote :

The output of lshal |grep quirk will tell you if hal knows of any quirks that should be applied to your system.

Also, the outputs of lsmod and a copy of /var/log/pm-suspend.log (if it exists) would come in handy.

You can also try the workaround I used in bug# 180378 (download the 99-non-free-nvidia file, save it in /etc/pm/config.d, and change the first line to grep for fglrx instead of nvidia).

Then, once you find a combination of quirks that work (probably the default of no quirks should be used), you can save them in that file.

Matthew Garrett (mjg59) wrote :

What packages? xserver-xorg-video-intel shouldn't be providing a config.d file, since the functionality is in the kernel and not in the driver. The kernel can't provide it, because you can install multiple kernel versions simultaneously and they can't all provide the file. The logical place to put the video driver handling right now is in pm-utils.

Martin Pitt (pitti) wrote :

For the record, lsmod | grep -qw nvidia -> test -d /sys/module/nvidia

Martin Pitt (pitti) wrote :

Another discussion result with Matthew: instead of doing complicated vendor/product ID matching for Intel, it should be sufficient to just test for /sys/module/i915. It is not yet 100% clear whether this will also do the right thing on i830 and i855 (which also use that module now); waiting for confirmation from Intel.

For now we'll upload that to hardy and collect test results. If necessary, we can finetune afterwards for i830 and i855.
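
Put together, the simplified check might be sketched like this. SYSFS_MODULE_DIR and the function name are hypothetical; the override only exists so the logic can be tried against a scratch directory instead of the real /sys/module.

```shell
#!/bin/sh
# Sketch: succeed if a driver that resets the video hardware itself is loaded.
# SYSFS_MODULE_DIR is a hypothetical override for testing; the real location
# is /sys/module.
SYSFS_MODULE_DIR="${SYSFS_MODULE_DIR:-/sys/module}"

driver_handles_video() {
    for mod in nvidia fglrx i915; do
        if [ -d "$SYSFS_MODULE_DIR/$mod" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: driver_handles_video && echo "video quirks should be skipped"
```

As noted above, a loaded i915 module also covers i830 and i855 hardware, so this check inherits that open question.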

Mark Baas (mark-baas123) wrote :

I tried the nvidia thing (with fglrx). No luck either; lshal | grep quirk returns nothing.
The pm-suspend.log only covers suspending; it doesn't say anything about starting to resume. Maybe it is a kernel issue. Maybe I should try installing the 2.6.22 kernel from gutsy and see what happens.
In the end, are you suggesting that I try every single combination with the nvidia hook? That is like 25 combinations; isn't there anything more specific I can do?

vlowther (victor-lowther) wrote :

(re: comment no. 14)

For the short term, having these quirk workarounds as part of pm-utils is doable.

Longer term, though, there are two goals to work towards:

1) Make HAL fdi rules flexible enough to deal with things besides system mfgr/make/model when deciding what quirks to apply. At a minimum it should also be able to consider the current video device(s) and the driver(s), as they are arguably more important than the system model (especially if we start caring about desktops with their easily-replaced video cards). This will probably involve replacing the current key matching scheme with something a bit more complex.

2) If a piece of software requires special handling to operate correctly across a suspend/resume cycle, it should be responsible for providing the hook, not pm-utils.

Bernhard Gehl (bernhard-gehl) wrote :

Is it absolutely necessary to implement the "sensible defaults" in the code of hal/pm-utils or wouldn't it be possible to use fdi files for that?

Why not put something in like "25-kernel-quirk-pm-checkdefaults.fdi" which
1) matches for the pm-quirks and exits if any (including the ".none") quirks are set
2) matches for the ati/nvidia/intel drivers and exits if one of these is used (fglrx manifests as "info.linux.driver = fglrx_pci" - I don't know about nvidia and intel, but wouldn't it be easy to set suitable keys for them in a startup script?)
3) matches system.kernel.version and sets the corresponding sensible defaults

In this way, the pm-quirk-behaviour would be completely guided by config files and changing it wouldn't require modifying code?

Oh - and picking up an idea I got from awen on Bug #202814: I think it would help diagnosing the suspend/resume process a lot, if the pm-suspend-script contained a line like "echo $* >> ~/pm-suspend.log" to log the quirks it was called with.
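
A slightly more defensive variant of that logging line, as a sketch: it quotes the arguments, adds a timestamp, and writes to a configurable file. PM_DEBUG_LOG, its default path, and the function name are hypothetical and not part of pm-utils.

```shell
#!/bin/sh
# Sketch: record the arguments a script was called with.
# PM_DEBUG_LOG is a hypothetical variable; the default path is arbitrary.
log_invocation() {
    printf '%s pm-suspend called with: %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$*" \
        >> "${PM_DEBUG_LOG:-/tmp/pm-suspend-args.log}"
}

# Near the top of the script being debugged:
#   log_invocation "$@"
```

A fixed path avoids the question of whose home directory ~ expands to when the script is started by HAL instead of from a terminal.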

P.S.: Sorry if I made a stupid suggestion, I'm quite new to all this hal-fdi-stuff...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pm-utils - 0.99.2-3ubuntu2

---------------
pm-utils (0.99.2-3ubuntu2) hardy; urgency=low

  * Add debian/patches/96-video-quirk-ignoring.patch: Ignore resume video
    quirks when using the proprietary nvidia or fglrx drivers, or Intel >=
    915G, since they are not needed on them and actively break resuming. Since
    this cannot be expressed as FDI rules with current hal, this hack needs to
    suffice for Hardy. See patch tags for links to further information.
    (LP: #198808)
  * Modify Maintainer value to match the DebianMaintainerField
    specification.

 -- Martin Pitt <email address hidden> Fri, 21 Mar 2008 13:09:45 +0100

Changed in pm-utils:
status: In Progress → Fix Released
vlowther (victor-lowther) wrote :

(re comment # 19)

If it's crazy debugging features you want, and you are comfortable running bleeding-edge code, I maintain a .deb of the pm-utils development series @ http://fnordovax.org/~victor/PmUtils/

But yes, in an ideal world HAL would handle finding the right quirks and inform pm-utils to use just the ones it determines are needed.

Bernhard Gehl [2008-03-21 12:12 -0000]:
> Why not put something in like "25-kernel-quirk-pm-checkdefaults.fdi" which
> 1) matches for the pm-quirks and exits if any (including the ".none") quirks are set
> 2) matches for the ati/nvidia/intel drivers and exits if one of these is used

That's the entire problem. Of course we would like to do that, but
FDIs don't allow that with the current hal version. So these hacks
have to do for Hardy.

> Oh - and picking up an idea I got from awen on Bug #202814: I think it
> would help diagnosing the suspend/resume process a lot, if the pm-
> suspend-script contained a line like "echo $* >> ~/pm-suspend.log" to
> log the quirks it was called with.

When I debug them, I usually add something like this to the script in
question:

  exec 2>/tmp/99video.trace
  set -x

Martin Pitt (pitti) wrote :

Fixed in hal bzr head, will upload soon.

Changed in hal:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package hal - 0.5.11~rc2-1ubuntu1

---------------
hal (0.5.11~rc2-1ubuntu1) hardy; urgency=low

  * Update our git snapshot from March 01 to current 0.5.11-RC2, which brings
    a few bug fixes.
    - Adds properties for tablet PCs (LP: #90451)
    - Fixes operation on MacBookPro third generation. (LP: #129869)
  * Remove patches applied upstream:
    - 02_allow_ufs_ufstype.patch
    - 05_fix_dell_brightness.patch
  * Adapt patches to new upstream version:
    - 96_uinput_device_support.patch
  * Merge with Debian unstable; see 0.5.10+git20080301-1ubuntu1 for remaining
    Ubuntu changes.
  * Replace 88_change_pm_quirk_policy.patch with
    01_default_suspend_quirks.patch: Only set the default suspend quirks for
    kernel problem workarounds if hal-info does not define any quirks at all
    for the hardware. (LP: #198808)

hal (0.5.11~rc2-1) unstable; urgency=low

  * New upstream release candidate
  * debian/libhal-storage1.{symbols,shlibs}, debian/libhal1.{symbols,shlibs}:
    - Updated symbols and shlibs

 -- Martin Pitt <email address hidden> Fri, 21 Mar 2008 13:39:03 +0100

Changed in hal:
status: Fix Committed → Fix Released
vlowther (victor-lowther) wrote :

The fix applied in pm-utils is incomplete. You should also ignore quirks while suspending the system if you are going to ignore them when resuming.
The following code block also needs to be applied to the 20video file at the beginning of the suspend_video function:

    if [ -d /sys/module/nvidia ] || [ -d /sys/module/fglrx ] || \
       [ -d /sys/module/i915 ]; then
        return
    fi

vlowther (victor-lowther) wrote :

The fix as published is incomplete. If you are ignoring quirks while resuming, you should ignore them while suspending.

Changed in pm-utils:
status: Fix Released → In Progress
Bernhard Gehl (bernhard-gehl) wrote :

For some reason the fix to hal seems to have broken my wireless lan connection (Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02) on iwl3945). I could restore it to working condition by reinstalling a (cached) version of 0.5.10 (hal and libhal)
a) does this make sense?
b) do I have to open a new bug?

vlowther (victor-lowther) wrote :

(re comment #27)
That sounds like it should be a new bug.

Johne (simsonloverforever) wrote :

Hey Bernhard,

I've opened up a bug that I am relatively sure is related

https://bugs.launchpad.net/ubuntu/+bug/200064

Bernhard Gehl (bernhard-gehl) wrote :

Hi Johne,
I am not really sure since your problem seems to have come up much earlier, while my wireless has been working since feisty and just broke on the hal update - and could be unbroken by reinstalling an earlier version of hal...

Bernhard Gehl (bernhard-gehl) wrote :

... never mind, the second (0.5.11...ubuntu2) update fixed this somehow.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package pm-utils - 0.99.2-3ubuntu3

---------------
pm-utils (0.99.2-3ubuntu3) hardy; urgency=low

  * Drop 96-video-quirk-ignoring.patch again. It is incomplete (since we need
    to do the same on the suspend side) and does not really fit here (if
    pm-utils gets quirks passed on the command line, they should actually be
    used). We'll solve this in the hal suspend script instead. (LP: #198808)

 -- Martin Pitt <email address hidden> Mon, 24 Mar 2008 16:32:06 +0100

Changed in pm-utils:
status: In Progress → Fix Released
vlowther (victor-lowther) wrote :

The patch applied to the HAL suspend script should also be applied to the HAL hibernate and HAL suspend-hybrid scripts.

Martin Pitt (pitti) wrote :

Hi vlowther,

vlowther [2008-03-24 18:35 -0000]:
> The patch applied to the HAL suspend script should also be applied to
> the HAL hibernate and HAL suspend-hybrid scripts.

OK for suspend-hybrid, but for hibernate as well? are you sure?

vlowther (victor-lowther) wrote :

Yes. The default in pm-utils is to not touch the video card across a hibernate/resume cycle, but you can tell pm-utils to use the quirks passed from HAL. We may as well not break expected behaviour for those who do need to use quirks when hibernating.

Martin Pitt (pitti) wrote :

Hi,

vlowther [2008-03-25 23:15 -0000]:
> Yes. The default in pm-utils is to not touch the video card across a
> hibernate/resume cycle, but you can tell pm-utils to use the quirks
> passed from HAL. We may as well not break expected behaviour for those
> who do need to use quirks when hibernating.

I applied the same patch to suspend-hybrid and hibernate in bzr head.
I'll upload it in the next days, when some other fixes piled up.

Thanks, Martin

Marek Aaron Sapota (maarons) wrote :

Is it fixed now? There are two fixes that had been released, but my laptop still doesn't wake up.
