kernel 2.6.22-11 fails to resume drive correctly

Bug #139079 reported by Jos Dehaes on 2007-09-12
Affects: linux-source-2.6.22 (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-2.6.22-11-generic

When resuming from suspend with this kernel, the screen stays black. If I press Ctrl-Alt-F7 I do get to X, but some icons are not loaded, the terminals show kernel errors (journal commit I/O error), and dmesg shows even more filesystem corruption messages. The system cannot shut down from within X, and many filesystem operations fail. I can only reboot with Alt-SysRq-SUB. Subsequent boots (with a live CD) and a forced fsck reveal no real filesystem damage. After booting the same kernel again, none of these dmesg messages have been written to disk, so the logs contain no information at all. When I boot the older -10 kernel, all is fine (as it has been on this laptop through edgy, feisty, and most of gutsy, until the -11 kernel).

Hardware info: 7685d63eb38f310cb1c6241a207084c1

Jos Dehaes (jos-dehaes) wrote :

happens every time I suspend/resume with that kernel. I would say this is critical!

Jos Dehaes (jos-dehaes) wrote :

not fixed with 2.6.22-11.33

Jos Dehaes (jos-dehaes) wrote :

not fixed with -12.

As this is fairly standard hardware (Dell Latitude D820), a lot of people will have this problem. I'm still using the -10 kernel.

I used git-bisect to come to the following conclusion:

jos@duet ubuntu-gutsy $ git-bisect good
1af621692e22dbd994dcd9162dbeda05bd5080a3 is first bad commit
commit 1af621692e22dbd994dcd9162dbeda05bd5080a3
Author: Matthew Garrett <email address hidden>
Date: Tue Aug 28 20:27:49 2007 +0100

    Re-add _GTM and _STM support

    Add _GTM and _STM support to libata-acpi and the suspend/resume
    pathway. Minor mismerge in the previous diff - please use this one
    instead.

    Signed-off-by: Matthew Garrett <email address hidden>

:040000 040000 d5cbedbe846bf03ff02afaedefc82c6c3a8992fb 75986a3db3fc4ff067fd19bd7bf9f50c805a8665 M drivers
:040000 040000 dd28c971433eecb534ec606bd5a131b074eeae2a e3e7de246ed1ed43df021145105b2d358a1f7226 M include
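
For readers unfamiliar with bisection, the search that git-bisect automates can be sketched roughly as follows. This is only an illustration of the idea, not git's actual implementation: the function name `first_bad`, the placeholder commit names, and the `is_bad` callback (which stands in for "build this kernel, suspend, resume, and see whether the drive comes back") are all hypothetical. It assumes a linear history where every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that the history never goes from bad back to good.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history: placeholder names, not real commit ids.
history = [f"c{i}" for i in range(16)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit)
```

Each test halves the remaining range, so even a few hundred commits between two kernel uploads need only a handful of build-and-resume cycles.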

Henrik Nilsen Omma (henrik) wrote :

Hi, thanks for reporting this. Could you please follow the instructions on https://wiki.ubuntu.com/KernelTeamBugPolicies and https://wiki.ubuntu.com/DebuggingKernelSuspend and attach the output? Thanks!

Changed in linux-source-2.6.22:
importance: Undecided → Medium
status: New → Incomplete
Jos Dehaes (jos-dehaes) wrote :

The dmesg after reboot is not helpful here I think. What happens at resume is that the drive does not resume correctly, the filesystem is mounted read-only, and no logs of the event can be written on disk. This makes it hard to give diagnostics, although dmesg at resume time gives plenty of errors.

If I can get it to work, I will try to debug with a serial console, so I can get the relevant output.
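
One quick way to confirm the read-only remount described above, while the machine is still up, is to look at the mount options the kernel reports. The following is a minimal sketch under assumptions: the helper name `readonly_mounts` and the sample text are made up for illustration, and the input is expected to be in the standard `/proc/mounts` layout (device, mount point, filesystem type, options, two numeric fields).

```python
from typing import List


def readonly_mounts(mounts_text: str) -> List[str]:
    """Return the mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


# Hypothetical sample in /proc/mounts format, not taken from this report.
sample = """\
/dev/sda1 / ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""
print(readonly_mounts(sample))
```

On a live system the text would come from reading `/proc/mounts`. That file is provided by the kernel rather than stored on the failed disk, so it should remain readable even when the root filesystem has gone read-only.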

Jos Dehaes (jos-dehaes) wrote :

I tried ssh from another machine, but the network is down after resume and I couldn't get it back online because of I/O errors. I did copy some parts of the dmesg output by hand, though; maybe this is helpful.

Jos Dehaes (jos-dehaes) wrote :

This patch reverts the broken commit mentioned earlier. It was formatted with git-format-patch against origin (current git), and should apply to latest linux-source-2.6.22.

On my system this patch works: I can suspend and resume without problems.

Matthew Garrett (mjg59) wrote :

The ACPI code is required to support suspend/resume on basically every HP laptop produced in the past three years, so we can't revert it.

Matthew Garrett (mjg59) wrote :

Could you try this patch?

Jos Dehaes (jos-dehaes) wrote :

patch seems to work, although there is still the message:
[ 237.932000] ata1.00: _GTF unexpected object type 0x1

sometimes. Not every time. Also the first boot, I could suspend and resume ok, but the next suspend gave a kernel panic. The next reboot, I was able to suspend/resume more than 5 times in a row, no failure since.

Attached is a kern.log that contains all this info (as this time the drive was resumed ok ;-)).

I would say, go ahead and commit, as this is clearly better than what is available now.
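
To see how often the `_GTF` message shows up across several suspend/resume cycles, a log such as the attached kern.log could be scanned with a short script. This is a sketch only: the function name `find_gtf_warnings` is hypothetical, the pattern is modelled on the single message quoted above, and the second sample line is a made-up non-matching entry.

```python
import re
from typing import List, Tuple

# Pattern modelled on the one message quoted in this comment.
GTF_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<dev>ata\d+\.\d+): "
    r"_GTF unexpected object type (?P<type>0x[0-9a-fA-F]+)"
)


def find_gtf_warnings(log_text: str) -> List[Tuple[float, str, int]]:
    """Return (timestamp, device, object type) for each _GTF warning."""
    hits = []
    for line in log_text.splitlines():
        m = GTF_RE.search(line)
        if m:
            hits.append((float(m["ts"]), m["dev"], int(m["type"], 16)))
    return hits


sample = (
    "[ 237.932000] ata1.00: _GTF unexpected object type 0x1\n"
    "[ 238.000000] hypothetical unrelated line\n"  # made-up filler
)
print(find_gtf_warnings(sample))
```

Counting the hits per boot would show whether the warning really appears only on some resumes, as observed here.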

Tim Gardner (timg-tpi) wrote :

Gutsy commit 8db52ac7e3d1bf086356e8e9d62841bce280da11

Changed in linux-source-2.6.22:
status: Incomplete → Fix Committed
Jos Dehaes (jos-dehaes) wrote :

I just tried a few suspend/resume cycles with latest git, it still works. Thanks!

Kyle McMartin (kyle) wrote :

linux-source-2.6.22 (2.6.22-13.40) gutsy; urgency=low

  [Amit Kucheria]

  * Enable CONFIG_VM86 for LPIA
    - LP: #146311
  * Update configuration files
  * Disable MSI by default
  * Add mmconf documentation
  * Update configuration files

  [Bartlomiej Zolnierkiewicz]

  * ide-disk: workaround for buggy HPA support on ST340823A (take 3)
    - LP: #26119

  [Ben Collins]

  * ubuntu/cell: Fixup ps3 related modules for d-i, enable RTAS console
  * ubuntu/cell: Enable CELLEB and related modules (pata_scc)
  * ubuntu/cell: Move ps3rom to storage-core. Also use spidernet, not
    spider_net.
  * ubuntu/cell: Set PS3_MANAGER=y
  * ubuntu: Set NR_CPUS=256 for sparc64-smp

  [Chuck Short]

  * [USB] Support for MediaTek MT6227 in cdc-acm.
    - LP: #134123
  * [XEN] Fix xen vif create with more than 14 guests.
    - LP: #14486

  [Jorge Juan Chico]

  * ide: ST320413A has the same problem as ST340823A
    - LP: #26119

  [Kyle McMartin]

  * fix -rt build
  * fix ia32entry-xen.S for CVE-2007-4573
  * fix build when CONFIG_PCI_MSI is not set

  [Matthew Garrett]

  * hostap: send events on data interface as well as master interface
    - LP: #57146
  * A malformed _GTF object should not prevent ATA device recovery
    - LP: #139079
  * Don't lose appletouch button release events
  * Fix build with appletouch change
  * Disable Thinkpad backlight support on machines with ACPI video
    - LP: #148055
  * Don't attempt to register a callback if there is no CMOS object
    - LP: #145857
  * Update ACPI bay hotswap code to support locking
    - LP: #148219

  [Steffen Klassert]

  * 3c59x: fix duplex configuration
    - LP: #94186

  [Thomas Gleixner]

  * clockevents: remove the suspend/resume workaround^Wthinko

  [Tim Gardner]

  * orinoco_cs.ko missing
    - LP: #125832
  * Marvell Technology ethernet card not recognized and not operational
    - LP: #135316
  * acpi_scan_rsdp() breaks some PCs by not honouring ACPI specification
    - LP: #144336
  * VIA southbridge Intel id missing
    - LP: #128289
  * Add T-Sinus 111card to hostap_cs driver to be able to upload firmware
    - LP: #132466
  * RTL8111 PCI Express Gigabit driver r8169 big files produce slow file
    transfer
    - LP: #114171
  * Guest OS does not recognize a lun with non zero target id on Vmware ESX
    Server
    - LP: #140761
  * Modularize vesafb
    - LP: #139505
  * Nikon cameras need support in unusual_devs.h
    - LP: #134477
  * agp for i830m broken in gutsy
    - LP: #139767
  * hdaps: Added support for Thinkpad T61
    - LP: #147383
  * xen: Update config for i386
    - LP: #139047
  * xen: resync for amd64
    - LP: #139047
  * ide-disk: workaround for buggy HPA support ...

Changed in linux-source-2.6.22:
status: Fix Committed → Fix Released
Jos Dehaes (jos-dehaes) wrote :

indeed fixed in current kernel. Thanks.

nyc_863 (justin+ubuntu) wrote :

This isn't fixed for me in 2.6.22-14-generic

Not always, but sometimes, upon resume, I get the errors reported by the duplicates
of this bug:

ata1.01 revalidation failed errno -5
srst failed errno -16
(etc)

Eventually the file system remounts read-only and is useless; only a hard power-off helps.

nyc_863 (justin+ubuntu) wrote :

Ok ignore my last post I'm new to this

unless I'm reading things wrongly, 7.10 release of yesterday still has
the problem but it is fixed *if* you want to download a newer patched
kernel and compile and install it?
