Oops after resume from hibernate on restore_image

Bug #752870 reported by Herton R. Krzesinski
This bug affects 3 people
Affects          Status          Importance    Assigned to             Milestone
System76         Fix Released    Undecided     Unassigned
linux (Ubuntu)   Fix Released    Undecided     Herton R. Krzesinski

Bug Description

On a test machine I have here, resuming from hibernation triggers a general protection fault with the latest 2.6.38 kernel on natty (version in the attached report). It is always reproducible. A previously installed 2.6.35 kernel does not reproduce it, so this appears to be a regression in recent kernel versions.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-8-generic 2.6.38-8.41
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.38-8.41-generic 2.6.38.2
Uname: Linux 2.6.38-8-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: test 1340 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xd12c0000 irq 43'
   Mixer name : 'Realtek ALC662 rev1'
   Components : 'HDA:10ec0662,1458a002,00100101'
   Controls : 34
   Simple ctrls : 18
Date: Wed Apr 6 16:42:08 2011
HibernationDevice: RESUME=UUID=a7bde095-5ae3-47be-b437-69b7d38efb3a
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.

 eth1 no wireless extensions.
Lsusb:
 Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Gigabyte Technology Co., Ltd. 945GCM-S2L
ProcEnviron:
 LANGUAGE=pt_BR:pt:en
 LANG=pt_BR.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-8-generic root=UUID=c2c59005-987c-4b1b-b230-224879f89697 ro no_console_suspend vga=ask
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-8-generic N/A
 linux-backports-modules-2.6.38-8-generic N/A
 linux-firmware 1.49
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to natty on 2011-02-08 (57 days ago)
dmi.bios.date: 12/27/2007
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F5
dmi.board.name: 945GCM-S2L
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF5:bd12/27/2007:svnGigabyteTechnologyCo.,Ltd.:pn945GCM-S2L:pvr:rvnGigabyteTechnologyCo.,Ltd.:rn945GCM-S2L:rvr:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: 945GCM-S2L
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Revision history for this message
Herton R. Krzesinski (herton) wrote :

I'll check the oops and do a bisect; I filed this bug for tracking purposes.

Changed in linux (Ubuntu):
assignee: nobody → Herton R. Krzesinski (herton)
status: New → In Progress
Revision history for this message
Herton R. Krzesinski (herton) wrote :

Testing mainline builds, I found that the regression was introduced between 2.6.38.1 and 2.6.38.2.

A git bisect pointed to this commit as introducing the regression:
ff518ea26654e05d325d996f6e3a7f5f569cc2d5 is the first bad commit
commit ff518ea26654e05d325d996f6e3a7f5f569cc2d5
Author: Yinghai Lu <email address hidden>
Date: Fri Feb 18 11:30:30 2011 +0000

    x86: Cleanup highmap after brk is concluded

    commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e upstream.

    Now cleanup_highmap actually is in two steps: one is early in head64.c
    and only clears above _end; a second one is in init_memory_mapping() and
    tries to clean from _brk_end to _end.
    It should check if those boundaries are PMD_SIZE aligned but currently
    does not.
    Also init_memory_mapping() is called several times for numa or memory
    hotplug, so we really should not handle initial kernel mappings there.

    This patch moves cleanup_highmap() down after _brk_end is settled so
    we can do everything in one step.
    Also we honor max_pfn_mapped in the implementation of cleanup_highmap.

    Signed-off-by: Yinghai Lu <email address hidden>
    Signed-off-by: Stefano Stabellini <email address hidden>
    LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
    Signed-off-by: H. Peter Anvin <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>

:040000 040000 b5ed0c2971ba1162c7cd289dd351d1700eb52fbc 8f830fdb43fa30ddebb485e6f6455d669300874b M arch
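
For readers unfamiliar with bisection, the search between a known-good and a known-bad kernel can be sketched as a binary search. This is only an illustration of the idea, not of git's internals: the function name first_bad, the placeholder commit names c1..c5, and the is_bad callback (standing in for "build, boot, hibernate, resume, check for the oops") are all hypothetical.

```python
def first_bad(versions, is_bad):
    """Return the first entry in `versions` for which is_bad() is true.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    every entry after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return versions[bad]


# Hypothetical linear history between the two stable releases;
# c1..c5 are placeholder names, not real commit IDs.
history = ["v2.6.38.1", "c1", "c2", "c3", "c4", "c5", "v2.6.38.2"]

# Pretend that c3 introduced the regression.
culprit = first_bad(history, lambda commit: history.index(commit) >= 3)
print(culprit)
```

Each step halves the remaining range, so even a few hundred commits between two stable releases need only a handful of test boots.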

Revision history for this message
Herton R. Krzesinski (herton) wrote :

Looking at the code, it seems that this commit removed the setting/restore of mmu_cr4_features, and the crash probably happens when an invalid mmu_cr4_features value is loaded.

And indeed that's the case: today I saw this commit land in Linus' tree:
commit 4da9484bdece39ab0b098fa711e095e3e9fc8684
Author: H. Peter Anvin <email address hidden>
Date: Wed Apr 6 13:10:02 2011 -0700

    x86, hibernate: Initialize mmu_cr4_features during boot

    Restore the initialization of mmu_cr4_features during boot, which was
    removed without comment in checkin e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e

    x86: Cleanup highmap after brk is concluded

    thereby breaking resume from hibernate. This restores previous
    functionality in approximately the same place, and corrects the
    reading of %cr4 on pre-CPUID hardware (%cr4 exists if and only if
    CPUID is supported.)

    However, part of the problem is that the hibernate suspend/resume
    sequence should manage the save/restore of %cr4 explicitly.

    Signed-off-by: H. Peter Anvin <email address hidden>
    Cc: Rafael J. Wysocki <email address hidden>
    Cc: Stefano Stabellini <email address hidden>
    Cc: Yinghai Lu <email address hidden>
    LKML-Reference: <email address hidden>

and it fixes the bug for me too in my testing here.
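
A rough model of why an uninitialized mmu_cr4_features could produce a general protection fault, written as a plain Python simulation. It rests on assumptions not stated in this report: that the resume path writes the saved value to %cr4 twice (once with PGE cleared to flush the TLB, once to turn it back on), that the uninitialized value is zero, and that in 64-bit mode the CPU rejects a %cr4 write that clears PAE. The helper names write_cr4 and restore_image_cr4 are hypothetical; only the two bit positions are the standard x86 definitions.

```python
X86_CR4_PAE = 1 << 5  # Physical Address Extension
X86_CR4_PGE = 1 << 7  # Page Global Enable


class GeneralProtectionFault(Exception):
    """Stands in for the #GP the CPU would raise."""


def write_cr4(value, long_mode=True):
    # Assumption: clearing PAE while in 64-bit mode is rejected with #GP.
    if long_mode and not value & X86_CR4_PAE:
        raise GeneralProtectionFault("CR4.PAE cleared in long mode")
    return value


def restore_image_cr4(mmu_cr4_features):
    # Assumed resume sequence: drop PGE to flush global TLB entries,
    # then write the saved value back to turn PGE on again.
    write_cr4(mmu_cr4_features & ~X86_CR4_PGE)
    return write_cr4(mmu_cr4_features)


# Saved value initialized during boot: both writes succeed.
restore_image_cr4(X86_CR4_PAE | X86_CR4_PGE)

# Saved value never initialized (assumed 0): the first write faults.
try:
    restore_image_cr4(0)
except GeneralProtectionFault as exc:
    print("fault:", exc)
```

Under these assumptions, restoring the boot-time initialization makes the saved value valid again, which would match the fix working here.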

Revision history for this message
Carl Richell (carlrichell) wrote :

Herton,

On the kernel mailing list you asked how widespread this bug is. The bug affects 5 out of 5 System76 desktops and laptops tested thus far with 64-bit Natty.

-- Carl

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-8.42

---------------
linux (2.6.38-8.42) natty; urgency=low

  [ David Henningsson ]

  * SAUCE: (drop after 2.6.38) ALSA: HDA: Fix dock mic for Lenovo
    X220-tablet
    - LP: #751033

  [ Gustavo F. Padovan ]

  * SAUCE: Revert "Bluetooth: Add new PID for Atheros 3011"
    - LP: #720949

  [ Herton Ronaldo Krzesinski ]

  * SAUCE: (drop after 2.6.39) v4l: make sure drivers supply a zeroed
    struct v4l2_subdev
    - LP: #745213

  [ John Johansen ]

  * AppArmor: Fix masking of capabilities in complain mode
    - LP: #748656

  [ Leann Ogasawara ]

  * [Config] Disable CONFIG_RTS_PSTOR for armel, powerpc

  [ Manoj Iyer ]

  * SAUCE: (drop after 2.6.38) add support for Lenovo tablet ID (0xE6)
    - LP: #746652

  [ Steve Langasek ]

  * [Config] Make linux-libc-dev coinstallable under multiarch
    - LP: #750585

  [ Tim Gardner ]

  * [Config] CONFIG_RTS_PSTOR=m
    - LP: #698006

  [ Upstream Kernel Changes ]

  * Revert "tcp: disallow bind() to reuse addr/port"
    - LP: #731878
  * ALSA: HDA: Add dock mic quirk for Lenovo Thinkpad X220
    - LP: #746259
  * ALSA: HDA: New AD1984A model for Dell Precision R5500
    - LP: #741516
  * Input: sparse-keymap - report scancodes with key events
  * Input: sparse-keymap - report KEY_UNKNOWN for unknown scan codes
  * KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n
    - LP: #729085
  * watchdog: sp5100_tco.c: Check if firmware has set correct value in
    tcobase.
    - LP: #740011
  * staging: add rts_pstor for Realtek PCIE cardreader
    - LP: #698006
  * staging: fix rts_pstor build errors
    - LP: #698006
  * Staging: rts_pstor: fixed some brace code styling issues
    - LP: #698006
  * staging: rts_pstor: potential NULL dereference
    - LP: #698006
  * Staging: rts_pstor: fix read past end of buffer
    - LP: #698006
  * staging: rts_pstor: delete a function
    - LP: #698006
  * staging: rts_pstor: fix sparse warning
    - LP: #698006
  * staging: rts_pstor: fix a bug that a greenhouse sd card can't be
    recognized
    - LP: #698006
  * staging: rts_pstor: optimize kmalloc to kzalloc
    - LP: #698006
  * staging: rts_pstor: MSXC card power class
    - LP: #698006
  * staging: rts_pstor: modify initial card clock
    - LP: #698006
  * staging: rts_pstor: set lun_mode in a different place
    - LP: #698006
  * x86, hibernate: Initialize mmu_cr4_features during boot
    - LP: #752870
 -- Leann Ogasawara <email address hidden> Fri, 08 Apr 2011 09:24:59 -0700

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in system76:
status: New → Fix Released
Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu laptop testing tracker.

A list of all reports related to this bug can be found here:
http://laptop.qa.ubuntu.com/qatracker/reports/bugs/752870

tags: added: laptop-testing