Oops after resume from hibernate on restore_image

Bug #752870 reported by Herton R. Krzesinski on 2011-04-06
This bug affects 3 people
Affects          Importance   Assigned to
System76         Undecided    Unassigned
linux (Ubuntu)   Undecided    Herton R. Krzesinski

Bug Description

On a test machine I have here, resuming from hibernation triggers a general protection fault with the latest 2.6.38 kernel on natty (version in the attached report). It is always reproducible. A previously installed 2.6.35 kernel does not reproduce it, so this looks like a regression in recent kernel versions.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-8-generic 2.6.38-8.41
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.38-8.41-generic 2.6.38.2
Uname: Linux 2.6.38-8-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: test 1340 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xd12c0000 irq 43'
   Mixer name : 'Realtek ALC662 rev1'
   Components : 'HDA:10ec0662,1458a002,00100101'
   Controls : 34
   Simple ctrls : 18
Date: Wed Apr 6 16:42:08 2011
HibernationDevice: RESUME=UUID=a7bde095-5ae3-47be-b437-69b7d38efb3a
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release amd64 (20101007)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.

 eth1 no wireless extensions.
Lsusb:
 Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Gigabyte Technology Co., Ltd. 945GCM-S2L
ProcEnviron:
 LANGUAGE=pt_BR:pt:en
 LANG=pt_BR.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-8-generic root=UUID=c2c59005-987c-4b1b-b230-224879f89697 ro no_console_suspend vga=ask
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-8-generic N/A
 linux-backports-modules-2.6.38-8-generic N/A
 linux-firmware 1.49
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to natty on 2011-02-08 (57 days ago)
dmi.bios.date: 12/27/2007
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F5
dmi.board.name: 945GCM-S2L
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF5:bd12/27/2007:svnGigabyteTechnologyCo.,Ltd.:pn945GCM-S2L:pvr:rvnGigabyteTechnologyCo.,Ltd.:rn945GCM-S2L:rvr:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: 945GCM-S2L
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Herton R. Krzesinski (herton) wrote :

I'll check the oops and do a bisect; I filed this bug for tracking purposes.

Changed in linux (Ubuntu):
assignee: nobody → Herton R. Krzesinski (herton)
status: New → In Progress
Herton R. Krzesinski (herton) wrote :

Testing mainline builds, I found that the regression was introduced between 2.6.38.1 and 2.6.38.2.

A git bisect points to this commit as the one introducing the regression:
ff518ea26654e05d325d996f6e3a7f5f569cc2d5 is the first bad commit
commit ff518ea26654e05d325d996f6e3a7f5f569cc2d5
Author: Yinghai Lu <email address hidden>
Date: Fri Feb 18 11:30:30 2011 +0000

    x86: Cleanup highmap after brk is concluded

    commit e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e upstream.

    Now cleanup_highmap actually is in two steps: one is early in head64.c
    and only clears above _end; a second one is in init_memory_mapping() and
    tries to clean from _brk_end to _end.
    It should check if those boundaries are PMD_SIZE aligned but currently
    does not.
    Also init_memory_mapping() is called several times for numa or memory
    hotplug, so we really should not handle initial kernel mappings there.

    This patch moves cleanup_highmap() down after _brk_end is settled so
    we can do everything in one step.
    Also we honor max_pfn_mapped in the implementation of cleanup_highmap.

    Signed-off-by: Yinghai Lu <email address hidden>
    Signed-off-by: Stefano Stabellini <email address hidden>
    LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
    Signed-off-by: H. Peter Anvin <email address hidden>
    Signed-off-by: Greg Kroah-Hartman <email address hidden>

:040000 040000 b5ed0c2971ba1162c7cd289dd351d1700eb52fbc 8f830fdb43fa30ddebb485e6f6455d669300874b M arch
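
The bisect above is, conceptually, a binary search over the history between a known-good and a known-bad revision. Below is a minimal C sketch of that idea, assuming a linear history indexed by integers. The names `first_bad_revision`, `is_bad_fn` and `bad_from_37` are hypothetical and only for illustration; they are not part of git, which also has to handle merges and skipped revisions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test hook: returns true if the revision at `index` shows the bug. */
typedef bool (*is_bad_fn)(size_t index);

/*
 * Find the first bad revision in a linear history, given that `good` is
 * known to be good and `bad` is known to be bad (good < bad).
 * Each step halves the range, so about log2(bad - good) builds are tested.
 */
size_t first_bad_revision(size_t good, size_t bad, is_bad_fn is_bad)
{
    assert(good < bad);
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid))
            bad = mid;   /* the regression is at mid or earlier */
        else
            good = mid;  /* the regression is after mid */
    }
    return bad;
}

/* Example predicate (assumption): pretend the regression entered at revision 37. */
bool bad_from_37(size_t index)
{
    return index >= 37;
}
```

Calling `first_bad_revision(0, 100, bad_from_37)` narrows the range down to revision 37, in the same way the bisect narrowed 2.6.38.1..2.6.38.2 down to a single commit.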

Herton R. Krzesinski (herton) wrote :

Looking at the code, it seems that this commit removed the initialization of mmu_cr4_features, and the crash probably happens when resume loads an invalid mmu_cr4_features value into %cr4.

And indeed that's the case; today I saw this commit land in Linus's tree:
commit 4da9484bdece39ab0b098fa711e095e3e9fc8684
Author: H. Peter Anvin <email address hidden>
Date: Wed Apr 6 13:10:02 2011 -0700

    x86, hibernate: Initialize mmu_cr4_features during boot

    Restore the initialization of mmu_cr4_features during boot, which was
    removed without comment in checkin e5f15b45ddf3afa2bbbb10c7ea34fb32b6de0a0e

    x86: Cleanup highmap after brk is concluded

    thereby breaking resume from hibernate. This restores previous
    functionality in approximately the same place, and corrects the
    reading of %cr4 on pre-CPUID hardware (%cr4 exists if and only if
    CPUID is supported.)

    However, part of the problem is that the hibernate suspend/resume
    sequence should manage the save/restore of %cr4 explicitly.

    Signed-off-by: H. Peter Anvin <email address hidden>
    Cc: Rafael J. Wysocki <email address hidden>
    Cc: Stefano Stabellini <email address hidden>
    Cc: Yinghai Lu <email address hidden>
    LKML-Reference: <email address hidden>

and it fixes the bug for me too in testing here.
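
The failure mode can be modelled in user space. What follows is a sketch under stated assumptions, not the kernel code, whose resume path is written in assembly. It assumes that the uninitialized mmu_cr4_features is zero, and it relies on the architectural rule that clearing CR4.PAE while in 64-bit mode raises a general protection fault. `struct sim_cpu` and all `sim_*` functions are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PAE (UINT64_C(1) << 5)  /* Physical Address Extension */
#define X86_CR4_PGE (UINT64_C(1) << 7)  /* Page Global Enable */

/* Simulated CPU state (hypothetical, not a kernel structure). */
struct sim_cpu {
    uint64_t cr4;        /* simulated %cr4 */
    bool     long_mode;  /* true when running in 64-bit mode */
    bool     faulted;    /* set when a write would raise #GP */
};

/* Model of a %cr4 write: in long mode, a value without PAE faults. */
void sim_write_cr4(struct sim_cpu *cpu, uint64_t val)
{
    if (cpu->long_mode && !(val & X86_CR4_PAE)) {
        cpu->faulted = true;
        return;
    }
    cpu->cr4 = val;
}

/*
 * Model of the TLB flush done on resume: write the saved features with
 * PGE cleared, then write them again with PGE restored.
 */
void sim_restore_flush_tlb(struct sim_cpu *cpu, uint64_t mmu_cr4_features)
{
    sim_write_cr4(cpu, mmu_cr4_features & ~X86_CR4_PGE);
    sim_write_cr4(cpu, mmu_cr4_features);
}

/*
 * Model of the boot-time fix: snapshot %cr4 into mmu_cr4_features,
 * but only when %cr4 exists (that is, when CPUID is supported).
 */
uint64_t sim_init_mmu_cr4_features(const struct sim_cpu *cpu, bool has_cpuid)
{
    return has_cpuid ? cpu->cr4 : 0;
}
```

With a zero mmu_cr4_features the first simulated write already faults, which matches the general protection fault seen on resume. With the value taken from `sim_init_mmu_cr4_features` the restore completes and %cr4 ends up unchanged.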

Carl Richell (carlrichell) wrote :

Herton,

On the kernel mailing list you asked how widespread this bug is. The bug affects 5 out of 5 System76 desktops and laptops tested thus far with 64-bit Natty.

-- Carl

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-8.42

---------------
linux (2.6.38-8.42) natty; urgency=low

  [ David Henningsson ]

  * SAUCE: (drop after 2.6.38) ALSA: HDA: Fix dock mic for Lenovo
    X220-tablet
    - LP: #751033

  [ Gustavo F. Padovan ]

  * SAUCE: Revert "Bluetooth: Add new PID for Atheros 3011"
    - LP: #720949

  [ Herton Ronaldo Krzesinski ]

  * SAUCE: (drop after 2.6.39) v4l: make sure drivers supply a zeroed
    struct v4l2_subdev
    - LP: #745213

  [ John Johansen ]

  * AppArmor: Fix masking of capabilities in complain mode
    - LP: #748656

  [ Leann Ogasawara ]

  * [Config] Disable CONFIG_RTS_PSTOR for armel, powerpc

  [ Manoj Iyer ]

  * SAUCE: (drop after 2.6.38) add support for Lenovo tablet ID (0xE6)
    - LP: #746652

  [ Steve Langasek ]

  * [Config] Make linux-libc-dev coinstallable under multiarch
    - LP: #750585

  [ Tim Gardner ]

  * [Config] CONFIG_RTS_PSTOR=m
    - LP: #698006

  [ Upstream Kernel Changes ]

  * Revert "tcp: disallow bind() to reuse addr/port"
    - LP: #731878
  * ALSA: HDA: Add dock mic quirk for Lenovo Thinkpad X220
    - LP: #746259
  * ALSA: HDA: New AD1984A model for Dell Precision R5500
    - LP: #741516
  * Input: sparse-keymap - report scancodes with key events
  * Input: sparse-keymap - report KEY_UNKNOWN for unknown scan codes
  * KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n
    - LP: #729085
  * watchdog: sp5100_tco.c: Check if firmware has set correct value in
    tcobase.
    - LP: #740011
  * staging: add rts_pstor for Realtek PCIE cardreader
    - LP: #698006
  * staging: fix rts_pstor build errors
    - LP: #698006
  * Staging: rts_pstor: fixed some brace code styling issues
    - LP: #698006
  * staging: rts_pstor: potential NULL dereference
    - LP: #698006
  * Staging: rts_pstor: fix read past end of buffer
    - LP: #698006
  * staging: rts_pstor: delete a function
    - LP: #698006
  * staging: rts_pstor: fix sparse warning
    - LP: #698006
  * staging: rts_pstor: fix a bug that a greenhouse sd card can't be
    recognized
    - LP: #698006
  * staging: rts_pstor: optimize kmalloc to kzalloc
    - LP: #698006
  * staging: rts_pstor: MSXC card power class
    - LP: #698006
  * staging: rts_pstor: modify initial card clock
    - LP: #698006
  * staging: rts_pstor: set lun_mode in a different place
    - LP: #698006
  * x86, hibernate: Initialize mmu_cr4_features during boot
    - LP: #752870
 -- Leann Ogasawara <email address hidden> Fri, 08 Apr 2011 09:24:59 -0700

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in system76:
status: New → Fix Released
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu laptop testing tracker.

A list of all reports related to this bug can be found here:
http://laptop.qa.ubuntu.com/qatracker/reports/bugs/752870

tags: added: laptop-testing