Ivybridge system fails to resume from S3/S4 with recent BIOS

Bug #891270 reported by James M. Leddy on 2011-11-16
This bug affects 2 people
Affects          Status         Importance   Assigned to                     Milestone
Linux            Fix Released   Critical
linux (Ubuntu)                  High         Canonical Hardware Enablement
Oneiric                         High         Canonical Hardware Enablement
Precise                         High         Canonical Hardware Enablement

Bug Description

System environment:

chipset: ivybridge mobile
arch: x86_64
distro: ubuntu 11.10
kernels: 3.0.0-13.22 (3.0.6+backports), 3.1, 3.2-rc1
userspace: xserver: 1.10.4 and 1.11.2
            libdrm: 2.4.26 and git master (nov 10th)
            mesa: 7.11 and git master (nov 10th)
            xf86-video-intel: 2.16 and git master (nov 10th)

Since BIOS updates in mid-October, all Ivybridge systems we have come
across are exhibiting strange behavior. 3.0 kernels did not boot until
"drm/i915: enable ring freq scaling, RC6 and graphics turbo on Ivy Bridge v3"
was backported from 3.1; before that, i915 failed to load with this error:

 [drm:init_status_page], render ring hws offset: 0x00000000
 [drm:init_ring_common] *ERROR* render ring initialization failed ctl 00000000 head 00000000 tail 00000000 start 00000000
 [drm:i915_driver_load] *ERROR* failed to init modeset

Now they come up, but every kernel we tried oopses on every resume from S3
or S4. Attached are logs of the oops on various kernels. Downgrading the BIOS
avoids the problem on every machine, but this affects machines from multiple
ODMs, including a reference platform from Intel, after a BIOS update.

Created attachment 53531
dmesg of S3 failure on 3.0

Created attachment 53532
S3 on 3.2-rc1

Created attachment 53533
S4 on 3.0

Hmm, can you attach the intel_reg_dumper output after boot vs. after resume (with both BIOSes)? I presume that the BIOS is doing additional bring-up that we need to replicate upon resume.

On second thoughts (having read the OOPS), may I say wtf happened to our data structures upon resume?

Created attachment 53542
Good bios, reg dump after boot.

Created attachment 53543
Good bios, reg dump after resume.

Created attachment 53544
Bad bios, reg dump after boot.

Created attachment 53545
Bad bios, reg dump after (failed) resume.

(In reply to comment #8)
> Created attachment 53545 [details]
> Bad bios, reg dump after (failed) resume.

--- X13_After_Resume.txt 2011-11-14 13:28:39.837222007 -0500
+++ X18_After_Resume.txt 2011-11-14 13:28:57.465221705 -0500
@@ -28,10 +28,10 @@
                  PIPEA_LINK_N1: 0x00041eb0 (val 0x41eb0 270000)
                  PIPEA_LINK_M2: 0x00000000 (val 0x0 0)
                  PIPEA_LINK_N2: 0x00000000 (val 0x0 0)
- DSPACNTR: 0xd8004400 (enabled)
+ DSPACNTR: 0xd8004000 (enabled)
                       DSPABASE: 0x00000000
- DSPASTRIDE: 0x00001600 (88)
- DSPASURF: 0x046fb008
+ DSPASTRIDE: 0x00001580 (86)
+ DSPASURF: 0x00063000
                    DSPATILEOFF: 0x00000000 (0, 0)
                      PIPEBCONF: 0x00000000 (disabled, inactive, 8bpc)
                       HTOTAL_B: 0x00000000 (1 active, 1 total)
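
For anyone comparing the raw intel_reg_dumper attachments by hand, a minimal Python sketch along these lines can list the registers that differ. It assumes every dump line has the form "NAME: 0xVALUE (...)", as in the excerpt above; parse_dump and diff_dumps are illustrative names and not part of intel-gpu-tools. The sample values are the ones from the diff above.

```python
import re

# Matches lines such as "DSPACNTR: 0xd8004400 (enabled)" in a raw dump.
REG_LINE = re.compile(r"^\s*(\w+):\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Map register name -> integer value for each 'NAME: 0xVALUE' line."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs


def diff_dumps(a_text, b_text):
    """Return {name: (value_in_a, value_in_b)} for registers that differ."""
    a, b = parse_dump(a_text), parse_dump(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


def fmt(value):
    """Format a register value, or mark it as absent from one dump."""
    return "missing" if value is None else f"{value:#010x}"


X13 = """\
DSPACNTR: 0xd8004400 (enabled)
DSPABASE: 0x00000000
DSPASTRIDE: 0x00001600 (88)
DSPASURF: 0x046fb008
"""
X18 = """\
DSPACNTR: 0xd8004000 (enabled)
DSPABASE: 0x00000000
DSPASTRIDE: 0x00001580 (86)
DSPASURF: 0x00063000
"""

for name, (old, new) in diff_dumps(X13, X18).items():
    print(f"{name}: {fmt(old)} -> {fmt(new)}")
```

Keying on the register name rather than the line position keeps the comparison stable if the two dumps list registers in a different order.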

Xun, do you have problems with S3/S4 after upgrading the BIOS to the latest one (v67)?

Ok, not too much interesting to see there; a few minor differences in link configuration that would be good to understand but would appear not to be relevant to this issue. Back to hunting for an explanation for the apparent memory corruption.

(In reply to comment #10)
> Xun, do you have problem for S3/S4 after upgrading the BIOS to the latest one
> (v67)?

We can reproduce this problem for S3 on our Ivybridge systems (both desktop and mobile) after upgrading the BIOS to the latest one.
S4 seems fine. The problem doesn't happen in text mode.

James M. Leddy (jm-leddy) wrote :

Relevant dmesg:

kernel: [ 64.737045] BUG: unable to handle kernel paging request at f9152470
kernel: [ 64.737060] IP: [<c12ae9d0>] iowrite32+0x30/0x40
kernel: [ 64.737070] *pdpt = 00000000018e5001 *pde = 000000002f10f067 *pte = 0000000000000000
kernel: [ 64.737078] Oops: 0002 [#1] SMP
kernel: [ 64.737084] Modules linked in: bnep rfcomm bluetooth parport_pc ppdev snd_hda_codec_realtek binfmt_misc snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi arc4 snd_rawmidi snd_seq_midi_event iwlagn mac80211 snd_seq snd_timer snd_seq_device i915 cfg80211 snd soundcore snd_page_alloc drm_kms_helper drm joydev i2c_algo_bit mei(C) usbhid hid shpchp lp parport video firewire_ohci firewire_core ahci crc_itu_t libahci e1000e xhci_hcd
kernel: [ 64.737162]
kernel: [ 64.737167] Pid: 1000, comm: Xorg Tainted: G C 3.1.0-2-generic-pae #3-Ubuntu
kernel: [ 64.737182] EIP: 0060:[<c12ae9d0>] EFLAGS: 00013296 CPU: 0
kernel: [ 64.737189] EIP is at iowrite32+0x30/0x40
kernel: [ 64.737195] EAX: 13040001 EBX: f762a1a8 ECX: f9152470 EDX: f9152470
kernel: [ 64.737201] ESI: 0000000a EDI: 00000040 EBP: eea07d64 ESP: eea07d64
kernel: [ 64.737209] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
kernel: [ 64.737216] Process Xorg (pid: 1000, ti=eea06000 task=f6ff8000 task.ti=eea06000)
kernel: [ 64.737223] Stack:
kernel: [ 64.737226] eea07d70 f8bbb7a8 f762a1a8 eea07d80 f8bbe78a f762a1a8 0000000a eea07da8
kernel: [ 64.737241] f8b8ab86 ee5f94c0 eea07da0 0000000a 00000001 f762a1a8 f762a1a8 0000000a
kernel: [ 64.737255] 00000001 eea07db8 f8b8b605 00000002 00000040 eea07dd4 f8b8f5f7 f762a000
kernel: [ 64.737271] Call Trace:
kernel: [ 64.737315] [<f8bbb7a8>] blt_ring_flush.part.12+0x28/0x80 [i915]
kernel: [ 64.737337] [<f8bbe78a>] blt_ring_flush+0x7a/0xa0 [i915]
kernel: [ 64.737354] [<f8b8ab86>] i915_gem_flush_ring.part.24+0x26/0xa0 [i915]
kernel: [ 64.737367] [<f8b8b605>] i915_gem_flush_ring+0x25/0x30 [i915]
kernel: [ 64.737377] [<f8b8f5f7>] i915_gem_execbuffer_flush+0x87/0xa0 [i915]
kernel: [ 64.737387] [<f8b8fd4e>] i915_gem_execbuffer_move_to_gpu+0x11e/0x130 [i915]
kernel: [ 64.737398] [<f8b9030a>] i915_gem_do_execbuffer.isra.7+0x5aa/0x7c0 [i915]
kernel: [ 64.737405] [<c103a6af>] ? iomap_atomic_prot_pfn+0x4f/0x60
kernel: [ 64.737409] [<c103a4f4>] ? iounmap_atomic+0x64/0x90
kernel: [ 64.737423] [<f8bc4cb6>] ? i915_gem_gtt_pwrite_fast.isra.18+0xab/0xd9 [i915]
kernel: [ 64.737433] [<c112b685>] ? __kmalloc+0x195/0x1e0
kernel: [ 64.737449] [<f8b90a4d>] i915_gem_execbuffer2+0x5d/0x210 [i915]
kernel: [ 64.737458] [<c12a9fcd>] ? __copy_from_user_ll+0x2d/0x40
kernel: [ 64.737474] [<f89fdf40>] drm_ioctl+0x390/0x450 [drm]
kernel: [ 64.737501] [<f8b909f0>] ? i915_gem_execbuffer+0x4d0/0x4d0 [i915]
kernel: [ 64.737512] [<c123ff14>] ? security_file_permission+0x24/0xb0
kernel: [ 64.737537] [<f89fdbb0>] ? drm_copy_field+0x80/0x80 [drm]
kernel: [ 64.737546] [<c114b899>] do_vfs_ioctl+0x79/0x2d0
kernel: [ 64.737554] [<c10bf79d>] ? rcu_irq_exit+0xd/0x10
kernel: [ 64.737560] [<c105e24c>] ? irq_exit+0x3c/0xa0
kernel: [ 64.737575] [<c114bb5f>] sys_ioctl+0x6f/0x80
kernel: [ 64.737581] [<...
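
When triaging several of these logs, it can help to pull out just the faulting address and the definite call-trace frames. The following is a rough Python sketch, assuming the dmesg lines look like the ones above; summarize_oops is a made-up helper name, and frames marked with "?" (unreliable stack entries) are skipped.

```python
import re

FAULT = re.compile(r"BUG: unable to handle kernel paging request at ([0-9a-f]+)")
# "[<addr>] symbol+0xoff/0xsize [module]"; the "<" keeps timestamps from matching.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize_oops(dmesg):
    """Return (fault_address, [(symbol, module_or_None), ...]) from an oops log."""
    fault = None
    frames = []
    in_trace = False
    for line in dmesg.splitlines():
        m = FAULT.search(line)
        if m:
            fault = int(m.group(1), 16)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME.search(line)
            if m and not m.group(1):  # drop '?' frames
                frames.append((m.group(2), m.group(3)))
    return fault, frames


SAMPLE = """\
kernel: [ 64.737045] BUG: unable to handle kernel paging request at f9152470
kernel: [ 64.737271] Call Trace:
kernel: [ 64.737315] [<f8bbb7a8>] blt_ring_flush.part.12+0x28/0x80 [i915]
kernel: [ 64.737405] [<c103a6af>] ? iomap_atomic_prot_pfn+0x4f/0x60
kernel: [ 64.737546] [<c114b899>] do_vfs_ioctl+0x79/0x2d0
"""

fault, frames = summarize_oops(SAMPLE)
print(f"fault at {fault:#x}")
for symbol, module in frames:
    print(f"  {symbol} ({module or 'core kernel'})")
```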

affects: ubuntu → linux (Ubuntu)
Chris Van Hoof (vanhoof) on 2011-11-16
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in linux (Ubuntu Oneiric):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Canonical Hardware Enablement Team (canonical-hwe-team)
Changed in linux (Ubuntu Precise):
assignee: nobody → Canonical Hardware Enablement Team (canonical-hwe-team)

Hi folks,

for those of you affected by this issue, could you please test Jesse/Keith's patch at http://lists.freedesktop.org/archives/intel-gfx/2011-November/013544.html and report your results?

We'd like to see Tested-by acknowledgements, if possible, if it works for you.

This patch works for me. The S3 problem disappears on our Ivybridge systems (both desktop and mobile).

BTW, I was unable to figure out which VBIOS is in use on our machines. Intel talks about
a version scheme like this:

54 (old)
59 (adds "multi-threaded force wake", which possibly requires a Linux
    graphics driver update)
60
64
67

On our HP machine I see "INTEL 2120". I have no idea how this maps to the Intel version. Apparently HP doesn't know either. Jesse told me to look in dmidecode, but I couldn't find anything there that matches the Intel version scheme either. :-(

(In reply to comment #13)
> Hi folks,
>
> for those of you affected by this issue, could you please test Jesse/Keith's
> patch at
> http://lists.freedesktop.org/archives/intel-gfx/2011-November/013544.html and
> report your results?
>
> We'd like to see Tested-by acknowledgements if it works for you if possible..

Indeed Keith's version does work, sent my tested-by. Thank you very much!

Manoj Iyer (manjo) wrote :

The latest patches from https://bugs.freedesktop.org/show_bug.cgi?id=42923 fix S3.
The following tests passed:
1. Test s3 10 times using fwts
2. Connect an external monitor, open a terminal, move it to the external monitor, run fwts s3; resumes ok, and the mouse and windows can be moved back and forth after resume.
3. Boot with i915.i915_enable_rc6=1 and lightdm comes up

Note: If I connect an external monitor, open a terminal, move it to the external monitor, and then disconnect the external monitor, the window that was on the monitor is lost when X resizes to the LCD screen, i.e. clicking on the icon for the terminal does nothing.

The following tests passed:
1. Test s3 10 times; resumes ok with no oops
2. Connect an external monitor, open a terminal, move it to the external monitor, do s3; resumes ok, and the mouse and windows can be moved back and forth after resume.
3. Boot with i915.i915_enable_rc6=1 and lightdm comes up

Note: If I connect an external monitor, open a terminal, move it to the external monitor, and then disconnect the external monitor, the window that was on the monitor is lost when X resizes to the LCD screen, i.e. clicking on the icon for the terminal does nothing.

Robert Hooker (sarvatt) wrote :

Fix backported to 3.0 for Oneiric

I think the other issue you're seeing is an unrelated DRM problem; doing an xrandr --output VGA1 --off followed by xrandr --output VGA1 --auto can leave VGA1 off when you set it to the same mode as it was before.

Manoj Iyer (manjo) wrote :

SRU JUSTIFICATION
================

ISSUE
=====
Ivybridge system fails to resume from S3/S4 with recent BIOS. Resuming the system causes a kernel oops in the i915 driver.

FIX
===
Upstream fixed the issue by adding multi-threaded forcewake support.
On IVB C0+ with newer BIOSes, the forcewake handshake has changed. There's
now a bitfield for different driver components to keep the GT powered
on. On Linux, we centralize forcewake handling in one place, so we
still just need a single bit, but we need to use the new registers if MT
forcewake is enabled.
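
To illustrate why a driver that only knows the old handshake stalls on a BIOS that enables MT forcewake, here is a toy Python model. It is only a sketch under stated assumptions: the register names are labels rather than real MMIO offsets, FakeGT and force_wake_get are hypothetical names, and the rule that only the active register pair acknowledges is a simplification made for this model, not something taken from the patch.

```python
# Hypothetical model of the register selection described above.
LEGACY = ("FORCEWAKE", "FORCEWAKE_ACK")
MULTI_THREADED = ("FORCEWAKE_MT", "FORCEWAKE_MT_ACK")
KERNEL_BIT = 0x1  # the single bit a centralized handler would own


class FakeGT:
    """Toy register file: a request on the active pair is mirrored into its ack."""

    def __init__(self, mt_forcewake_enabled):
        self.mt_forcewake_enabled = mt_forcewake_enabled
        self.regs = {name: 0 for pair in (LEGACY, MULTI_THREADED) for name in pair}

    def write(self, name, value):
        self.regs[name] = value
        active = MULTI_THREADED if self.mt_forcewake_enabled else LEGACY
        # Modelling assumption: only the pair the hardware listens on responds.
        if name == active[0]:
            self.regs[active[1]] = value & KERNEL_BIT

    def read(self, name):
        return self.regs[name]


def force_wake_get(gt, use_mt, retries=50):
    """Request forcewake on the chosen pair; True once the ack bit is seen."""
    req, ack = MULTI_THREADED if use_mt else LEGACY
    gt.write(req, KERNEL_BIT)
    for _ in range(retries):
        if gt.read(ack) & KERNEL_BIT:
            return True
    return False


new_bios = FakeGT(mt_forcewake_enabled=True)
print("old handshake on new BIOS:", force_wake_get(new_bios, use_mt=False))
print("MT handshake on new BIOS: ", force_wake_get(new_bios, use_mt=True))
```

In this model the old handshake never sees an ack on the new BIOS, while choosing the register pair according to the hardware mode succeeds with the same single bit.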

TEST
=====
This patch was tested on an Ivybridge system with the kernel posted at http://kernel.ubuntu.com/~sarvatt/fdo42923/. The test results are noted in comment #3 in this bug and were also reported to https://bugs.freedesktop.org/show_bug.cgi?id=42923

The attachment "drm-i915-add-multi-threaded-forcewake-support.patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Tim Gardner (timg-tpi) on 2011-11-22
Changed in linux (Ubuntu Oneiric):
status: Confirmed → Fix Committed
Changed in linux:
importance: Unknown → Critical
status: Unknown → Confirmed

(In reply to comment #13)
> Hi folks,
> for those of you affected by this issue, could you please test Jesse/Keith's
> patch at
> http://lists.freedesktop.org/archives/intel-gfx/2011-November/013544.html and
> report your results?
> We'd like to see Tested-by acknowledgements if it works for you if possible..

This patch has been committed to drm-intel-fixes:
http://cgit.freedesktop.org/~keithp/linux/commit/?h=drm-intel-fixes&id=8d715f0024f64ad1b1be85d8c081cf577944c847

Can the reporters verify with drm-intel-fixes?

Changed in linux:
status: Confirmed → Fix Released

I confirm it works with drm-intel-fixes commit 5be93ad2ebb975df8ba01f6c76b541ff4e9929f4.

Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel for Oneiric in -proposed solves the problem (3.0.0-15.24). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oneiric' to 'verification-done-oneiric'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-oneiric
Robert Hooker (sarvatt) wrote :

It does indeed work properly; verified linux 3.0.0-15.24 on an Ivybridge laptop. Thank you.

tags: added: verification-done-oneiric
removed: verification-needed-oneiric
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-15.25

---------------
linux (3.0.0-15.25) oneiric-proposed; urgency=low

  [Brad Figg]

  * Release Tracking Bug
    - LP: #910894

  [ Upstream Kernel Changes ]

  * Revert "clockevents: Set noop handler in clockevents_exchange_device()"
    - LP: #904569

linux (3.0.0-15.24) oneiric-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #903188

  [ Alex Bligh ]

  * (config) Change Xen paravirt drivers to be built-in
    - LP: #886521

  [ Chase Douglas ]

  * Revert "SAUCE: HID: hid-ntrig: add support for 1b96:0006 model"
    - LP: #724831
  * Revert "SAUCE: hid: ntrig: Remove unused device ids"
    - LP: #724831

  [ Seth Forshee ]

  * SAUCE: dell-wmi: Demote unknown WMI event message to pr_debug
    - LP: #581312

  [ Upstream Kernel Changes ]

  * Revert "leds: save the delay values after a successful call to
    blink_set()"
    - LP: #893741
  * xfs: Fix possible memory corruption in xfs_readlink, CVE-2011-4077
    - LP: #887298
    - CVE-2011-4077
  * drm/i915: fix IVB cursor support
    - LP: #893222
  * drm/i915: always set FDI composite sync bit
    - LP: #893222
  * jbd/jbd2: validate sb->s_first in journal_get_superblock()
    - LP: #893148
    - CVE-2011-4132
  * ALSA: hda - Don't add elements of other codecs to vmaster slave
    - LP: #893741
  * virtio-pci: fix use after free
    - LP: #893741
  * ASoC: Don't use wm8994->control_data in wm8994_readable_register()
    - LP: #893741
  * sh: Fix cached/uncaced address calculation in 29bit mode
    - LP: #893741
  * drm/i915: Fix object refcount leak on mmappable size limit error path.
    - LP: #893741
  * drm/nouveau: initialize chan->fence.lock before use
    - LP: #893741
  * drm/radeon/kms: make an aux failure debug only
    - LP: #893741
  * ALSA: usb-audio - Check the dB-range validity in the later read, too
    - LP: #893741
  * ALSA: usb-audio - Fix the missing volume quirks at delayed init
    - LP: #893741
  * KEYS: Fix a NULL pointer deref in the user-defined key type
    - LP: #893741
  * hfs: add sanity check for file name length
    - LP: #893741
  * drm/radeon: add some missing FireMV pci ids
    - LP: #893741
  * sfi: table irq 0xFF means 'no interrupt'
    - LP: #893741
  * x86, mrst: use a temporary variable for SFI irq
    - LP: #893741
  * b43: refuse to load unsupported firmware
    - LP: #893741
  * md/raid5: abort any pending parity operations when array fails.
    - LP: #893741
  * mfd: Fix twl4030 dependencies for audio codec
    - LP: #893741
  * xen:pvhvm: enable PVHVM VCPU placement when using more than 32 CPUs.
    - LP: #893741
  * xen-gntalloc: integer overflow in gntalloc_ioctl_alloc()
    - LP: #893741
  * xen-gntalloc: signedness bug in add_grefs()
    - LP: #893741
  * powerpc/ps3: Fix lost SMP IPIs
    - LP: #893741
  * powerpc: Copy down exception vectors after feature fixups
    - LP: #893741
  * backing-dev: ensure wakeup_timer is deleted
    - LP: #893741
  * block: Always check length of all iov entries in blk_rq_map_user_iov()
    - LP: #893741
  * Linux 3.0.10
    - LP: #893741
  * drm/i915: add multi-threaded forcewake support
    - LP: #891270
  * (pre-sta...

Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Chris Van Hoof (vanhoof) wrote :

Confirmed this fix is in Precise

Changed in linux (Ubuntu Precise):
status: Confirmed → Fix Released

A patch referencing a commit referencing this bug report has been merged in Linux v3.3-rc2:

commit 8109021313c7a3d8947677391ce6ab9cd0bb1d28
Author: Daniel Vetter <email address hidden>
Date: Fri Jan 13 16:20:06 2012 -0800

    drm/i915: convert force_wake_get to func pointer in the gpu reset code
