Screen turned off and Xorg froze due to an intel video driver bug

Bug #1100138 reported by Dima Ryazanov
This bug affects 1 person
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Medium      Unassigned

Bug Description

My laptop screen and the attached monitor turned off suddenly; I wasn't doing anything at the time. I ssh'ed in, and saw a kernel bug in the dmesg output.

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: linux-image 3.5.0.22.28
ProcVersionSignature: Ubuntu 3.5.0-22.34-generic 3.5.7.2
Uname: Linux 3.5.0-22-generic x86_64
ApportVersion: 2.6.1-0ubuntu9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: dima 2471 F.... pulseaudio
Date: Tue Jan 15 19:53:58 2013
HibernationDevice: RESUME=UUID=798d75f2-2d43-40e5-9278-e1214811f4a2
InstallationDate: Installed on 2012-05-30 (231 days ago)
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
MachineType: Dell Inc. Dell System XPS L321X
MarkForUpload: True
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.5.0-22-generic root=UUID=d7e21c25-1d7a-42ce-bee0-503b732523ba ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-22-generic N/A
 linux-backports-modules-3.5.0-22-generic N/A
 linux-firmware 1.95
SourcePackage: linux
UpgradeStatus: Upgraded to quantal on 2012-09-30 (108 days ago)
dmi.bios.date: 04/09/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A04
dmi.board.name: 085X6F
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnDellInc.:bvrA04:bd04/09/2012:svnDellInc.:pnDellSystemXPSL321X:pvr:rvnDellInc.:rn085X6F:rvrA00:cvnDellInc.:ct8:cvr0.1:
dmi.product.name: Dell System XPS L321X
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Do you have a way to reproduce this bug, or was it a one-time event?

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: i915-gem-object-unpin
Dima Ryazanov (dima-gmail) wrote :

No, I don't know how to reproduce it. It happens randomly, about once a month.

Dima Ryazanov (dima-gmail) wrote :

Actually, disregard the "once a month" - it happened again today.

Dima Ryazanov (dima-gmail) wrote :

It's been happening every two days lately, so I've decided to look through the code, and the bug became obvious (though I don't know the proper solution).

The first stack trace shows a warning:

WARNING: at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/i915_gem.c:3052 i915_gem_object_pin+0x15d/0x1b0 [i915]()
[...]
 Call Trace:
  [<ffffffff81051c1f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff81051c7a>] warn_slowpath_null+0x1a/0x20
  [<ffffffffa00a094d>] i915_gem_object_pin+0x15d/0x1b0 [i915]
  [<ffffffffa00a0a28>] i915_gem_object_pin_to_display_plane+0x88/0x100 [i915]
  [<ffffffffa00b14c6>] intel_pin_and_fence_fb_obj+0x56/0x120 [i915]
  [<ffffffffa00b17b3>] intel_gen6_queue_flip+0x43/0x160 [i915]
  [<ffffffffa00b58a8>] ? intel_crtc_page_flip+0x58/0x330 [i915]
  [<ffffffffa00b58a8>] ? intel_crtc_page_flip+0x58/0x330 [i915]
  [<ffffffffa00b59c1>] intel_crtc_page_flip+0x171/0x330 [i915]
  [<ffffffffa002e559>] drm_mode_page_flip_ioctl+0x229/0x2b0 [drm]
  [<ffffffffa0028c56>] ? drm_mode_object_find+0x66/0x90 [drm]
  [<ffffffffa0028b21>] ? drm_crtc_convert_to_umode+0xd1/0x150 [drm]
  [<ffffffffa001b6d3>] drm_ioctl+0x4d3/0x580 [drm]
  [<ffffffffa002e330>] ? drm_mode_gamma_get_ioctl+0x120/0x120 [drm]
  [<ffffffff81193d59>] do_vfs_ioctl+0x99/0x590
  [<ffffffff811942e9>] sys_ioctl+0x99/0xa0
  [<ffffffff8168bd29>] system_call_fastpath+0x16/0x1b

In the i915_gem_object_pin function, we see:

 if (WARN_ON(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
  return -EBUSY;

So the pin_count did not get incremented, because it had already reached the maximum.

If we go up a few stack frames, to intel_crtc_page_flip, we'll see:

 crtc->fb = fb;

 [...]

 ret = dev_priv->display.queue_flip(dev, crtc, fb, obj);
 if (ret)
  goto cleanup_pending;

So crtc->fb got set, but the page flip failed.
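
To make the ordering problem concrete, here is a minimal user-space sketch. It is not kernel code: `model_fb`, `model_crtc`, `model_queue_flip`, `flip_as_quoted` and `flip_with_restore` are hypothetical names invented for this illustration, and the "restore" variant is only one conceivable way to handle the error, not necessarily what any upstream patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the field
 * that matters for this illustration is modeled. */
struct model_fb { int id; };
struct model_crtc { struct model_fb *fb; };

/* Stand-in for display.queue_flip(): returns -EBUSY when told to fail. */
static int model_queue_flip(int should_fail)
{
    return should_fail ? -EBUSY : 0;
}

/* Same ordering as the quoted snippet: crtc->fb is assigned first,
 * and a failing queue_flip does not undo that assignment. */
static int flip_as_quoted(struct model_crtc *crtc, struct model_fb *fb,
                          int should_fail)
{
    assert(crtc != NULL && fb != NULL);
    crtc->fb = fb;
    return model_queue_flip(should_fail);
}

/* Assumed alternative: remember the old framebuffer and put it back
 * if queuing the flip fails. */
static int flip_with_restore(struct model_crtc *crtc, struct model_fb *fb,
                             int should_fail)
{
    struct model_fb *old_fb;
    int ret;

    assert(crtc != NULL && fb != NULL);
    old_fb = crtc->fb;
    crtc->fb = fb;
    ret = model_queue_flip(should_fail);
    if (ret)
        crtc->fb = old_fb; /* the failed flip leaves no trace */
    return ret;
}
```

With a failing flip, `flip_as_quoted` leaves `crtc->fb` pointing at the new framebuffer even though it was never pinned, while `flip_with_restore` leaves it pointing at the old one.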

Now, let's look at the second stack trace, with the actual bug:

kernel BUG at /build/buildd/linux-3.5.0/drivers/gpu/drm/i915/i915_gem.c:3090!
[...]
Call Trace:
 [<ffffffffa00b15cc>] intel_unpin_fb_obj+0x3c/0x40 [i915]
 [<ffffffffa00b4a6c>] intel_crtc_disable+0x8c/0xb0 [i915]
 [<ffffffffa007a855>] drm_helper_disable_unused_functions+0x115/0x170 [drm_kms_helper]
 [<ffffffffa007c182>] drm_crtc_helper_set_config+0x952/0xb10 [drm_kms_helper]
 [<ffffffff81124726>] ? __generic_file_aio_write+0x236/0x440
 [<ffffffff8132e5be>] ? radix_tree_lookup_slot+0xe/0x10
 [<ffffffffa002987e>] drm_framebuffer_cleanup+0xfe/0x180 [drm]
 [<ffffffffa00aeee1>] intel_user_framebuffer_destroy+0x21/0x80 [i915]
 [<ffffffffa002d2c3>] drm_mode_rmfb+0x103/0x110 [drm]
 [<ffffffffa001b6d3>] drm_ioctl+0x4d3/0x580 [drm]
 [<ffffffffa002d1c0>] ? drm_mode_addfb2+0x6c0/0x6c0 [drm]
 [<ffffffff81181b16>] ? do_sync_write+0xe6/0x120
 [<ffffffff811c0bbb>] ? fsnotify+0x24b/0x340
 [<ffffffff81193d59>] do_vfs_ioctl+0x99/0x590
 [<ffffffff811942e9>] sys_ioctl+0x99/0xa0
 [<ffffffff8168bd29>] system_call_fastpath+0x16/0x1b

In intel_crtc_disable, we see:

 if (crtc->fb) {
  mutex_lock(&dev->struct_mutex);
  intel_unpin_fb_obj(to_intel_framebuffer(crtc->fb)->obj);
  mutex_unlock(&dev->struct_mutex);
 }

The condition is true, since crtc->fb got set earlier. So it calls intel_unpin_fb_obj, even though pin_count never got...
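
The general hazard of an unpin with no matching successful pin can be shown with a small user-space model. This is a sketch under stated assumptions, not the driver's code: `model_obj`, `model_pin`, `model_unpin` and `MODEL_MAX_PIN_COUNT` (with its value of 15) are hypothetical, and the model returns an error where a kernel might assert instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_PIN_COUNT 15 /* hypothetical limit for this model */

struct model_obj {
    int pin_count;
};

/* Like the check quoted earlier: refuse to pin at the limit and
 * leave pin_count untouched. */
static int model_pin(struct model_obj *obj)
{
    assert(obj != NULL);
    if (obj->pin_count == MODEL_MAX_PIN_COUNT)
        return -EBUSY;
    obj->pin_count++;
    return 0;
}

/* Decrements unconditionally unless the count is already zero, in
 * which case it reports -EINVAL. */
static int model_unpin(struct model_obj *obj)
{
    assert(obj != NULL);
    if (obj->pin_count == 0)
        return -EINVAL;
    obj->pin_count--;
    return 0;
}
```

If 15 pins succeed, a 16th fails, and the failed caller unpins anyway, the count drops to 14 while 15 legitimate pins are still outstanding. The last of those legitimate holders then finds the count already at zero when it unpins.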


Dima Ryazanov (dima-gmail) wrote :

Anything else I can do to help debug this?

It's happening almost every day, and causing data loss - so I'll be forced to switch to a different OS soon...

Dima Ryazanov (dima-gmail) wrote :

I've patched the default Ubuntu kernel with this: https://patchwork.kernel.org/patch/1985851/
It fixed the kernel bug, but the screen still turns off, and I see this message in the Xorg log:

[ 53686.894] (WW) intel(0): flip queue failed: Device or resource busy
