Apport script to collect information about a GPU hang

Bug #388467 reported by Matt Zimmerman on 2009-06-17
Affects: xserver-xorg-video-intel (Ubuntu)
Importance: Wishlist
Assigned to: Bryce Harrington

Bug Description

Binary package hint: xorg

The script is designed to be invoked manually by the user while the system is hung, but if we can somehow detect that the GPU has locked up, we could run it automatically.

It collects dmesg, /proc/interrupts, /proc/dri and (for Intel cards) intel_gpu_dump output at the time of the hang. It then leaves behind a crash report in /var/crash, so that after the user recovers their system, apport will collect the usual information and submit a bug on the appropriate package.
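
The collection step described above could look roughly like the following. This is a minimal sketch in present-day Python 3, not the attached script: the function name `collect_gpu_hang_info`, the dictionary keys, and the `root` parameter are assumptions made for illustration, and /proc/dri is walked generically because its layout varies by driver. Writing the result into a crash report under /var/crash is left out.

```python
import os
import subprocess


def read_file(path):
    """Return the file's text, or a short error note if it cannot be read."""
    try:
        with open(path, errors="replace") as f:
            return f.read()
    except OSError as e:
        return "Error reading %s: %s" % (path, e)


def run_command(argv):
    """Return the command's combined stdout/stderr, or an error note."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, timeout=30)
        return proc.stdout.decode(errors="replace")
    except (OSError, subprocess.SubprocessError) as e:
        return "Error running %s: %s" % (argv[0], e)


def collect_gpu_hang_info(root="/", intel=True):
    """Gather the data named in the description into a dict.

    The key names are hypothetical; `root` exists only so the
    function can be pointed at a test directory.
    """
    info = {}
    info["dmesg"] = run_command(["dmesg"])
    info["interrupts"] = read_file(os.path.join(root, "proc/interrupts"))

    # Record every file below /proc/dri, one key per file.
    dri = os.path.join(root, "proc/dri")
    for dirpath, _dirs, files in os.walk(dri):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            key = "dri_" + os.path.relpath(path, dri).replace(os.sep, "_")
            info[key] = read_file(path)

    # intel_gpu_dump ships in intel-gpu-tools; if it is missing,
    # the error note is stored instead of aborting the collection.
    if intel:
        info["intel_gpu_dump"] = run_command(["intel_gpu_dump"])
    return info
```

Each step tolerates failure on purpose: during a hang some sources may be unreadable, and a partial report is still more useful than none.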

Matt Zimmerman (mdz) wrote :
Bryce Harrington (bryce) wrote :

Sweet, this will help a lot. Jesse has a mechanism in mind for being able to detect when it is in this state and trigger something, so once that's in place we can hook this in. I'm setting this to wishlist since it's a new feature, but this should be a high priority to get in soon for karmic so we can use it for getting data on freezes.

Changed in xorg (Ubuntu):
importance: Undecided → Wishlist
status: New → Triaged
Bryce Harrington (bryce) wrote :

mdz reminds me, this script also needs to be run as root, so is dependent on having that functionality in apport.

Might be of use for manual analysis meanwhile.

On Wed, Jun 17, 2009 at 05:20:06PM -0000, Bryce Harrington wrote:
> mdz reminds me, this script also needs to be run as root, so is
> dependent on having that functionality in apport.

That's only dependent on the trigger mechanism, not apport itself. So long
as the trigger invokes the script as root, it'll be OK.

--
 - mdz

Matt Zimmerman (mdz) wrote :

Attached is a debdiff which attempts to automate the whole thing. I've not been able to test the udev rule yet; I had one GPU hang but it did not result in a uevent.

Bryce Harrington (bryce) wrote :

Hey Matt, this looks cool, I'm excited to put it in. One question, is this correct:

+DRIVER=="i915", ACTION=="change", ENV{ERROR}==1, PROGRAM="/usr/share/apport/apport-gpu-error-intel"

I notice in the patch the script is named apport-gpu-error-intel.py:

+ install -m 755 debian/apport-gpu-error-intel.py $(CURDIR)/debian/tmp/usr/share/apport

Should the PROGRAM bit have .py appended to it?

On Mon, Nov 02, 2009 at 07:55:26PM -0000, Bryce Harrington wrote:
> Hey Matt, this looks cool, I'm excited to put it in. One question, is
> this correct:
>
> +DRIVER=="i915", ACTION=="change", ENV{ERROR}==1,
> PROGRAM="/usr/share/apport/apport-gpu-error-intel"
>
> I notice in the patch the script is named apport-gpu-error-intel.py:
>
> + install -m 755 debian/apport-gpu-error-intel.py
> $(CURDIR)/debian/tmp/usr/share/apport
>
> Should the PROGRAM bit have .py appended to it?

You are correct, PROGRAM should correspond to the path where the script is
installed.

I never managed to test the udev rule, since I couldn't get the kernel to
send the relevant uevent (even when I got a hang). Looks like I botched it.
Thanks for spotting it.

--
 - mdz
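
The mismatch discussed above can be checked mechanically. The following is a minimal sketch using the PROGRAM value and the install line quoted in this thread. The helper names `program_path` and `installed_path` are hypothetical, and the parsing is deliberately naive (it assumes the simple `install SRC DESTDIR` form and a `$(CURDIR)/debian/tmp` staging prefix).

```python
import posixpath
import re


def program_path(rule):
    """Extract the value of the PROGRAM key from a udev rule line, or None."""
    m = re.search(r'PROGRAM="([^"]*)"', rule)
    return m.group(1) if m else None


def installed_path(install_line, prefix="$(CURDIR)/debian/tmp"):
    """Compute the final path of a file installed by 'install SRC DESTDIR'."""
    parts = install_line.split()
    src, destdir = parts[-2], parts[-1]
    # Drop the packaging staging prefix to get the path on the target system.
    if destdir.startswith(prefix):
        destdir = destdir[len(prefix):]
    return posixpath.join(destdir, posixpath.basename(src))


rule = 'PROGRAM="/usr/share/apport/apport-gpu-error-intel"'
install = ("install -m 755 debian/apport-gpu-error-intel.py "
           "$(CURDIR)/debian/tmp/usr/share/apport")

print(program_path(rule) == installed_path(install))  # False: the rule lacks ".py"
```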

Bryce Harrington (bryce) wrote :

Updated the debdiff and applied for lucid. While it sounds like it may not be 100% functional yet, I figure if we get it in lucid early it gives us plenty of time to tweak it.

xserver-xorg-video-intel (2:2.9.0-1ubuntu4) lucid; urgency=low

  [Matt Zimmerman]
  * debian/apport-gpu-error-intel.py, debian/xserver-xorg-video-intel.udev:
    Add apport script to collect debug information on GPU hangs
  * rules: Install udev rule to run the script when the kernel detects hung GPU
  * control: Add intel-gpu-tools to Recommends for use by the above

Date: Thu, 26 Nov 2009 00:53:06 -0800
Changed-By: Bryce Harrington <email address hidden>
Maintainer: Ubuntu Developers <email address hidden>
Signed-By: Bryce Harrington <email address hidden>
https://launchpad.net/ubuntu/lucid/+source/xserver-xorg-video-intel/2:2.9.0-1ubuntu4

affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: nobody → Bryce Harrington (bryceharrington)
status: Triaged → Fix Released