apport script to collect information about a gpu hang

Bug #388467 reported by Matt Zimmerman
This bug affects 1 person
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Fix Released
Importance: Wishlist
Assigned to: Bryce Harrington

Bug Description

Binary package hint: xorg

The attached script is designed to be invoked manually by the user while the system is hung, but if we can somehow detect that the system is locked up, we could run it automatically.

It collects dmesg, /proc/interrupts, /proc/dri and (for Intel cards) intel_gpu_dump output at the time of the hang. It then leaves behind a crash report in /var/crash, so that after the user recovers their system, apport will collect the usual information and submit a bug on the appropriate package.
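
For readers without the attachment, the collection step described above can be sketched roughly as follows. This is not the attached script: the function names (`read_file`, `run_command`, `collect_hang_info`, `save_report`) and the `gpu-hang.txt` file name are made up for illustration, and the output is a plain text file rather than a real apport crash report in /var/crash. It assumes `dmesg` and `intel_gpu_dump` are on the PATH and records an error note when they are not.

```python
import os
import subprocess


def read_file(path):
    """Return the text of path, or a short error note if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()
    except OSError as e:
        return "Error reading %s: %s" % (path, e)


def run_command(argv):
    """Return the combined output of argv, or an error note if it cannot run."""
    try:
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, timeout=30)
        return result.stdout.decode("utf-8", errors="replace")
    except (OSError, subprocess.SubprocessError) as e:
        return "Error running %s: %s" % (" ".join(argv), e)


def collect_hang_info(proc_root="/proc"):
    """Gather the items named in the bug description into a dict."""
    info = {
        "dmesg": run_command(["dmesg"]),
        "interrupts": read_file(os.path.join(proc_root, "interrupts")),
        # Only meaningful on Intel hardware with intel-gpu-tools installed.
        "intel_gpu_dump": run_command(["intel_gpu_dump"]),
    }
    # Walk everything under <proc_root>/dri; nothing is added if it is absent.
    dri_root = os.path.join(proc_root, "dri")
    for dirpath, _dirs, files in os.walk(dri_root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, dri_root).replace(os.sep, "_")
            info["dri_" + rel] = read_file(path)
    return info


def save_report(info, directory, filename="gpu-hang.txt"):
    """Write the collected data to a text file (hypothetical format and name)."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        for key in sorted(info):
            f.write("== %s ==\n%s\n" % (key, info[key]))
    return path
```

Run as root while the GPU is hung, something like `save_report(collect_hang_info(), "/tmp")` would leave the data behind for later inspection. Every read is wrapped so that one missing file or tool does not abort the rest of the collection.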

Tags: patch
Matt Zimmerman (mdz) wrote :
Bryce Harrington (bryce) wrote :

Sweet, this will help a lot. Jesse has a mechanism in mind for being able to detect when it is in this state and trigger something, so once that's in place we can hook this in. I'm setting this to wishlist since it's a new feature, but this should be a high priority to get in soon for karmic so we can use it for getting data on freezes.

Changed in xorg (Ubuntu):
importance: Undecided → Wishlist
status: New → Triaged
Bryce Harrington (bryce) wrote :

mdz reminds me, this script also needs to be run as root, so is dependent on having that functionality in apport.

Might be of use for manual analysis meanwhile.

Matt Zimmerman (mdz) wrote : Re: [Bug 388467] Re: apport script to collect information about a gpu hang

On Wed, Jun 17, 2009 at 05:20:06PM -0000, Bryce Harrington wrote:
> mdz reminds me, this script also needs to be run as root, so is
> dependent on having that functionality in apport.

That's only dependent on the trigger mechanism, not apport itself. So long
as the trigger invokes the script as root, it'll be OK.

--
 - mdz

Bryce Harrington (bryce) wrote :
Matt Zimmerman (mdz) wrote :

Attached is a debdiff which attempts to automate the whole thing. I've not been able to test the udev rule yet; I had one GPU hang but it did not result in a uevent.

Bryce Harrington (bryce) wrote :

Hey Matt, this looks cool, I'm excited to put it in. One question, is this correct:

+DRIVER=="i915, "ACTION=="change", ENV{ERROR}==1, PROGRAM="/usr/share/apport/apport-gpu-error-intel"

I notice in the patch the script is named apport-gpu-error-intel.py:

+ install -m 755 debian/apport-gpu-error-intel.py $(CURDIR)/debian/tmp/usr/share/apport

Should the PROGRAM bit have .py appended to it?
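
A mismatch like this between the udev rule and the installed file name can be caught mechanically before upload. The following is a minimal sketch under stated assumptions: `program_paths` and `missing_programs` are hypothetical helpers, not part of apport or udev, and the regular expression only handles the simple double-quoted `PROGRAM="..."` form quoted above.

```python
import os
import re


def program_paths(rule_text):
    """Return the command strings given to PROGRAM keys in udev rule text."""
    return re.findall(r'PROGRAM\s*[=!]?=\s*"([^"]+)"', rule_text)


def missing_programs(rule_text, installed_names, install_dir):
    """List PROGRAM paths that do not match any file installed in install_dir.

    installed_names is the set of file names the package installs there.
    """
    missing = []
    for command in program_paths(rule_text):
        parts = command.split()
        if not parts:
            continue
        path = parts[0]  # ignore any arguments after the executable
        if (os.path.dirname(path) != install_dir
                or os.path.basename(path) not in installed_names):
            missing.append(path)
    return missing


# The rule line as quoted in this comment, and the name used by the install line.
rule = ('DRIVER=="i915, "ACTION=="change", ENV{ERROR}==1, '
        'PROGRAM="/usr/share/apport/apport-gpu-error-intel"')
installed = {"apport-gpu-error-intel.py"}

print(missing_programs(rule, installed, "/usr/share/apport"))
```

With the rule as quoted, the check reports `/usr/share/apport/apport-gpu-error-intel` as missing; with `.py` appended to the PROGRAM path it reports nothing.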

Matt Zimmerman (mdz) wrote : Re: [Bug 388467] Re: apport script to collect information about a gpu hang

On Mon, Nov 02, 2009 at 07:55:26PM -0000, Bryce Harrington wrote:
> Hey Matt, this looks cool, I'm excited to put it in. One question, is
> this correct:
>
> +DRIVER=="i915, "ACTION=="change", ENV{ERROR}==1,
> PROGRAM="/usr/share/apport/apport-gpu-error-intel"
>
> I notice in the patch the script is named apport-gpu-error-intel.py:
>
> + install -m 755 debian/apport-gpu-error-intel.py
> $(CURDIR)/debian/tmp/usr/share/apport
>
> Should the PROGRAM bit have .py appended to it?

You are correct, PROGRAM should correspond to the path where the script is
installed.

I never managed to test the udev rule, since I couldn't get the kernel to
send the relevant uevent (even when I got a hang). Looks like I botched it.
Thanks for spotting it.

--
 - mdz

Bryce Harrington (bryce) wrote :

Updated the debdiff and applied for lucid. While it sounds like it may not be 100% functional yet, I figure if we get it in lucid early it gives us plenty of time to tweak it.

xserver-xorg-video-intel (2:2.9.0-1ubuntu4) lucid; urgency=low

  [Matt Zimmerman]
  * debian/apport-gpu-error-intel.py, debian/xserver-xorg-video-intel.udev:
    Add apport script to collect debug information on GPU hangs
  * rules: Install udev rule to run the script when the kernel detects hung GPU
  * control: Add intel-gpu-tools to Recommends for use by the above

Date: Thu, 26 Nov 2009 00:53:06 -0800
Changed-By: Bryce Harrington <email address hidden>
Maintainer: Ubuntu Developers <email address hidden>
Signed-By: Bryce Harrington <email address hidden>
https://launchpad.net/ubuntu/lucid/+source/xserver-xorg-video-intel/2:2.9.0-1ubuntu4

affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: nobody → Bryce Harrington (bryceharrington)
status: Triaged → Fix Released