Package and use 'intel_l3_parity' binary

Bug #1234260 reported by James M. Leddy
This bug affects 1 person
Affects                   Status        Importance  Assigned to   Milestone
intel                     Fix Released  Low         Rodrigo-vivi
intel-gpu-tools (Ubuntu)  Fix Released  Undecided   Unassigned
    Declined for Saucy by Timo Aaltonen
    Nominated for Trusty by James M. Leddy
udev (Ubuntu)             New           Undecided   Unassigned
    Declined for Saucy by Timo Aaltonen
    Nominated for Trusty by James M. Leddy

Bug Description

For Haswell systems, Intel has created a new tool called 'intel_l3_parity'. They have requested that we use it so that we can take advantage of the L3 parity feature on these GPUs. Quote:

  DPF (L3 Parity): Haswell has extra cache which can be dynamically (during run time)
  used to replace cachelines which are exhibiting faulty behavior.

  It follows that Haswell also has the ability to detect the errors.

  Kernel driver receives the error interrupts and reports them to userspace via udev.
  The event is a parity error.

  Userspace should be listening for the udev events, and handling them in whatever way
  they see fit.

  We expected OSVs to add such hooks to their distributions, and use the intel_l3_parity
  tool to be invoked by their udev rules as needed.

The code for the binary can be found at the link below. It looks like they're planning on adding it to the Linux kernel tree, but AFAICT it's not in Linus' tree yet:

http://article.gmane.org/gmane.comp.freedesktop.xorg.drivers.intel/10847/match=intel_l3_parity

Also, I haven't seen a udev script from them yet. According to Intel, the way it works is that udev has to catch the parity error event and then do the right thing by calling intel_l3_parity with the appropriate arguments.
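To make the "catch the event" half concrete, here is a minimal sketch of how a listener might recognise such an event once it has the raw uevent payload. It assumes the usual kernel uevent layout (an "action@devpath" header followed by NUL-separated KEY=VALUE fields). The property names SUBSYSTEM=drm and L3_PARITY_ERROR=1 are assumptions that are not confirmed anywhere in this report and should be checked against the i915 driver in the kernel being shipped. Both function names are hypothetical.

```python
def parse_uevent(payload: bytes) -> dict:
    """Split a raw kernel uevent message into a dict of KEY=VALUE properties."""
    props = {}
    for field in payload.split(b"\0"):
        text = field.decode("ascii", errors="replace")
        key, sep, value = text.partition("=")
        # The leading "action@devpath" header and the empty trailing
        # field contain no "=", so they are skipped here.
        if sep and key:
            props[key] = value
    return props


def is_l3_parity_error(props: dict) -> bool:
    """Return True if the properties look like an i915 L3 parity error.

    Assumption: the driver tags the event with L3_PARITY_ERROR=1 on a
    device in the "drm" subsystem. Verify against your kernel.
    """
    return props.get("SUBSYSTEM") == "drm" and props.get("L3_PARITY_ERROR") == "1"
```

In practice a udev rule would do this matching itself. The sketch only shows which fields such a rule would probably need to key on.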

description: updated
bwidawsk (bwidawsk) wrote :

To possibly clear up any confusion - the i915 driver will generate uevents (see the tool's udev listener for an example). These events are signs of parity errors in a specific part of the GPU's L3 cache. The tool itself allows remapping these bad locations.

At a high level, it could work like:
1. HW detects a parity error and generates an interrupt.
2. Kernel reports the uevent.
3. udev rule receives the uevent, along with information about the bad location (row, bank, subbank, slice).
4. udev either directly or indirectly invokes intel_l3_parity to remap (disable; poorly named, I am sorry) that part of the cache.
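
Steps 3 and 4 could be sketched roughly as follows: a helper takes the location properties from the uevent and builds the command line for the tool. This is only an illustration under stated assumptions. The property names (ROW, BANK, SUBBANK, SLICE) and the long option spellings (--row, --bank, --subbank, --slice, --disable) are assumed rather than taken from this report, so check them against the kernel and `intel_l3_parity --help` before relying on them. The function name and the choice to treat SLICE as optional are hypothetical.

```python
# Assumption: property names as the kernel emits them in the uevent.
REQUIRED_KEYS = ("ROW", "BANK", "SUBBANK")
OPTIONAL_KEYS = ("SLICE",)  # assumed absent on kernels without slice reporting

# Assumption: option spellings accepted by intel_l3_parity.
OPTION_FOR_KEY = {
    "ROW": "--row",
    "BANK": "--bank",
    "SUBBANK": "--subbank",
    "SLICE": "--slice",
}


def build_remap_command(props, tool="/usr/bin/intel_l3_parity"):
    """Build an argv list that asks the tool to disable one L3 location."""
    argv = [tool]
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        if key not in props:
            if key in REQUIRED_KEYS:
                raise ValueError(f"uevent is missing {key}")
            continue
        # int() rejects anything that is not a plain number, so untrusted
        # event data never reaches the command line verbatim.
        value = int(props[key])
        argv.append(f"{OPTION_FOR_KEY[key]}={value}")
    argv.append("--disable")
    return argv
```

A helper run from a udev rule could then pass the returned list to `subprocess.run(argv, check=True)`. Building a list rather than a shell string avoids quoting problems.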

Please let me know if people still have confusion.

James M. Leddy (jm-leddy) wrote :

Too late for Saucy, since we just released the betas. Requesting this for T (Trusty).

Vincent Cheng (vincent-c) wrote :

/usr/bin/intel_l3_parity is now shipped in intel-gpu-tools 1.7-1 (utopic).

Changed in intel-gpu-tools (Ubuntu):
status: New → Fix Released
Robert Hooker (sarvatt)
Changed in intel:
status: Confirmed → Fix Released