pci-xgene-msi: fixed deadlock in irq_set_affinity

Bug #1359514 reported by Craig Magina
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Medium
Craig Magina
Trusty
Fix Released
Undecided
Craig Magina
Utopic
Fix Released
Medium
Craig Magina

Bug Description

[IMPACT]
Issue: CPU affinity is changed while irqbalance is running.

    + Problem explanation:

      - Old code

        + Call xgene_msi_cascade function (CPU x)

        + raw_spin_lock(&desc->lock); (CPU x)

        + Goto generic_handle_irq (CPU x)

        + The CPU x doesn't have a chance to exit the xgene_msi_cascade function to unlock desc->lock before Linux scheduce and call xgene_msi_set_affinity (irqbalance is caller) in the same CPU x

        + In irq_set_affinity, call raw_spin_lock_irqsave(&desc->lock, flags) which cause deadlock to CPU x because it disables preempt

      - New code

        + Use chained_irq_enter and exit as the standard way to cascade interrupt functions.

[TEST CASE]
Turn off irqbalance
Run a single tcp stream
Randomly change the affinity of the receiving ring:

@ ethtool -S $INTF | grep rx[0-9].*_pac @ - to detect the rx ring
@ grep $INTF /proc/interrupts@ - to find it's interrupt
@ printf "%x" $(( 2 ** $((RANDOM % 8)) )) > /proc/irq/$IRQ/smp_affinity @ - to change the affinity

[Regression Potential]
Fix specific to the xgene pci msi code.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1359514

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Craig Magina (craig.magina) wrote :

[IMPACT]
The CPU x doesn't have a chance to exit the xgene_msi_cascade function to unlock desc->lock before Linux scheduce and call xgene_msi_set_affinity (irqbalance is caller) in the same CPU x. In irq_set_affinity, call raw_spin_lock_irqsave(&desc->lock, flags) which cause deadlock to CPU x because it disables preempt.

Use chained_irq_enter and exit as the standard way to cascade interrupt functions.

[TEST CASE]
Turn off irqbalance
Run a single tcp stream
Randomly change the affinity of the receiving ring:

@ ethtool -S $INTF | grep rx[0-9].*_pac @ - to detect the rx ring
@ grep $INTF /proc/interrupts@ - to find it's interrupt
@ printf "%x" $(( 2 ** $((RANDOM % 8)) )) > /proc/irq/$IRQ/smp_affinity @ - to change the affinity

[Regression Potential]
Fix specific to the xgene pci msi code.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
tags: added: patch
Revision history for this message
Craig Magina (craig.magina) wrote :
description: updated
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Trusty):
assignee: nobody → Craig Magina (craig.magina)
status: New → Fix Committed
Changed in linux (Ubuntu Utopic):
assignee: nobody → Craig Magina (craig.magina)
milestone: none → ubuntu-14.10
status: Triaged → In Progress
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-done-trusty
removed: verification-needed-trusty
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (5.8 KiB)

This bug was fixed in the package linux - 3.13.0-36.63

---------------
linux (3.13.0-36.63) trusty; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1365052

  [ Feng Kan ]

  * SAUCE: (no-up) irqchip:gic: change access of gicc_ctrl register to read
    modify write.
    - LP: #1357527
  * SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code
    - LP: #1358949

  [ Ming Lei ]

  * SAUCE: (no-up) Drop APM X-Gene SoC Ethernet driver
    - LP: #1360140
  * [Config] Drop XGENE entries
    - LP: #1360140
  * [Config] CONFIG_NET_XGENE=m for arm64
    - LP: #1360140

  [ Stefan Bader ]

  * SAUCE: Add compat macro for skb_get_hash
    - LP: #1358162
  * SAUCE: bcache: prevent crash on changing writeback_running
    - LP: #1357295

  [ Suman Tripathi ]

  * SAUCE: (no-up) arm64: Fix the csr-mask for APM X-Gene SoC AHCI SATA PHY
    clock DTS node.
    - LP: #1359489
  * SAUCE: (no-up) ahci_xgene: Skip the PHY and clock initialization if
    already configured by the firmware.
    - LP: #1359501
  * SAUCE: (no-up) ahci_xgene: Fix the link down in first attempt for the
    APM X-Gene SoC AHCI SATA host controller driver.
    - LP: #1359507

  [ Tuan Phan ]

  * SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
    - LP: #1359514

  [ Upstream Kernel Changes ]

  * iwlwifi: mvm: Add a missed beacons threshold
    - LP: #1349572
  * mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case
    - LP: #1349572
  * genirq: Add an accessor for IRQ_PER_CPU flag
    - LP: #1357527
  * arm64: perf: add support for percpu pmu interrupt
    - LP: #1357527
  * cifs: sanity check length of data to send before sending
    - LP: #1283101
  * KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
    - LP: #1329434
  * KVM: nVMX: Rework interception of IRQs and NMIs
    - LP: #1329434
  * KVM: vmx: disable APIC virtualization in nested guests
    - LP: #1329434
  * HID: Add transport-driver functions to the USB HID interface.
    - LP: #1353021
  * ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host
    Controller driver.
    - LP: #1358498
  * fold d_kill() and d_free()
    - LP: #1354234
  * fold try_prune_one_dentry()
    - LP: #1354234
  * new helper: dentry_free()
    - LP: #1354234
  * expand the call of dentry_lru_del() in dentry_kill()
    - LP: #1354234
  * dentry_kill(): don't try to remove from shrink list
    - LP: #1354234
  * don't remove from shrink list in select_collect()
    - LP: #1354234
  * more graceful recovery in umount_collect()
    - LP: #1354234
  * dcache: don't need rcu in shrink_dentry_list()
    - LP: #1354234
  * lift the "already marked killed" case into shrink_dentry_list()
  * split dentry_kill()
    - LP: #1354234
  * expand dentry_kill(dentry, 0) in shrink_dentry_list()
    - LP: #1354234
  * shrink_dentry_list(): take parent's ->d_lock earlier
    - LP: #1354234
  * dealing with the rest of shrink_dentry_list() livelock
    - LP: #1354234
  * dentry_kill() doesn't need the second argument now
    - LP: #1354234
  * dcache: add missing lockdep annotation
    - LP: #1354234
  * fs: convert use of typedef ctl_table to struct ctl_table
 ...

Read more...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-21.28

---------------
linux (3.16.0-21.28) utopic; urgency=low

  [ Andy Whitcroft ]

  * [Config] linux-image-extra is additive to linux-image
    - LP: #1375310
  * [Config] linux-image-extra postrm is not needed on purge

  [ dann frazier ]

  * [Config] run updateconfigs after adding arm64 PCI support
  * SAUCE: (no-up) Fix pcie-xgene build failure

  [ Ming Lei ]

  * SAUCE: (no-up) apm: pcie: fix hang when no card connected

  [ Tanmay Inamdar ]

  * SAUCE: (no-up) arm64: PCI(e) arch support
  * SAUCE: (no-up) pci: APM X-Gene PCIe controller driver
  * SAUCE: (no-up) arm64: dts: APM X-Gene PCIe device tree nodes
  * SAUCE: (no-up) dt-bindings: pci: xgene pcie device tree bindings
  * SAUCE: (no-up) MAINTAINERS: entry for APM X-Gene PCIe host driver
  * SAUCE: (no-up) Add MSI/MSI-X driver for APM PCI bus
    - LP: #1318977

  [ Tim Gardner ]

  * rebase to v3.16.4
  * Release Tracking Bug
    - LP: #1377905

  [ Tuan Phan ]

  * SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
    - LP: #1359514

  [ Upstream Kernel Changes ]

  * drm/nouveau: make sure display hardware is reinitialised on runtime
    resume
    - LP: #1374607

  [ Upstream Kernel Changes ]

  * rebase to v3.16.4
 -- Tim Gardner <email address hidden> Fri, 03 Oct 2014 12:10:48 -0400

Changed in linux (Ubuntu Utopic):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.