Cannot change IRQ 70 affinity: Input/output error

Bug #2054872 reported by Ionut Nechita
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
irqbalance (Ubuntu)  Fix Released  Medium      Robert Malz
Noble                Fix Released  Medium      Robert Malz

Bug Description

[ Impact ]

 * At runtime, irqbalance changes IRQ affinity by writing to the /proc/irq/<id>/smp_affinity file.
   The previous upstream implementation closed the file immediately after writing to it, which could cause random I/O errors.
 * Because of this issue, irqbalance could mark IRQs as unmigratable.
 * The issue is visible in irqbalance v1.9.3 packages because additional check logic was added in that version (470a64b190628574c28a266bdcf8960291463191).
 * Before v1.9.3, irqbalance could still fail on fclose, but that did not lead to the IRQ being marked as unmigratable.

[ Test Plan ]

 * Install and run irqbalance v1.9.3.
 * Whenever irqbalance decides to change an IRQ's affinity, the following error can occur:
Mar 14 11:59:55 pre-noble irqbalance[1536]: Cannot change IRQ 27 affinity: Input/output error
Mar 14 11:59:55 pre-noble irqbalance[1536]: IRQ 27 affinity is now unmanaged
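
The two messages above appear in pairs, so one way to check a system is to scan its journal lines for them. The following is a minimal sketch, assuming the message formats shown in this report; `summarize_irq_errors`, `FAIL_RE` and `UNMANAGED_RE` are hypothetical names and are not part of irqbalance.

```python
import re

# Message formats assumed from the log excerpts in this report.
FAIL_RE = re.compile(r"Cannot change IRQ (\d+) affinity: (.+)$")
UNMANAGED_RE = re.compile(r"IRQ (\d+) affinity is now unmanaged$")


def summarize_irq_errors(lines):
    """Return (failures, unmanaged): a dict mapping each IRQ to its most
    recent error string, and the set of IRQs reported as unmanaged."""
    failures = {}
    unmanaged = set()
    for line in lines:
        line = line.rstrip()
        m = FAIL_RE.search(line)
        if m:
            failures[int(m.group(1))] = m.group(2)
            continue
        m = UNMANAGED_RE.search(line)
        if m:
            unmanaged.add(int(m.group(1)))
    return failures, unmanaged


# The two sample lines from the test plan.
sample = [
    "Mar 14 11:59:55 pre-noble irqbalance[1536]: Cannot change IRQ 27 affinity: Input/output error",
    "Mar 14 11:59:55 pre-noble irqbalance[1536]: IRQ 27 affinity is now unmanaged",
]
failures, unmanaged = summarize_irq_errors(sample)
print(failures, unmanaged)
```

An empty result on a system running the fixed package would be consistent with the issue no longer reproducing, although it does not prove that irqbalance attempted any affinity change during the period examined.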

[ Where problems could occur ]

 * The issue is fixed by adding fflush before closing the file, so that the write operation completes before the file is closed.
 * During local tests, I have not observed any issues after adding the flush.
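
The flush-before-close idea can be illustrated outside of C as well. The following is a minimal Python sketch, not the irqbalance code: `write_affinity` is a hypothetical helper, and the demo writes to a temporary file rather than a real /proc/irq/<id>/smp_affinity entry. With buffered I/O, the data may only reach the kernel when the buffer is flushed, so the flush is the point where a write error can be detected and handled before the file is closed.

```python
import os
import tempfile


def write_affinity(path, mask_hex):
    """Write a hex CPU mask to `path`; return True only if the data was
    flushed to the kernel without error. Hypothetical helper."""
    try:
        f = open(path, "w")
    except OSError:
        return False
    try:
        f.write(mask_hex)
        f.flush()  # buffered data is handed to the kernel here; errors surface now
        return True
    except OSError:
        return False
    finally:
        try:
            f.close()
        except OSError:
            pass  # the result was already decided by the flush above


# Demo against a temporary file standing in for smp_affinity.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "smp_affinity")
    ok = write_affinity(target, "ff")
    with open(target) as fh:
        written = fh.read()
    # A path whose parent directory does not exist cannot be opened.
    missing_ok = write_affinity(os.path.join(tmp, "no_such_dir", "smp_affinity"), "ff")
```

Separating the flush from the close keeps the error check attached to the step that actually pushes the data out, which is the property the fix described above relies on.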

[ Other Info ]

 * The fix for the issue is already merged upstream: https://github.com/Irqbalance/irqbalance/pull/302
 * The original description of the case is below:

Hello Ubuntu Team,

I noticed this today while using Ubuntu 24.04 Noble.
These messages appear quite often in the systemd journal.

My kernel is: 6.8.0-rc4-realtime-rt4

# journalctl -b --no-pager --no-hostname | grep irqbalance
Feb 24 13:21:02 systemd[1]: Started irqbalance.service - irqbalance daemon.
Feb 24 13:21:02 (qbalance)[1209]: irqbalance.service: Referenced but unset environment variable evaluates to an empty string: IRQBALANCE_ARGS
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 73 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 73 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 63 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 63 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 71 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 71 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 61 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 61 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 68 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 68 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 58 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 58 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 66 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 66 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 64 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 64 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 72 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 72 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 62 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 62 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 70 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 70 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 60 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 60 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 69 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 69 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 59 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 59 affinity is now unmanaged
Feb 24 13:21:12 irqbalance[1209]: Cannot change IRQ 67 affinity: Input/output error
Feb 24 13:21:12 irqbalance[1209]: IRQ 67 affinity is now unmanaged
Feb 24 13:22:22 irqbalance[1209]: Cannot change IRQ 65 affinity: Input/output error
Feb 24 13:22:22 irqbalance[1209]: IRQ 65 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 73 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 73 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 63 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 63 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 61 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 61 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 68 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 68 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 66 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 66 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 64 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 64 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 72 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 72 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 62 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 62 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 80 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 80 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 60 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 60 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 79 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 79 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 69 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 69 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 67 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 67 affinity is now unmanaged
Feb 24 13:40:36 irqbalance[1209]: Cannot change IRQ 65 affinity: Input/output error
Feb 24 13:40:36 irqbalance[1209]: IRQ 65 affinity is now unmanaged
Feb 24 13:40:46 irqbalance[1209]: Cannot change IRQ 71 affinity: Input/output error
Feb 24 13:40:46 irqbalance[1209]: IRQ 71 affinity is now unmanaged
Feb 24 14:23:06 irqbalance[1209]: Cannot change IRQ 70 affinity: Input/output error
Feb 24 14:23:06 irqbalance[1209]: IRQ 70 affinity is now unmanaged

The IRQs on this system are:

# cat /proc/interrupts
            CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 CPU8 CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
   0: 116 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
   6: 0 0 5405959 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC 6-edge AMDI0010:03
   7: 0 0 0 0 0 0 299947 0 0 0 0 0 0 0 0 0 IR-IO-APIC 7-fasteoi pinctrl_amd
   8: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC 8-edge rtc0
   9: 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-IO-APIC 9-fasteoi acpi
  25: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IOMMU-MSI 368-edge AMD-Vi0-Evt
  26: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IOMMU-MSI 376-edge AMD-Vi0-PPR
  27: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IOMMU-MSI 384-edge AMD-Vi0-GA
  28: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 amd_gpio 0 ACPI:Event
  29: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 amd_gpio 44 ACPI:Event
  30: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 amd_gpio 58 ACPI:Event
  31: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 amd_gpio 59 ACPI:Event
  32: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 amd_gpio 18 ACPI:Event
  33: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:00:01.1 0-edge PCIe PME, pciehp
  34: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:00:02.2 0-edge PCIe PME, pciehp
  35: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:00:02.4 0-edge PCIe PME
  36: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:00:08.1 0-edge PCIe PME
  37: 0 0 0 0 0 0 299946 0 0 0 0 0 0 0 0 0 amd_gpio 9 ELAN1201:00
  39: 0 0 0 0 0 0 0 0 0 0 23388 0 95 0 0 0 IR-PCI-MSIX-0000:04:00.3 0-edge xhci_hcd
  48: 0 0 0 0 0 0 0 0 0 0 0 0 0 8144 787089 0 IR-PCI-MSIX-0000:04:00.4 0-edge xhci_hcd
  57: 0 0 0 0 0 0 0 1302 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:04:00.6 0-edge snd_hda_intel:card2
  58: 0 192 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSI-0000:04:00.1 0-edge snd_hda_intel:card1
  59: 0 0 0 101 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 0-edge nvme0q0
  60: 22957 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 1-edge nvme0q1
  61: 0 20716 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 2-edge nvme0q2
  62: 0 0 24443 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 3-edge nvme0q3
  63: 0 0 0 25781 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 4-edge nvme0q4
  64: 0 0 0 0 26558 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 5-edge nvme0q5
  65: 0 0 0 0 0 21966 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 6-edge nvme0q6
  66: 0 0 0 0 0 0 23507 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 7-edge nvme0q7
  67: 0 0 0 0 0 0 0 19535 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 8-edge nvme0q8
  68: 0 0 0 0 0 0 0 0 27158 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 9-edge nvme0q9
  69: 0 0 0 0 0 0 0 0 0 24531 0 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 10-edge nvme0q10
  70: 0 0 0 0 0 0 0 0 0 0 23135 0 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 11-edge nvme0q11
  71: 0 0 0 0 0 0 0 0 0 0 0 21913 0 0 0 0 IR-PCI-MSIX-0000:03:00.0 12-edge nvme0q12
  72: 0 0 0 0 0 0 0 0 0 0 0 0 25956 0 0 0 IR-PCI-MSIX-0000:03:00.0 13-edge nvme0q13
  73: 0 0 0 0 0 0 0 0 0 0 0 0 0 21804 0 0 IR-PCI-MSIX-0000:03:00.0 14-edge nvme0q14
  75: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 IR-PCI-MSIX-0000:04:00.2 0-edge psp-1
  77: 0 0 182 0 0 0 0 0 0 0 0 0 0 128 0 0 IR-IO-APIC 1-fasteoi snd_hda_intel:card0
  79: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25693 0 IR-PCI-MSIX-0000:03:00.0 15-edge nvme0q15
  80: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22773 IR-PCI-MSIX-0000:03:00.0 16-edge nvme0q16
  82: 0 0 0 18139 0 0 628301 0 236532 266 0 467208 0 0 0 0 IR-PCI-MSI-0000:02:00.0 0-edge mt7921e
  83: 0 0 0 338579 1458 0 0 0 0 1118391 0 0 0 0 0 0 IR-PCI-MSIX-0000:04:00.0 0-edge amdgpu
 NMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Non-maskable interrupts
 LOC: 1413160 1341033 1515330 1346117 1486268 1353018 1498988 1334907 1475044 1389584 1499673 1360128 1503094 1356648 1481898 1338550 Local timer interrupts
 SPU: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Spurious interrupts
 PMI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Performance monitoring interrupts
 IWI: 52036 57306 36597 61733 53803 79504 67600 65707 99768 106673 39212 50191 49983 52515 24180 36440 IRQ work interrupts
 RTR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 APIC ICR read retries
 RES: 1135734 961961 1102398 869315 1245660 933642 1169006 917816 1359995 763672 1220046 820725 1266840 869766 1094597 834027 Rescheduling interrupts
 CAL: 1063814 373555 207657 168710 192237 189824 188438 174955 233311 126152 160532 145524 171051 153168 119414 134045 Function call interrupts
 TLB: 94988 98433 115358 96450 118525 98758 124773 94916 123529 90824 119527 95110 120917 98504 107344 96577 TLB shootdowns
 TRM: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Thermal event interrupts
 THR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Threshold APIC interrupts
 DFR: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
 MCE: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Machine check exceptions
 MCP: 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 15 Machine check polls
 ERR: 1
 MIS: 0
 PIN: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Posted-interrupt notification event
 NPI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Nested posted-interrupt event
 PIW: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
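
To see which devices the failing IRQ numbers belong to, the listing above can be reduced to a map from IRQ number to description; in this listing, IRQs 59 to 73, 79 and 80 are the nvme0 queue interrupts. The following is a minimal sketch: `parse_interrupts` is a hypothetical helper, and the sample text is trimmed to two CPU columns purely for illustration (on a real system the input would be the contents of /proc/interrupts).

```python
def parse_interrupts(text):
    """Map numeric IRQ -> description (chip, trigger, device) from
    /proc/interrupts-style text. Non-numeric rows (NMI, LOC, ...) are skipped."""
    lines = text.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue
        fields = rest.split()
        # The first ncpus fields are per-CPU counters; the rest describe the IRQ.
        result[int(label)] = " ".join(fields[ncpus:])
    return result


# Illustrative sample, trimmed to two CPU columns (not the full listing).
sample = """\
            CPU0 CPU1
  60: 22957 0 IR-PCI-MSIX-0000:03:00.0 1-edge nvme0q1
  70: 0 23135 IR-PCI-MSIX-0000:03:00.0 11-edge nvme0q11
 NMI: 0 0 Non-maskable interrupts
"""
irqs = parse_interrupts(sample)
print(irqs)
```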

Tags: patch
Ionut Nechita (ionut-n2001) wrote :
Athos Ribeiro (athos-ribeiro) wrote :

Hi,

Thank you for taking the time to report bugs and help make Ubuntu better.

To act on this report, we'll need some more information, specifically:

    What impact this bug is having on users (is this just the logging spam?)
    Specific, minimal steps to reproduce it (are you able to isolate what is causing the log entries to be recorded?)
    Specific configuration

Please add a comment with the extra information, and then set the bug status back to "New".

Changed in irqbalance (Ubuntu):
status: New → Incomplete
Robert Malz (rmalz) wrote :

debdiff with patch for noble

description: updated
Changed in irqbalance (Ubuntu):
assignee: nobody → Robert Malz (rmalz)
importance: Undecided → Medium
status: Incomplete → In Progress
Athos Ribeiro (athos-ribeiro) wrote :

Hi Robert,

Thanks for the patch!

Everything LGTM.

As we discussed offline, you do not need to file the SRU paperwork if you do not need to update a stable release (such as mantic or jammy at the moment).

I am sponsoring this one on your behalf.

Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading irqbalance_1.9.3-2ubuntu4.dsc: done.
  Uploading irqbalance_1.9.3-2ubuntu4.debian.tar.xz: done.
  Uploading irqbalance_1.9.3-2ubuntu4_source.buildinfo: done.
  Uploading irqbalance_1.9.3-2ubuntu4_source.changes: done.
Successfully uploaded packages.

Changed in irqbalance (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "noble.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Robert Malz (rmalz) wrote :

irqbalance 1.9.3-2ubuntu4 verified on 6.8.0-11-generic
Issue no longer reproduces.

Athos Ribeiro (athos-ribeiro) wrote :

Thanks for verifying this.

This should land in the noble release pocket as the ongoing time_t transition progresses and should be released soon :)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.9.3-2ubuntu4

---------------
irqbalance (1.9.3-2ubuntu4) noble; urgency=medium

  * d/p/lp2054872-fix-irq-io-error.patch: fix IO errors during IRQ affinity
    change (LP: #2054872)

 -- Robert Malz <email address hidden> Thu, 14 Mar 2024 17:02:44 +0100

Changed in irqbalance (Ubuntu Noble):
status: Fix Committed → Fix Released