[SRU][jammy] Backport "parse_proc_interrupts: fix parsing interrupt counts"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
irqbalance (Ubuntu) | Fix Released | Medium | Unassigned |
Jammy | Fix Released | Medium | Loïc Minier |
Bug Description
[ Impact ]
On the tegra/orin platform, running an Ubuntu 22.04 image and the linux-nvidia-
[ *** ] A stop job is running for irqbalance daemon (1min 17s / 1min 30s)
After the 1min 30s delay, the reboot carries on.
This appears to be happening because the version of irqbalance in jammy gets stuck repeatedly attempting to rebalance due to a bug in its parsing of /proc/interrupts.
The GPIO irqchip has the name 2200000.gpio, which starts with a number. Irqbalance reads this as an interrupt count for another CPU, so it parses the number of CPUs as 13. That does not match the number of CPUs from num_online_cpus() (12), and thus it keeps rescanning.
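To make the miscount concrete, here is a minimal Python sketch of the parsing pattern described above. It is not irqbalance's C code: it assumes the count loop behaves like repeated strtoull() calls that stop at the first field not starting with a digit. The function names are hypothetical, and apart from the chip name 2200000.gpio the /proc/interrupts lines are invented for illustration. Bounding the scan by the expected CPU count is one way to avoid the miscount; it is an assumption here, not a transcription of the upstream commit.

```python
import re

# Roughly what one strtoull(c, &end, 10) call consumes here: optional
# leading whitespace, then decimal digits (signs are ignored in this sketch).
_NUMBER = re.compile(r"\s*(\d+)")

ONLINE_CPUS = 12  # num_online_cpus() on the affected board, per the report

# Illustrative lines for a 12-CPU system. Only "2200000.gpio" comes from the
# report; the counts, the GICv3 line and the trailing fields are made up.
GIC_LINE = " 14:" + "          0" * 12 + "     GICv3  30 Level     arch_timer"
GPIO_LINE = "256:" + "          0" * 12 + "  2200000.gpio   4 Edge      gpio-keys"


def count_cpu_columns_naive(line: str) -> int:
    """Keep reading numbers until one fails to parse (unbounded loop).
    A chip name that starts with digits is counted as one more column."""
    _, _, rest = line.partition(":")
    pos, columns = 0, 0
    while True:
        m = _NUMBER.match(rest, pos)
        if m is None:  # first field that does not start with a digit
            break
        columns += 1
        pos = m.end()
    return columns


def count_cpu_columns_bounded(line: str, online_cpus: int) -> int:
    """Same scan, but stop after online_cpus fields, so the chip name that
    follows the per-CPU counts is never read as a number."""
    _, _, rest = line.partition(":")
    pos, columns = 0, 0
    while columns < online_cpus:
        m = _NUMBER.match(rest, pos)
        if m is None:  # line has fewer count columns than expected
            break
        columns += 1
        pos = m.end()
    return columns


for name, line in (("GIC", GIC_LINE), ("GPIO", GPIO_LINE)):
    naive = count_cpu_columns_naive(line)
    bounded = count_cpu_columns_bounded(line, ONLINE_CPUS)
    # A mismatch against ONLINE_CPUS is what triggers the rescan.
    print(f"{name}: naive={naive} bounded={bounded} rescan={naive != ONLINE_CPUS}")
```

Running it prints `GIC: naive=12 bounded=12 rescan=False` and `GPIO: naive=13 bounded=12 rescan=True`, matching the 13-vs-12 mismatch in the report. The trade-off of the bounded scan is that it only notices lines with too few columns; how the caller then detects CPU hotplug is not modelled here.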
The bug was fixed by this commit for irqbalance: https:/
The bug is already fixed in mantic, which ships a newer version of irqbalance (1.9.2) that includes this fix.
I have built a local package with this backport and tested it on jammy, and I can confirm the problem is fixed. This bug is to get the backport properly into jammy. For now, my backport is available in my PPA at https:/
[ Test Plan ]
The bug is 100% reproducible when running Ubuntu Jammy on any Jetson hardware. The most obvious way to observe it is to stop irqbalance, for example by rebooting.
Once the bug is fixed, the reboot command works flawlessly.
Additionally, running "irqbalance --debug" shows it continuously logging "Rescanning cpu topology". After applying the fix, irqbalance --debug works as expected.
[ Where problems could occur ]
irqbalance is widely installed in Ubuntu. I have tested the change on x86 (rebooting, restarting irqbalance, and running irqbalance --debug) and have not seen any side effects.
Changed in irqbalance (Ubuntu Jammy):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Loïc Minier (lool)
Changed in irqbalance (Ubuntu):
importance: Undecided → Medium
tags: added: verification-done; removed: verification-needed
It really feels like there should be a unit test upstream for this kind of thing. Is there a way to pass a working vs. non-working /proc/interrupts file to irqbalance, to test with known-good and known-bad data before and after the change?
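Whether irqbalance itself can be pointed at a fixture file is not answered here. As a hypothetical illustration of the known-good/known-bad idea, the Python sketch below tests a standalone parser against two small /proc/interrupts fixtures. `parse_interrupt_counts`, `every_line_matches_header` and all fixture data are invented for this example. Taking the CPU count from the header line (CPU0 CPU1 ...) is a design choice assumed here, not irqbalance's behaviour.

```python
import re

# One count field: optional whitespace, then decimal digits.
_NUMBER = re.compile(r"\s*(\d+)")


def parse_interrupt_counts(text: str) -> dict:
    """Hypothetical parser: map each IRQ label to its per-CPU counts.

    The CPU count comes from the header line (CPU0 CPU1 ...), and at most
    that many numeric fields are read per line, so a chip name such as
    2200000.gpio is never taken for an extra CPU column.
    """
    lines = text.splitlines()
    ncpus = len(lines[0].split())
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts ..." line
        values, pos = [], 0
        while len(values) < ncpus:
            m = _NUMBER.match(rest, pos)
            if m is None:
                break
            values.append(int(m.group(1)))
            pos = m.end()
        counts[label.strip()] = values
    return counts


def every_line_matches_header(text: str) -> bool:
    """Fixture check: each IRQ line yields exactly one count per CPU."""
    ncpus = len(text.splitlines()[0].split())
    return all(len(v) == ncpus for v in parse_interrupt_counts(text).values())


# Hypothetical 2-CPU fixtures; the data is invented for illustration.
HEADER = "           CPU0       CPU1"
GIC = " 14:        100        200     GICv3  30 Level     arch_timer"
GPIO = "256:          0          3  2200000.gpio   4 Edge      gpio-keys"

KNOWN_GOOD = "\n".join([HEADER, GIC])
NUMERIC_CHIP = "\n".join([HEADER, GIC, GPIO])  # the input that broke jammy

assert every_line_matches_header(KNOWN_GOOD)
assert every_line_matches_header(NUMERIC_CHIP)
assert parse_interrupt_counts(NUMERIC_CHIP)["256"] == [0, 3]
```

A real fixture test would also need to handle summary lines such as ERR: on x86, which carry a single value and would be reported as short by this sketch.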