[SRU][jammy] Backport "parse_proc_interrupts: fix parsing interrupt counts"

Bug #2038300 reported by Nicolas Dechesne
Affects               Status         Importance   Assigned to   Milestone
irqbalance (Ubuntu)   Fix Released   Medium       Unassigned
  Jammy               Fix Released   Medium       Loïc Minier

Bug Description

[ Impact ]

On a Tegra/Orin platform running an Ubuntu 22.04 image with the linux-nvidia-tegra-igx kernel, running the 'reboot' command produces the following message:

[ *** ] A stop job is running for irqbalance daemon (1min 17s / 1min 30s)

After the 1min 30s delay, the reboot carries on.

This appears to be happening because the version of irqbalance in jammy gets stuck repeatedly attempting to rebalance due to a bug in its parsing of /proc/interrupts.

The GPIO irqchip is named 2200000.gpio, which starts with a number. irqbalance reads this as an interrupt count for another CPU, so it parses the number of CPUs as 13, which doesn't match the number of CPUs from num_online_cpus() (12), and thus it keeps rescanning.
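
The miscount described above can be illustrated with a small Python sketch. This is not the irqbalance code (which is C) and not the upstream patch; it only mimics the failure mode, assuming a parser that keeps consuming leading digits from the rest of the line. The sample line, the IRQ number 62, and the function names are made up for illustration.

```python
import re

# Accepts any run of digits, like a strtoull-style loop would.
_LOOSE = re.compile(r"\s*(\d+)")
# Accepts a run of digits only when followed by whitespace or end of line.
_STRICT = re.compile(r"\s*(\d+)(?=\s|$)")


def _count(pattern, rest):
    """Count consecutive numeric fields at the start of `rest`."""
    pos = 0
    n = 0
    while True:
        m = pattern.match(rest, pos)
        if m is None:
            return n
        n += 1
        pos = m.end()


def count_naive(rest):
    """Counts '2200000' from '2200000.gpio' as one more per-CPU count."""
    return _count(_LOOSE, rest)


def count_strict(rest):
    """Stops at the first field that is not purely numeric."""
    return _count(_STRICT, rest)


# Hypothetical /proc/interrupts row on a 12-CPU system (values invented).
sample = " 62: " + " ".join(["0"] * 12) + "  2200000.gpio  85 Edge  example-device"
_, rest = sample.split(":", 1)

# count_naive(rest)  -> 13 (the chip name is mistaken for a 13th CPU column)
# count_strict(rest) -> 12
```

With the naive count, the result disagrees with the 12 online CPUs, which matches the mismatch reported in this bug.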

The bug was fixed by this commit for irqbalance: https://github.com/Irqbalance/irqbalance/commit/0a82dddbaf5702caded0d0d83a6eafaca743254d, which is not present in the current jammy version.

The bug is "already" fixed in mantic, which has a newer version of irqbalance (1.9.2) which includes this fix.

I have built a local package with this backport, tested it on jammy, and can confirm that it fixes the problem. This bug is to get the fix backported properly into jammy. For now my backport is available in my PPA at https://launchpad.net/~ndec/+archive/ubuntu/ppa-ndec.

[ Test Plan ]

The bug is 100% reproducible on any Jetson hardware running Ubuntu Jammy. The most obvious way to observe it is to stop irqbalance, for example by rebooting.

Once the bug is fixed, the reboot command works flawlessly.

Additionally, running "irqbalance --debug" shows it continuously printing "Rescanning cpu topology". After applying the fix, "irqbalance --debug" works as expected.
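
To make the debug-output check less subjective, the captured output can be scanned for the rescan message. The following is a minimal sketch that assumes the output has already been saved to a string; only the "Rescanning cpu topology" text comes from this report, while the function names and the threshold of 3 are arbitrary choices for illustration.

```python
def count_rescans(debug_output, marker="Rescanning cpu topology"):
    """Return how many lines of the captured output contain the marker."""
    return sum(1 for line in debug_output.splitlines() if marker in line)


def looks_stuck(debug_output, threshold=3):
    """Heuristic only: treat `threshold` or more rescans as a rescan loop.

    The threshold is an assumption of this sketch, not an irqbalance value.
    """
    return count_rescans(debug_output) >= threshold
```

Because the affected version never exits on its own, the output would have to be captured with a timeout before being passed to these helpers.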

[ Where problems could occur ]

irqbalance is included widely in Ubuntu. I have tested the change on x86 (reboot, restart irqbalance and irqbalance --debug) and I am not seeing any particular side effect.

Loïc Minier (lool) wrote :

It really feels like there should be a unit test upstream for this kind of thing. Is there a way to pass a working vs. non-working /proc/interrupts file to irqbalance, to test with known-good and known-bad data before and after the change?

summary: - [jammy] Backport "parse_proc_interrupts: fix parsing interrupt counts"
+ [SRU][jammy] Backport "parse_proc_interrupts: fix parsing interrupt
+ counts"
Changed in irqbalance (Ubuntu):
status: New → Fix Released
Loïc Minier (lool) wrote :

"irqbalance --debug" as non-root might go through that codepath, albeit /proc/interrupts is a hardcoded path.

Nicolas Dechesne (ndec) wrote :

Yes, 'irqbalance --debug' as non-root goes through it and can indeed show the problem. On Tegra, the jammy irqbalance loops continuously, restarting its parsing of /proc/interrupts; with the fix it is OK. On x86 both work fine.

Nicolas Dechesne (ndec) wrote :

debdiff attached.

Nicolas Dechesne (ndec) wrote :

@lool, I tried to look at how we could add a test case for this specific issue, and unfortunately that is far from straightforward. First, there is no test infrastructure upstream at all. Second, while I admit that the parsing of /proc/interrupts is fragile, it is done in such a way that it cannot be tested in isolation, and it is intermixed with the parsing of other data structures in /proc. Is it possible to move forward with the SRU as it is?

Loïc Minier (lool) wrote :

Yeah, I gave it a couple of tries (https://github.com/lool/irqbalance/tree/proc-interrupts-env-override https://github.com/lool/irqbalance/tree/test-proc-interrupts-parsing), but a bunch of files in /proc and /sys actually participate in the overall state, and it would unfortunately be quite a bit more work to patch all of this.
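
Short of a real upstream test suite, a standalone check against saved copies of /proc/interrupts can at least flag rows that would trip a digit-prefix parser. The Python sketch below is independent of irqbalance and of the branches linked above; the environment variable name and the function names are hypothetical, and the heuristic assumes the first line of the file lists one CPUn label per CPU.

```python
import os

# Hypothetical variable name, used only by this sketch (not by irqbalance).
ENV_VAR = "SKETCH_PROC_INTERRUPTS"


def interrupts_path():
    """Return the fixture path from the environment, else the real file."""
    return os.environ.get(ENV_VAR, "/proc/interrupts")


def header_cpu_count(path):
    """Count the CPUn labels on the first line of the file."""
    with open(path, encoding="utf-8") as f:
        header = f.readline()
    return sum(
        1 for tok in header.split()
        if tok.startswith("CPU") and tok[3:].isdigit()
    )


def rows_with_numeric_chip_name(path):
    """Return IRQ labels whose first field after the per-CPU counts
    starts with a digit (for example '2200000.gpio')."""
    ncpu = header_cpu_count(path)
    flagged = []
    with open(path, encoding="utf-8") as f:
        next(f, None)  # skip the header line
        for line in f:
            label, _, rest = line.partition(":")
            fields = rest.split()
            if len(fields) > ncpu and fields[ncpu][0].isdigit():
                flagged.append(label.strip())
    return flagged
```

Pointing the variable at a known-bad and a known-good fixture gives a quick before/after comparison, although it does not exercise the other /proc and /sys files mentioned above.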

Loïc Minier (lool)
Changed in irqbalance (Ubuntu Jammy):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Loïc Minier (lool)
Changed in irqbalance (Ubuntu):
importance: Undecided → Medium
Nicolas Dechesne (ndec) wrote :

There is another SRU in progress for Jammy for this package (see https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/2038573). We will wait until this SRU is finalized before moving forward.

Nicolas Dechesne (ndec) wrote :

Rebased on top of recent SRU which made it to -proposed.

Lucas Kanashiro (lucaskanashiro) wrote :

Hi Nicolas,

I think your proposed patch has not been uploaded to the archive yet, right? May I suggest adding some DEP-3 headers [1] to your patch? For instance, the Origin, Reviewed-by and Applied-Upstream fields might be helpful.

[1] https://dep-team.pages.debian.net/deps/dep3/
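
For illustration, a DEP-3 header block for a patch like this one could look roughly as follows. This is a sketch, not the header actually attached to the debdiff: the field values are filled in from details already in this report, and the reviewer name and date are placeholders.

```
Description: parse_proc_interrupts: fix parsing interrupt counts
Origin: backport, https://github.com/Irqbalance/irqbalance/commit/0a82dddbaf5702caded0d0d83a6eafaca743254d
Bug-Ubuntu: https://bugs.launchpad.net/bugs/2038300
Applied-Upstream: commit:0a82dddbaf5702caded0d0d83a6eafaca743254d
Reviewed-by: <reviewer name and email>
Last-Update: <YYYY-MM-DD>
```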

Nicolas Dechesne (ndec) wrote :

Thanks Lucas for your review. I have updated the debdiff with a couple of DEP-3 headers.
It has not been uploaded yet; I think Loïc will do that soon.

Loïc Minier (lool) wrote :

Was off yesterday, uploaded today!

Jamie Nguyen (jamien) wrote :

Hello,

When should we expect to see this fix land in jammy-proposed?

Mitchell Dzurick (mitchdz) wrote :

Hi Jamie,

This package is in the unapproved queue at the moment.

https://launchpad.net/ubuntu/jammy/+queue?queue_state=1&queue_text=irqbalance

It is still waiting to be reviewed in our SRU process, apologies for the delay. I don't have a specific date for when this will be reviewed, but I'll ask in IRC for you.

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Nicolas, or anyone else affected,

Accepted irqbalance into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/irqbalance/1.8.0-1ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in irqbalance (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Jacob Martin (jacobmartin) wrote :

I installed version 1.8.0-1ubuntu0.2 of irqbalance on an affected system (Nvidia IGX devkit, a Tegra Orin device) and rebooted. After the system finished rebooting, I rebooted again and observed that the system no longer stalls for 1m30s while shutting down.

I also manually ran `# irqbalance --oneshot --debug` and observed that the utility completes and exits as expected. This never exits when run with the previous version of irqbalance installed (1.8.0-1ubuntu0.1), so it appears the issue was indeed resolved.

tags: added: verification-done-jammy
removed: verification-needed-jammy
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.8.0-1ubuntu0.2

---------------
irqbalance (1.8.0-1ubuntu0.2) jammy; urgency=medium

  * d/p/lp2038300-parse_proc_interrupts-fix-parsing-interrupt-counts.patch
    Fix parsing of GPIO irqchip on Tegra platform (Backport) (LP: #2038300)

 -- Nicolas Dechesne <email address hidden> Mon, 16 Oct 2023 11:22:18 +0200

Changed in irqbalance (Ubuntu Jammy):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for irqbalance has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
