Please consider no longer enabling irqbalance by default (per image/use-case/TBD)

Bug #1833322 reported by ethanay
This bug affects 11 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Release Notes for Ubuntu | Fix Released | Undecided | Unassigned | |
| Ubuntu on IBM z Systems | Fix Released | Undecided | bugproxy | |
| cloud-images | New | Undecided | Unassigned | |
| irqbalance (Ubuntu) | Opinion | Undecided | Unassigned | |
| ubuntu-meta (Ubuntu) | Fix Released | Undecided | Unassigned | |

Bug Description

as per https://github.com/pop-os/default-settings/issues/60

Distribution (run cat /etc/os-release):

$ cat /etc/os-release
NAME="Pop!_OS"
VERSION="19.04"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Pop!_OS 19.04"
VERSION_ID="19.04"
HOME_URL="https://system76.com/pop"
SUPPORT_URL="http://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=disco
UBUNTU_CODENAME=disco

Related Application and/or Package Version (run apt policy $PACKAGE NAME):

$ apt policy irqbalance
irqbalance:
Installed: 1.5.0-3ubuntu1
Candidate: 1.5.0-3ubuntu1
Version table:
*** 1.5.0-3ubuntu1 500
500 http://us.archive.ubuntu.com/ubuntu disco/main amd64 Packages
100 /var/lib/dpkg/status

$ apt rdepends irqbalance
irqbalance
Reverse Depends:
Recommends: ubuntu-standard
gce-compute-image-packages

Issue/Bug Description:

as per konkor/cpufreq#48 and http://konkor.github.io/cpufreq/faq/#irqbalance-detected

irqbalance is technically not needed on desktop systems (supposedly it is mainly for servers), and may actually reduce performance and power savings. It appears to provide benefits only to server environments that have relatively-constant loading. If it is truly a server-oriented package, then it shouldn't be installed by default on a desktop/laptop system and shouldn't be included in desktop OS images.

Steps to reproduce (if you know):

This is potentially an issue with all default installs.

Expected behavior:

n/a

Other Notes:

I can safely remove it via "sudo apt purge irqbalance" without any apparent adverse side-effects. If someone is running a situation where they need it, then they always have the option of installing it from the repositories.


Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-meta (Ubuntu):
status: New → Confirmed

Young Montana (yomota) wrote :

I am using Ubuntu 20.04.1 LTS 64 bit on an Intel mobile CPU and Gnome 3.36.8 (Kernel 5.4.0-58-generic).

irqbalance is still installed by default.

The frequently used Gnome Extension "cpufreq" shows a permanent warning that irqbalance is active.

If I uninstall irqbalance the warning is gone.

Since the warning is OK in "cpufreq" the underlying reason should be fixed IMHO (= removing irqbalance from default install on desktop images)

Christian Ehrhardt (paelzer) wrote :

Hi,
this was overlooked for too long but came up in bug 2046470 again which made me see this for the first time.

I wish we had had this a bit earlier, e.g. to release it with
mantic and not halfway through noble, but now is still the time to
change the next LTS.

I needed to make up my mind on this to come to a conclusion, so I wrote a
summary mostly for myself, but also for others whom I want to ack the
decision, as well as for anyone who later wants to understand what changed
and why.

I must admit that I'm slightly biased, having looked at it ages ago, even
before I was more active in Ubuntu development and already wondering if that
should be used by default.

And yes, some people had a stronger wish to get it out of the default.
So as already reported, many have already asked to remove it.

I'll try to break up my answers to be more easily referable.

Christian Ehrhardt (paelzer) wrote :

# Referred Arguments

An argument that might not have been so strong more than a decade ago
but is much stronger today is power savings, an aspect that comes up
over and over.
There have also been reports of conflicts with power saving [10] and e.g. with
dynamically disabling/enabling cores, which is much more of a thing nowadays;
long ago this only worked reliably on mainframes anyway.

I don't buy the "games need 100%" argument, as even games need their I/O to happen,
but OTOH irqbalance just doesn't help much nowadays either, as the kernel learned
many more tricks to do well - to name just one, the traffic-aware
and potentially offloaded RPS/XPS [2]. And irqbalance is not mutually exclusive
with most of those technologies, neither with RSS [18] nor with kernel policies [15].

Some report conflicts with their custom tweaking of IRQs [8][16].
It is actually a common conflict: irqbalance tries to be smart [9], and so do
other things like a particular device's firmware, which leads to a conflict of
interest.
=> But TBH that is why it is removable for such rare cases.
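
For anyone who wants to check whether a custom IRQ placement is actually holding (with or without irqbalance), one rough approach is to sum the per-CPU counters in /proc/interrupts. The following is only a minimal sketch: the function name and the sample text are made up for illustration, and it assumes the usual layout of a header row naming the CPUs followed by one row per interrupt source.

```python
from typing import List


def per_cpu_irq_totals(interrupts_text: str) -> List[int]:
    """Sum interrupt counts per CPU from /proc/interrupts-style text."""
    lines = interrupts_text.strip("\n").splitlines()
    # The header row names the CPU columns: "CPU0 CPU1 ..."
    cpu_count = len(lines[0].split())
    totals = [0] * cpu_count
    for line in lines[1:]:
        # Everything after the first colon is counters plus a description.
        _label, _, rest = line.partition(":")
        counts = rest.split()[:cpu_count]
        # Rows like ERR/MIS hold one global counter instead of one per CPU.
        if len(counts) < cpu_count or not all(c.isdigit() for c in counts):
            continue
        for cpu, value in enumerate(counts):
            totals[cpu] += int(value)
    return totals


# Hypothetical two-CPU sample, not taken from any system in this bug.
SAMPLE = """\
           CPU0       CPU1
  0:         22          0   IO-APIC   2-edge      timer
 24:       1000        500   PCI-MSI 512000-edge   eth0
NMI:          3          4   Non-maskable interrupts
ERR:          0
"""

print(per_cpu_irq_totals(SAMPLE))
```

On a real machine the text would come from reading /proc/interrupts twice, some seconds apart, and comparing the totals; a single snapshot only shows counts since boot.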

On one hand it clearly has some impact and various cases of bad impacts by it
have come up as well for frame rates [11], stuttering [14] or even network
traffic [12].

But on the other hand, there have been reports and cases where a broken
irqbalance led to impacted high-performance network traffic [7], so it is
not that it is clearly always bad [13]. While we never know how outdated
any such source might be, it proves that it is most likely workload and
system dependent. Many docs also still refer to it, not only older RH ones
but also Arch [19], ... you'll find it everywhere.

It is an interesting case, and the workload dependency leads many discussions
to even contradict each other - in one case it saves CPU power, in the other it makes
it worse. In one it helps traffic, in the other it degrades it. That is all a
consequence of it being workload and system dependent.
This back and forth is perfectly encapsulated in this Phoronix thread [15],
which quotes interesting other POVs, like kernel solutions often being "driver
centric" and optimizing throughput, but maybe not always being the best policy for
the full system, whereas irqbalance policies and tunables are configurable.

An interim summary might be:
"""
It could cause rare issues or conflicts, especially on Desktop,
but might be still wanted on Servers especially those with a
high rate of I/O
"""

Which is interestingly quite close to the arguments floating around when it
was added more than a decade ago (see further below).

[2]: https://www.kernel.org/doc/html/latest/networking/scaling.html
[7]: https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/2038573
[8]: https://groups.google.com/g/gce-discussion/c/Ns8hgOUW9GY
[9]: https://docs.xilinx.com/r/en-US/ug1523-x3522-user/Interrupt-Affinity
[10]: https://konkor.github.io/cpufreq/faq/#irqbalance-detected
[11]: https://askubuntu.com/questions/1067866/ubuntu-18-04-steam-games-frame-rate-drop
[12]: https://serverfault.com/questions/410928/irqbalance-on-linux-and-dropped-packets
[13]: https://bookofzeus.com/harden-ubuntu/server-setup/disable-irqbalance/
[14]: https://www.reddit.com/r/linux_gaming/comment...


Christian Ehrhardt  (paelzer) wrote :

# Integration and maintenance

Despite some saying it is a thing of the past, it is regularly updated
and has had multiple releases per year throughout [4]. Those
updates flow well into Debian and Ubuntu - so it is not a classic "old
and outdated" case. And while not much changes in those updates, it means
it still learns new things, like thermal events in 1.9.1 or isolcpus in 1.0.9.
I'm not saying it is super modern and does it all, but it gets updates.

Currently this is seeded in ubuntu-standard [1], which is what makes it
default installed everywhere. But it is intentionally only a recommends,
so the set of people that want to remove it can do so.

It was added a long time ago [3] back when multi-core was a rare thing
at least for Desktop systems. This was based on a discussion [5] and was
related to the kernel [6] actively delegating this to userspace. Debian
did a similar change a bit later [17] for the same reasons.
But again this was the time of single-core being common.

[1]: https://git.launchpad.net/~ubuntu-core-dev/ubuntu-seeds/+git/platform/tree/standard?h=noble#n19
[3]: https://git.launchpad.net/~ubuntu-core-dev/ubuntu-seeds/+git/platform/commit/?h=noble&id=dcd02266953547e11221979eb17eb740a76a62b5
[4]: https://github.com/Irqbalance/irqbalance/tags
[5]: https://lists.ubuntu.com/archives/ubuntu-devel/2010-January/029939.html
[6]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8b8e8c1bf7275eca859fe551dfa484134eaf013b
[17]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=577788

Christian Ehrhardt  (paelzer) wrote :

# Actions by Others

Times have changed; as mentioned above the kernel learned many new tricks.
More new I/O hardware, virtual or physical, appeared that tries to be smart
and thereby sometimes conflicts with what irqbalance does.

Some are mostly based on the links referred to above; the Debian discussion
was more about it being harmful (or at least not helpful) in virtual
environments and hence removed from cloud images (we close in on workload
specific again).

Indeed many projects already removed it from the default
- https://github.com/pop-os/iso/pull/288
- https://github.com/ValveSoftware/Proton/issues/3243
- https://lists.debian.org/debian-cloud/2019/04/msg00040.html

Christian Ehrhardt  (paelzer) wrote :

# Summary

This decision seems easier to make the more dedicated to a singular
use case you are - as then you have fewer "but what if" cases to consider.
That wide usage is great for Ubuntu but sometimes delays decisions.

List of reasons to remove it from the default dependencies:
- Seems to cause issues more often on Desktop environments
- cpufreq, thermald and similar struggle to save energy
- Impacts due to unexpected throttling
- Conflicts with enabling/disabling threads/cores
- Problematic in virtual environments
- It is mostly an x86 thing but we pull it in everywhere
- It conflicts with manually fine tuned IRQ affinity e.g. in
  ultra low latency setups
- It is less useful on CPUs with large and wide shared caches
  as well as in virtual environments without fixed pinning
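
As background for the "manually fine tuned IRQ affinity" point: the kernel exposes each IRQ's allowed CPUs as a hexadecimal bitmask in /proc/irq/<N>/smp_affinity, and this is the value a manual tuner and irqbalance would both be writing. A minimal sketch of decoding and building such masks follows; the helper names are made up for illustration, and the builder only produces the plain form without comma grouping.

```python
from typing import List


def cpus_in_affinity_mask(mask_text: str) -> List[int]:
    """Decode an smp_affinity-style hex bitmask into a list of CPU numbers."""
    # Larger systems print the mask as comma-separated, zero-padded 32-bit
    # hex groups, most significant first, so dropping commas gives one number.
    mask = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def affinity_mask_for(cpus: List[int]) -> str:
    """Build the hex bitmask that selects exactly the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpus_in_affinity_mask("f"))   # bits 0..3 set
print(affinity_mask_for([0, 2]))    # bits 0 and 2 set
```

A hand-pinned setup would write such a mask once and expect it to stay, which is why a daemon that periodically rewrites it is seen as a conflict in those setups.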

List of reasons to keep it in the set of default dependencies:
- Benefits seem mostly for large scale servers
- lacking irqbalance can be a performance degradation in some
  large scale high traffic cases

I think from all I've found - old and new - it seems it still has its purpose
in some scenarios, but the HW/SW world evolved and it is nowadays less often
useful and more often harmful than it was in the past.
On the other hand there is almost no clear cut "it is bad and that is why",
most issues were individual issues and special cases, nothing that would
apply to everyone.

And irqbalance still has its purpose, so we should surely keep it around.

In a perfect world this would have half a year of time or more and two people
to run all kinds of workloads on all kinds of HW to compare. But let us be
honest, that will not happen and would not be worth the effort anyway.
We'll have to decide with what we have.
Did the others that switched have more time to evaluate in depth? I do not
know. But usually, once a significant part of the ecosystem has changed and you
lack better data, it is better to follow as well, or common hints and optimizations
will no longer apply because you are the one outlier in regard to behavior.

To me this seems to be a perfect case for a few special images/deployments
known to match the workload profile that needs this to enable it.
It is also more likely that a professional admin of such a large scale machine
(or cluster thereof) can make the opt-in decision and evaluation better than
expecting every user of Ubuntu to think about an opt-out.

---

Options IMHO:
A) Change it from an opt-out to an opt-in and remove the dependency
   from ubuntu-standard
B) Remove it from ubuntu-standard to get rid of it in Desktops and images
   used in virtual environments. But try to keep it in a place that is mostly
   used for bare metal which tend to be closer to the kind that benefits more
C) Do nothing, keep it as is

D) Any of the above, but let us not touch Noble more than half way through the
cycle, but do that early in 24.10 to have enough exposure before a release in
an LTS.

My gut feeling (and it can't be much more without much more time for much
deeper investigations) would be (A).

Christian Ehrhardt (paelzer) wrote :

I subscribed a few people directly to get their input.

@Steve
I've subscribed you after trying to find, refer and summarize all of the past to allow you and anyone else to read into this in one go. I think I'll need your input as Architect and as participant of these discussions right from when they started 14 years ago.

@Phil/@John
Some past discussions, especially the backpedaling of Debian referred to virtual environments and/or large cloud providers. Is irqbalance anything you got asked to disable (or keep) for their environment?
No need to share names, but reasoning or data points would be helpful :-)

@Dimitri
Is there a clearer "this is what userspace should do in regard to this in 2024" from the kernel? I couldn't find it, but maybe you know or know who'd know ...

@Sebastien
Most problems reported have been around Desktops (to be fair, that could be a coincidence because that is where people do more experiments and have more diverse special cases). So I think it is fair to ask you whether requests or discussions like the above have come up for Desktop that are worth referring to here?

Maybe one of you has more details that help to make the decision more clear and easy.
Or a gut feeling that is even stronger than mine, strong enough even to pick one of the options?

Christian Ehrhardt  (paelzer) wrote :

After all the history I was looking at where we are right now:
- irqbalance already is not in ubuntu-cloud-minimal images
- irqbalance is in normal cloud images and installed systems via the dep from ubuntu-server

Doug Smythies (dsmythies) wrote :

Thank you for your incredibly thorough analysis of this. Since finding this via bug 2046470, I have tried, without success, to create a test to show any difference in performance or power or whatever between irqbalance enabled/disabled on my Ubuntu 20.04 test server.

While my vote carries little weight here, I give it anyhow:

A) Change it from an opt-out to an opt-in and remove the dependency
   from ubuntu-standard

Mainly because, and from my own investigation, I agree with:

> To me this seems to be a perfect case for a few special images/deployments
> known to match the workload profile that needs this to enable it.
> It is also more likely that a professional admin of such a large scale machine
> (or cluster thereof) can make the opt-in decision and evaluation better than
> expecting every user of Ubuntu to think about an opt-out.

Steve Langasek (vorlon) wrote :

Hi Christian,

I see a lot of strong opinions being given, but aside from the "don't use it in KVM" guidance which appears to be based on GCE's engineering expertise, very little evidence that irqbalance is actually a problem.

I think it's true that in the default config, irqbalance can interfere with putting CPUs into higher C states to conserve power. However, I don't see any indication of quantitative analysis showing the impact.

Recent versions of irqbalance have a '--powerthresh' argument that can be used to tell irqbalance to rebalance across fewer cores when CPU load is low, to allow some of the cores to be put into a sleep state and conserve power. My own initial testing on my desktop shows that this gets used for all of about 10 seconds at a time every few hours, before the load increases and irqbalance wakes the core back up...

I would want any decision to remove irqbalance from the desktop to be based on evidence, not conjecture. At a minimum, I think what I would like to see is output from powertop showing both power consumption and CPU idle stats over a reasonable amount of time (10 minutes?), on a representative client machine, for a 2x3 matrix of configurations:

 - idle vs normal desktop load
 - irqbalance disabled vs irqbalance enabled with defaults vs irqbalance enabled with IRQBALANCE_ARGS=--powerthresh=1

System should be rebooted between each of the irqbalance configurations, as I'm not sure what does or doesn't persist in the CPU config after irqbalance exits.
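
The requested 2x3 matrix can be written down explicitly so that nobody skips a cell. This is only a sketch for planning the runs: the list names are made up, the configuration strings are taken from the description above, and nothing here starts powertop or touches irqbalance.

```python
from itertools import product

# The two load levels and three irqbalance configurations named above.
LOADS = ["idle", "normal desktop load"]
IRQBALANCE_CONFIGS = [
    "irqbalance disabled",
    "irqbalance enabled with defaults",
    "irqbalance enabled with IRQBALANCE_ARGS=--powerthresh=1",
]


def measurement_plan():
    """Return all (config, load) pairs, grouped by irqbalance configuration."""
    # Config is the outer loop so one reboot covers both load levels.
    return list(product(IRQBALANCE_CONFIGS, LOADS))


previous_config = None
for config, load in measurement_plan():
    if config != previous_config:
        print(f"reboot, then set up: {config}")
        previous_config = config
    print(f"  measure with powertop for ~10 minutes: {load}")
```

Grouping by configuration keeps the number of reboots at three while still covering all six cells.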

I am specifically not going to try to rebut the various webpages referenced here, beyond saying that there's an awful lot of these pages pointing to one another as authoritative sources on irqbalance without there actually being evidence to back them up (and a heaping spoonful of misinformation / outdated information along the way). So if we're going to make a change, there should be due diligence to demonstrate a benefit; it should not be based on Internet hype.

ethanay (ethan-y-us) wrote : Re: [Bug 1833322] Re: Consider removing irqbalance from default install on desktop images

Does it seem correct to say that the general intention of irqbalance wrt
system performance is to improve throughput (translating in some cases to a
more responsive system) at a cost of increased processing latency?

If so, then it should be considered and tuned generally with regards to
usage scenarios that consider latency vs throughput. Eg digital audio
workstations and gaming machines might disable it.

But as Steve says, we don't have any data on the tradeoffs. How much more
throughput/responsiveness for what cost in latency under what
configurations?

All I can find is a recommendation not to use it on CPUs with 2 or fewer
cores as the overhead is said to be too high (which acc to above would
translate to "unreasonable amount of latency for relatively little or even
no throughput gains"), but even then, are we talking physical or
logical/virtual cores?

It seems like the more cores a system has, the more trivial the overhead
from running irqbalance per performance/responsiveness gain. Is there a
threshold number of cores beyond which something like IRQ balance becomes
strongly recommended for general computing applications? But even then just
like power scaling I can imagine it might still add undesirable or even
critical latency in applications that are highly latency sensitive (eg when
milliseconds or fractions of milliseconds matter)

This website gave me some clarity on the theory and purpose:
https://www.baeldung.com/linux/irqbalance-modern-hardware

There is another dimension, one related to one of the reasons why Apple
became known as the "AV professional's workstation" for so long, is that
(apparently for fascinating reasons of a historical accident) the
multimedia system engineers gained enough influence in the company to allow
them to tune the default system configuration to prioritize latency and
then system responsiveness over throughput (and even some compromises in
system security) to allow for minimal system config in applications
requiring both low and consistent (eg low jitter) latency. As it turned
out, they got away with doing this for so many years in part because the
growing "AV professional wannabe" crowd who just used the system mostly for
general (rather than low latency sensitive) applications didn't really
notice or care about the hit to throughput or security vulnerability.
Noticeable in benchmarks, but not in real life.

My first point in saying this is that benchmarks don't necessarily tell us
what will give the greatest benefit to the greatest number of users with
minimal or no reconfiguration. Eg, who cares if it takes even 10% more
milliseconds to transcode an AV file or compile code (on same hardware
configured differently) if it means you could also run latency sensitive
apps at a consistent (low jitter) and low latency without having to
reconfigure anything and maintaining a generally responsive system? People
often just walk away from that anyway (either physically, eg smoke or
coffee break, or figuratively, eg task switching, in which case a
responsive system would be a higher priority than crunching the numbers
slightly faster).

My second point is I think obsession over benchmarking risks losing ...

Launchpad Janitor (janitor) wrote : Re: Consider removing irqbalance from default install on desktop images

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in irqbalance (Ubuntu):
status: New → Confirmed

Mike Ferreira (mafoelffen) wrote :

I said my initial piece and recommendation here:
https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/2046470/comments/2

It carries through here... This was brought up as a recommendation in Launchpad (here in this bug report) back in 2019. In that bug report, I questioned why this had been ignored and not discussed much since then. It didn't go away, and it wasn't discussed as it should have been. I was embarrassed that it had been that way for 4 years.

Since then:

By then Debian had already removed it from being installed as a default. Ubuntu kept it, even after that bug report...

RedHat had removed it from being default installed.

PopOS had removed it from their default installed.

SUSE is a special case: they kept it for their lineup (which includes their Enterprise Server lineup and desktops, I confirmed)... BUT then on page 16 of their Performance Analysis, Tuning and Tools Guide (https://documentation.suse.com/sbp/server-linux/pdf/SBP-performance-tuning_en.pdf), that chapter starts out with this quote:
>>> A correct IRQ configuration – above all in multi-core architecture and multi-thread
>>> applications– can have a profound impact on throughput and latency performance
...and further says that the first step to get there is to disable irqbalance (where they give the instructions to disable the service) and how to go through irq configuration from there.

Application vendors, which we have in our repos, such as Valve Steam and CpuFreq, currently recommend removing irqbalance, if installed.
RE:
https://github.com/ValveSoftware/Proton/issues/3243
http://konkor.github.io/cpufreq/faq/

In addition to the blog article linked to in the last comment above, I found this blog (https://blogs.oracle.com/linux/post/irqbalance-design-and-internals), which goes into how it makes decisions in load balancing and is best summed up in its conclusion:
>>> This article described the internals of the irqbalance daemon. The information provided
>>> here can be used to debug and better understand load balance decisions taken by irqbalance.

The question I have is, if Ubuntu is Debian Branch, and we long ago went from having different kernels for desktop & server in ubuntu-base, but do have ubuntu-server packages and ubuntu-desktop packages, where things could be different, why is this still a broad sweep as a default install "for all"?

I think the above weighs in on having it as optional. But I am not in on that (final) decision.

I am happy that this is getting discussed properly now so that we can relook at this, and what it means to us today.

Doug Smythies (dsmythies) wrote :

Lots of good comments. I sort of agree with:

> So if we're going to make a change, there
> should be due diligence to demonstrate a
> benefit, it should not be based on
> Internet hype.

However, I would have said:

If irqbalance is to be included by default, then there should be due diligence to demonstrate a clear benefit.
Simpler is better, and every added thing can have issues, bug 2046470 being an example for irqbalance.

On my Ubuntu 20.04 test server (kernel 6.7-rc8) running a 24.04 server VM (with 4 vcpus) I ran 3 token passing ping pong pairs, monitoring power and idle states on the host with irqbalance enabled and disabled on both host and guest.
The results were:

irqbalance disabled:
pair 1: 4.3378 uSec/loop
pair 2: 4.4207 uSec/loop
pair 3: 4.5144 uSec/loop
Processor energy: 87,500 Joules.

irqbalance enabled:
pair 1: 4.5828 uSec/loop +5.6%
pair 2: 4.7084 uSec/loop +6.5%
pair 3: 4.7704 uSec/loop +5.7%
Processor energy: 92,252 Joules. +5.43%

The attached graph is processor power at 15 seconds per sample from 30 seconds before until some seconds after the test completes. The extra energy for the irqbalance-enabled test is because the test took longer to complete.
I also have graphs for all idle states usage and above/below stats, none of which reveal anything.
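
For readers who want to re-check the percentages quoted for the ping pong test, the arithmetic is just relative change against the irqbalance-disabled run. A small sketch, with made-up variable names and the numbers copied from the results above:

```python
def percent_change(baseline: float, value: float) -> float:
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (irqbalance disabled, irqbalance enabled), copied from the results above.
USEC_PER_LOOP = {
    "pair 1": (4.3378, 4.5828),
    "pair 2": (4.4207, 4.7084),
    "pair 3": (4.5144, 4.7704),
}
PROCESSOR_ENERGY_JOULES = (87_500, 92_252)

for pair, (disabled, enabled) in USEC_PER_LOOP.items():
    print(f"{pair}: {percent_change(disabled, enabled):+.1f}% time per loop")

disabled_j, enabled_j = PROCESSOR_ENERGY_JOULES
print(f"energy: {percent_change(disabled_j, enabled_j):+.2f}%")
```

The energy increase is close to the increase in time per loop, which fits the explanation that the extra energy comes from the test running longer.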

Another test done was iperf3 between the guest and host forcing a small tcp window size. The test was run for 22 minutes.
The command:
iperf3 --interval 0 --bidir --window 1024 --time 1320 -c s19.smythies.com

irqbalance enabled:
412 MBytes sent
45.1 GBytes rec'd
Processor energy: 69,272 Joules.

irqbalance disabled:
413 MBytes sent, 0.24% improved
45.2 GBytes rec'd, 0.22% improved
Processor energy: 70,560 Joules. +1.86%

The related idle graphs don't reveal anything.

A third test was iperf3 between the guest and host using the default (big) tcp window size. The test was run for 22 minutes.
The command:
iperf3 --interval 0 --bidir --time 1320 -c s19.smythies.com

irqbalance enabled:
6.99 TBytes sent
2.10 TBytes rec'd
9.09 TBytes total
Processor energy: 77,888 Joules.

irqbalance disabled:
7.62 TBytes sent, 9.0% improved
1.62 TBytes rec'd, 22.9% worse
9.24 TBytes total, 1.65% improved
Processor energy: 80,166 Joules. +2.92%

The graphs (not attached) show the main differences are in idle state 0 usage.
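
Since both large-window iperf3 runs lasted the same 22 minutes, one way to fold throughput and energy into a single figure is data moved per unit of processor energy. This is a derived metric that was not part of the measurements above; the function name is made up, and the inputs are simply the totals and energy values reported for the third test.

```python
def tbytes_per_megajoule(tbytes: float, joules: float) -> float:
    """Data moved per megajoule of processor energy."""
    return tbytes / (joules / 1e6)


# Totals from the default-window iperf3 test above.
enabled = tbytes_per_megajoule(9.09, 77_888)
disabled = tbytes_per_megajoule(9.24, 80_166)

print(f"irqbalance enabled:  {enabled:.1f} TBytes/MJ")
print(f"irqbalance disabled: {disabled:.1f} TBytes/MJ")
print(f"relative difference: {(disabled - enabled) / enabled * 100:+.2f}%")
```

By this measure the two configurations differ by roughly one percent, which is small enough that run-to-run variation would need to be known before reading anything into it.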

Other notes:
Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
HWP enabled
intel_pstate CPU frequency driver
powersave governor

Paride Legovini (paride) wrote :

Hi, adding a couple of extra pointers here (I'm the Debian irqbalance maintainer). This is the Debian bug where the discussion on removing irqbalance from the kernel Recommends happened:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=926967

In Debian irqbalance is not installed anymore by default since mid-2019 (clearly reflected by popcon: https://qa.debian.org/popcon-graph.php?packages=irqbalance), and no bug was reported related to it being missing.

Back in the day I asked upstream their take on irqbalance usefulness with newer kernels, here is their reply:

https://github.com/Irqbalance/irqbalance/issues/151

Christian Ehrhardt  (paelzer) wrote :

Hi Steve,

> I see a lot of strong opinions ... I would want any decision to remove
> irqbalance from the desktop to be based on evidence, not conjecture.

I agree that there is plenty of opinion (often backing up each other with cyclic
links) and not much data. Hence my compilation of the history to make it
somewhat consumable.

I wasn't entirely sure on my own but I agree that we'd need data to back
up changes, thanks for empowering that branch of the decision tree.

Yet on the other hand, that most likely means not much will move quickly.
Which is fine, but also makes it unlikely to conclude before Noble freezes.

Christian Ehrhardt  (paelzer) wrote :

Hi Ethanay
> All I can find is a recommendation not to use it on CPUs with 2 or fewer
> cores as the overhead is said to be too high

This isn't a real problem anyway, the service will stop immediately if only
running on one core - even if running on multiple cores with the same
cache (as the intended benefit is due to cache hotness by having all I/O
hitting the same cache).

> I can imagine it might still add undesirable or even critical latency in
> applications that are highly latency sensitive

I understand your line of thought, but it might even improve latency.
If there is no bottleneck on the cores assigned to handle an IRQ then
the improved cache hit rate will make even latency better.
And if there is a strong bottleneck, then some drivers without IRQbalance
would end up locked on one cpu - so again these might gain lower latency.
But I have no data on this either (just like no one seems to have on almost
any of this).

Just like others I'd personally more expect the drawback to be on a potential
lack of power saving.

> This website gave me some clarity on the theory and purpose:
> https://www.baeldung.com/linux/irqbalance-modern-hardware

Hah, didn't find this one yet - thank you!
But to me it only underlines the "it can help as much or even more often"
expectation.

Christian Ehrhardt  (paelzer) wrote :

Hi Mike

> SUSE ... says that the first step to get there is to disable irqbalance

I've read the same, IMHO that is just "if you want to manually tune, disable
it" which does not imply that it is bad to have it. But this is how I read
it, I have not talked to the authors to get their underlying reasoning.

> Applications vendors ... currently recommend removing irqbalance

The only one that does so AFAICS is cpufreq and everyone else just links
to their reasoning and follows. And even some statements there like
"If you are still running irqbalance, you are not getting the maximum
performance your system is capable of!" are hard to believe as a general
statement - especially without data across a wide variety of system types
and workload.
As we have seen as well in the references linked, irqbalance helps just as
much for "maximum performance" in many other cases.

> I found this blog (https://blogs.oracle.com/linux/post/irqbalance-design-and-internals)

Thanks, every extra background we find will only help (except for those
joining later to read more).

> The question I have is, if Ubuntu is Debian Branch, and we long ago went
> from having different kernels for desktop & server in ubuntu-base, but do
> have ubuntu-server packages and ubuntu-desktop packages, where things could
> be different, why is this still a broad sweep as a default install "for all"?

Because there was no well-founded conclusion like "it really is bad for
environment X" to remove it. You are right that there are no technical blockers
to e.g. keeping it in servers but no longer making it the default in Desktop.
After all it is already dropped in cloud-images used in virtual environments, as
there was clearer reasoning and argument there.

And there are also cases where irqbalance missing caused performance impact
and bug reports like the already mentioned [1] (clearly high scale server
though)

> I am happy that this is getting discussed properly now so that we can
> relook at this, and what it means to us today.

Ack, that is why I tried to compile all I've found into one place.

[1]: https://bugs.launchpad.net/ubuntu/+source/irqbalance/+bug/2038573

Christian Ehrhardt  (paelzer) wrote :

Hi Doug

> If irqbalance is to be included by default, then there should be due
> diligence to demonstrate a clear benefit.

You are right that we should have that as well.
But this would be even more true if this were about "making it the default
when it was not before".
Right now (purely opinion) the lack of data can IMHO neither be used to keep
it nor to remove it - which sadly locks this up a bit.

> The results were:

I want to thank you a lot, this won't be enough but it is a masterpiece
demonstration of dedicating time to start providing such data.
Thank you.

I do not know the ping pong test, but on iperf, I think that is in the noise
range as far as I remember. If you'd just re-run that as-is what is the delta
on your test box?

Hoping that this will be extended by more contributing different workloads
on different systems let me ask, what kind of system (cpu, size, nodes, ...)
was that. I know you are good at writing up things, you might set the standard
how others might report to this :-)

Your results show no change or minimal degradation while at the same time losing
a bit of power. Have you also had a chance to try the powerthresh argument
that Steve mentioned above?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hi Paride

> Back in the day I asked upstream their take on irqbalance usefulness with
> newer kernels, here is their reply:
> https://github.com/Irqbalance/irqbalance/issues/151

Thanks for this and the other extra pointers.
The Debian bug was referenced before, AFAICS it is mostly around
a) the kernel got smarter in many cases (true)
b) bad in virtual environments (we already removed it from those)

And in that discussion the upstream comments (it is good to see that
they are still convinced of their code) revolved around:
c) There should be no conflict with running irqbalance (with the new kernel)
d) The kernel policy is driver centric (irqbalance has a full picture)

Both - as I read them - are more arguments to keep it than to remove.
But as with all the others, not with enough data to make it a clear yes/no.

As I said much earlier in this case, I feel this is system and workload
dependent and hence there will never be a clear generic yes/no.
The best we can achieve is finding sets (like images used in virtual
environments - or as suggested desktop systems) and drop it being the
default there.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I want to try to avoid that this becomes too stale, so I wondered
what we can do from here. Two things came to my mind.

On one hand I will try to use some indirect relations to pull in some
HW manufacturer experts. They often have large performance teams tracking
things like that against different workloads.

And on the other hand, since the request seems to be closing in on
"please consider not making it the default on desktop" (server is more likely
to have these large scaling workloads that are more likely to benefit) we need
to pull in someone from Desktop a bit more.
I'll do a few direct pings for that as well to ensure to get their voice too.

Doing so now ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Pings done, in a perfect world (if all reply) that would cover more than we ever need, but then there is 0% guarantee they even have time or care about this at the moment :-)

If anyone has connections as well, please ask them to participate too.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → bugproxy (bugproxy)
tags: added: reverse-proxy-bugzilla
bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-204586 severity-medium targetmilestone-inin2404
Revision history for this message
Loïc Minier (lool) wrote :

Just saw a mention of this bug, and I wanted to provide another datapoint: I recently sponsored a SRU for an irqbalance bugfix (LP #2038300), it was for an edge server platform (NVIDIA IGX Orin). What I noticed was that the code was inherently racy and hard to validate with unit tests because it's trying to read from multiple kernel data structures in virtual filesystems and then take action. I do believe its function would better be provided in the kernel itself.

Revision history for this message
Doug Smythies (dsmythies) wrote :

Hi Christian,

Thank you for your reply to my post.

> I do not know the ping pong test,

A simple token passing ring that is useful for getting the system to utilize shallow idle states. Otherwise it can be difficult to get to such shallow states on my test system without them being timer based.
While not relevant to this thread, the test presents a challenge for the TEO (Timer Events Oriented) idle governor, and the menu governor should perform better.

> but on iperf, I think that is in the noise
> range as far as I remember.

Agreed. I am just searching for good example type tests is all.

> If you'd just re-run that as-is what is the delta
> on your test box?

Oh, it's repeatable. I just haven't got to re-testing it yet with 24.04 as my host hardware.

> let me ask, what kind of system (cpu, size, nodes, ...)
> was that.

Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
6 cores, 2 threads per core, 12 CPUs.
HWP enabled (A.K.A. Intel Speed Shift)
intel_pstate CPU frequency driver
powersave governor
teo idle governor (but menu used below)
No throttling involved, ever.

> Your results show no change or minimal
> degradation while at the same time losing
> a bit of power. Have you also had a chance
> to try the powerthresh argument
> that Steve mentioned above?

I was just searching for a good test, and since that post I did find a really good one (not reported here). However, that was with 20.04 which has an old version of irqbalance. So I made my system dual boot adding 24.04. That same test is now not good at all for showing any differences.
And yes, also tested with the powerthresh argument.
Just for completeness, the attached graph shows processor package power and the only other graph that had some slight signal above the noise, the "idle state 1 was too deep" graph.
The test: 6 ping pong pairs, with almost no work done at each stop, 300 million loops. About 27 minutes.
Legend:
irqb-menu-disable: irqbalance disabled, menu governor.
irqb-menu-enable-1: irqbalance enabled with powerthresh=1, menu governor.
irqb-menu-enable: irqbalance enabled, menu governor.

Power: see graph, same for all.
irqbalance disabled: 5.1854 uSec/loop
irqbalance enabled powerthresh=1: 5.1966 uSec/loop
irqbalance enabled: 5.1817 uSec/loop
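
To put the three loop times in proportion, they can be expressed relative to the run with irqbalance disabled. This is only a minimal sketch using the numbers quoted above; the helper name `relative_delta_percent` is illustrative and not part of any test tooling mentioned in this thread.

```python
# Mean loop times as reported in the comment above (microseconds per loop).
results = {
    "irqbalance disabled": 5.1854,
    "irqbalance enabled, powerthresh=1": 5.1966,
    "irqbalance enabled": 5.1817,
}


def relative_delta_percent(value, baseline):
    """Return the change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


baseline = results["irqbalance disabled"]
deltas = {
    name: relative_delta_percent(value, baseline)
    for name, value in results.items()
}

for name, delta in deltas.items():
    print(f"{name:35s} {results[name]:.4f} us/loop  {delta:+.3f} %")
```

Both enabled runs land within about a quarter of a percent of the disabled run, which is in line with the "not good at all for showing any differences" assessment for this particular test.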

Revision history for this message
ethanay (ethan-y-us) wrote : Re: [Bug 1833322] Re: Consider removing irqbalance from default install on desktop images

Hi Christian,

Thank you. Yes I was not arguing strictly against irqbalance, just trying
to ascertain some discussion parameters as well as parameters for data
collection.

I have not yet seen a coherent philosophy on what it means to "optimize
performance" with default settings that serve the greatest capacity of
server or desktop scenarios. In my humble opinion, data collection is
useless without this framework of understanding what it is we are trying to
achieve and why in terms of system performance. To me this is the deeper
unresolved issue, perhaps.

I fear that systems are currently optimized by default for throughput. For
users, responsiveness (which can include but is not limited to throughput)
and latency may be more important psychologically (there is an analogy to
this in AV production: we can actually get by with fairly poor video
quality--which consumes the most bandwidth and processor power--if audio
quality remains adequate; ie, audio quality has a disproportionately high
impact on psychology compared to video, especially per unit of data or
processing power allocated). And power saving is important in global terms,
as even small gains multiplied over hundreds or thousands of deployments
can have a significant impact, even if the client or operator doesn't
notice much.

ethan

On Wed, Jan 10, 2024 at 4:35 AM Christian Ehrhardt  <
<email address hidden>> wrote:

> Hi Ethanay
> > All I can find is a recommendation not to use it on CPUs with 2 or fewer
> > cores as the overhead is said to be too high
>
> This isn't a real problem anyway, the service will stop immediately if only
> running on one core - even if running on multiple cores with the same
> cache (as the intended benefit is due to cache hotness by having all I/O
> hitting the same cache).
>
> > I can imagine it might still add undesirable or even critical latency in
> > applications that are highly latency sensitive
>
> I understand your line of thought, but it might even improve latency.
> If there is no bottleneck on the cores assigned to handle an IRQ then
> the improved cache hit rate will make even latency better.
> And if there is a strong bottleneck, then some drivers without IRQbalance
> would end up locked on one cpu - so again these might gain lower latency.
> But I have no data on this either (just like no one seems to have on almost
> any of this).
>
> Just like others I'd personally more expect the drawback to be on a
> potential
> lack of power saving.
>
> > This website gave me some clarity on the theory and purpose:
> > https://www.baeldung.com/linux/irqbalance-modern-hardware
>
> Hah, didn't find this one yet - thank you!
> But to me it only underlines the "it can help as much or even more often"
> expectation.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote : Re: Consider removing irqbalance from default install on desktop images

Hi Ethanay,

I realize I maybe wrote too much :-/
So I start with a TL;DR:
AFAICS you are right in all you say, but I think there can not be "one right answer" anyway. Hence I'm trying to leave all parties their freedom of defining what is important to them and try to learn from them what impact irqbalance has to that.

> Yes I was not arguing strictly against irqbalance, just trying
> to ascertain some discussion parameters as well as parameters for data
> collection.

Yeah, I see that and didn't intend to rebut your statements either.
Just push them a bit into potential context and POV of others.

> I have not yet seen a coherent philosophy on what it means to "optimize
> performance" with default settings that serve the greatest capacity of
> server or desktop scenarios.

That is true, but the reason for that is that you can only optimize for
something like a workload or particular HW.

The defaults are usually trying to be not too crappy for any possible
thing that might happen on e.g. Ubuntu which is quite a scope.

> In my humble opinion, data collection is useless without this
> framework of understanding what it is we are trying to achieve
> and why in terms of system performance. To me this is the deeper
> unresolved issue, perhaps.

I can see your point and would not even argue against. But this is
(this is opinion and a bit of experience, not scientific proven
truth) only the problem if we'd try to solve the singular global
and always valid "is irqbalance good or bad" question.

Thinking about it I think I'm even of the same opinion as you,
but instead of standardizing exactly what we are trying to achieve
(which to me feels like selecting a workload or HW as optimization
target) I was trying to reach out to as many groups as possible
so we can see what HW/workloads are important to them and how
irqbalance might help or interfere with that.

A bit like the old case where some clouds brought it up that it is
conflicting in virtio-net on their substrate and to be disabled
by default there (see Debian and also some Ubuntu cloud images).

I have personally no hope in reaching a general "this is good / bad"
without considering it per workload or HW environment.

Hence my hope is that if we manage to get this variety of preferences
of different parties and only then the impact of irqbalance to that
we can make compartmentalized decisions.
For example as some suggested, making it no more the default in
Desktop, but keeping it in other cases.

And this is just me trying to be helpful and drive this from being
a dormant case to something useful, I do not pretend to have the
masterplan or the solution yet :-)

> I fear that systems are currently optimized by default for throughput. For
> users, responsiveness (which can include but is not limited to throughput)
> and latency may be more important psychologically

Can I just say yes here, you go into lengths explaining (thanks) but I
already agreed here :-)

Yet - as true as that is - it is true for a set of workloads and hardware,
but not for all that Ubuntu can be (as I outlined above neither decision
could be true for all)

> And power saving is important in global terms, as even small...


Revision history for this message
ethanay (ethan-y-us) wrote : Re: [Bug 1833322] Re: Consider removing irqbalance from default install on desktop images

Hi Christian,

Thank you, yes I don't disagree with anything you said. There can be no
"one size fits all" and customizing performance tuning will always be
important but I will argue
1. There can be a "one size fits most" at least for desktop client
environments ("general optimization")
2. It may be surprising what "general optimization" entails given the
general lack of consideration to the psychological experience of the user.
Apple discovered this as a historical accident a few decades ago. I am no
fan of them, though, because they don't allow that customization. My wife
is dyslexic and uses a Mac for work that does not allow her to install and
use OpenDyslexic fonts in the OS because Apple has already "determined what
is best."

To ground the 2nd point above, I would give a hypothetical: trading 25ms of
latency/jitter for a 10% gain in throughput might seem like a no-brainer
from a benchmarking perspective. But when user psychology is factored in as
well as allowing for adequate default performance for the widest use cases
available, the tradeoff quickly becomes unacceptable. The "relatively
large" 10% throughput has very little relevance outside of benchmarking
whereas the 25ms of latency/jitter can make or break entire workflows and
usage scenarios from a user perspective covering a broad set of scenarios.
The only danger there is that people will compare "out of the box benchmark
performance" and say "this system is slower than that system!"

But I agree, now it's time for more discussion and input (including data)
from others. I'm glad that this discussion is occurring! I don't think I
have anything more to offer at this point.

ethan

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2024-01-15 06:46 EDT-------
Just a statement from s390x/IBM Z. For our platform irqbalance makes no sense as our interrupt handling works differently. So we do not need it.

Revision history for this message
Sebastien Bacher (seb128) wrote : Re: Consider removing irqbalance from default install on desktop images

Speaking from a Desktop perspective, it's difficult to have a strong opinion without data to back up the decision, but it does feel that, in light of what other distributions/upstream are doing, we should reverse the default and go with option A: not have it by default but as an opt-in instead.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since the discussion is no more only covering Desktop I updated the title (thanks Seb128 for suggesting)

summary: - Consider removing irqbalance from default install on desktop images
+ Please consider no more having irqbalance enabled by default (per
+ image/use-case/TBD)
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI, multiple parties and people promised me more input, but so far none has arrived over the last weeks.

Revision history for this message
Nicolas Dechesne (ndec) wrote :

Thanks for the ping.. I have had no feedback from NVIDIA (DGX, Bluefield, Tegra), and I have informed them that from now on, we consider that 'no feedback' meant they are ok with the change.

Revision history for this message
Henry Wertz (hwertz10) wrote :

Just for perspective on this, I've used Linux since about 1993 (originally Slackware, then Gentoo, then Ubuntu) and recall manually adding irqtune to my system in the distant past.

When irqtune was originally developed, it was common to run XT-PIC, and all interrupts went to CPU 0, period. When one turned on IO-APIC back then, interrupts still went to CPU 0 by default but could be rerouted. This was largely to be able to hit gigabit+ speeds on systems of the time.

Now? CPUs are faster, memory is faster, and the interrupt handlers are much more efficient than in the kernel of, say, 15 or 20 years ago. If you disable irqtune, you can observe in /proc/interrupts that various device interrupts are still sent to CPUs other than CPU0, they just don't go ping-ponging around between all of them like they do with irqtune. A big difference compared to the early days, for example for ethernet a lot of the work that was done in the interrupt handler back then, the interrupt handler now does the bare minimum and does the rest of the work with a kernel thread (same for wifi, which tends to be a bit of an interrupt and CPU hog.. along with SCSI/SAS/SATA/etc., NVMe, various video drivers, and I'm sure a bunch of other drivers... I'm sure there may still be some that do everything in the interrupt handler but best practice has been to do work in a worker thread for a long time and most drivers do). Meaning the actual interrupts take much less time to run now than they did then. The total CPU time of interrupt + worker thread work could still add up to more than 1 core can handle if you had, say, 10gbps ethernet (or 1gbps ethernet or wifi on a slower CPU). But the kernel threads can be scheduled to any CPU core just like any other thread even if the interrupts are tied to one CPU..
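
The observation about /proc/interrupts can be checked with a few lines of code. Below is a rough sketch that sums interrupt counts per CPU; it assumes the usual layout (a header row of CPU names, then one row per interrupt source) and falls back to a small invented sample when the file is not available. `SAMPLE` and the function names are illustrative only.

```python
from pathlib import Path


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into (cpu_names, {label: counts})."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for token in rest.split()[:len(cpus)]:
            if not token.isdigit():
                break  # reached the description columns
            counts.append(int(token))
        # Rows with fewer count columns than CPUs (e.g. ERR) are skipped.
        if len(counts) == len(cpus):
            table[label.strip()] = counts
    return cpus, table


def per_cpu_totals(cpus, table):
    """Sum the counts of all interrupt sources for each CPU."""
    totals = [0] * len(cpus)
    for counts in table.values():
        for index, count in enumerate(counts):
            totals[index] += count
    return dict(zip(cpus, totals))


# Invented two-CPU sample, used only when /proc/interrupts is missing.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC   2-edge      timer
 24:          0       1200   PCI-MSI 512000-edge   eth0
ERR:          0
"""

if __name__ == "__main__":
    path = Path("/proc/interrupts")
    text = path.read_text() if path.exists() else SAMPLE
    cpus, table = parse_interrupts(text)
    for cpu, total in per_cpu_totals(cpus, table).items():
        print(cpu, total)
```

Running it once with the service stopped and once with it running, under the same workload, gives a crude picture of how the distribution differs. The counters are cumulative since boot, so comparing two snapshots taken some time apart is more telling than a single reading.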

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

Now? CPUs are faster; if you disable irqtune, you can observe in /proc/interrupts that various device interrupts are still sent to CPUs other than CPU0, they just don't go ping-ponging around between all of them like they do with irqtune. A big difference compared to the early days, for example for ethernet a lot of the work that was done in the interrupt handler back then, the interrupt handler now does the bare minimum and does the rest of the work with a kernel thread (same for wifi, which tends to be a bit of an interrupt and CPU hog). Meaning the actual interrupts take much less time to run now than they did then. The total CPU time could still add up to more than 1 core can handle if you had, say, 10gbps ethernet. But the kernel threads can be scheduled to any CPU core just like any other thread even if the interrupts are tied to one CPU.

Revision history for this message
John Chittum (jchittum) wrote :

Proxying a few comments I've heard from cloud partners about uses:

There are some big companies, particularly in the streaming media and encoding business heavily using irqbalance. _however_ our consideration is about irqbalance enabled by default. They do heavy tuning, not running stock values. For those customers, it'll primarily be about workflow changes. install a package, run an extra line enabling irqbalance, etc. I, personally, don't see that as a blocker for making the change in 24.04. Those types of companies won't take bleeding edge, and will likely be going through a testing and upgrade effort that takes months, not auto-rolling to "ubuntu:latest." Least, i sure hope not :)

I'll take some followups again with partners to see if any individuals can comment on the public bug.

Revision history for this message
Frank Heimes (fheimes) wrote :

[Please ignore comment#35 - this was caused by a BZ-to-LP-bridge issue ...]

Revision history for this message
Christian Ehrhardt  (paelzer) wrote (last edit ):

Hey Henry, thanks for chiming in and I agree in general that tech moved on.
Myself and others said similar before, thanks for adding more details and voices - that is what such a discussion is about.

> they just don't go ping-ponging around between

In particular on this aspect, so much has happened with fast devices often not only "not being bottle-necked" but even I/O interaction routing smartly, I mentioned for example rps/xps on here before.

Still, there are even today a few workloads - usually high utilization large scale loads that benefit.
Thanks @John for carrying a few of them forward to this bug!

But the more I read, the more people chime in, ... the more one pattern seems to crystallize (for me).
I'll try to summarize my gut-feeling so far... (which is my opinion so far, not more):
"""
While it seems a few high intensity workloads still can benefit, those are of the kind that are usually hand-optimized and could easily pull-in irqbalance if needed.

On the other hand the majority of workloads do not care either way - at least not in an easily provable way.

And furthermore most of the need to have it in the past has been replaced by newer I/O architectures.

Finally there also have been some cases that suffered from irqbalance being enabled. Those cases in particular seem to be those of end-users, often Desktop end users that might not always tune their system intensely.

For consistency between Server and Desktop I'd prefer to change it in both in the same way; while the cases still benefiting were all server'ish, there hasn't been a case that would need it by default.

Overall that makes me think that we could indeed change it to not be enabled by default anymore in the upcoming Noble release.
"""

I know that Steve (@vorlon) wanted to comment on this as well, maybe we have sufficient statements, opinions and at least a bit of data so far to have a decision for Noble before Feature freeze?

Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Sorry for the late feedback, but sharing here:

AWS docs regarding best practices regarding cpu-starvation [1] do not recommend disabling the irqbalance service. Quoting the doc:

> "Note: we do not recommend disabling irqbalance service. ENA driver doesn’t provide affinity hints, and if device reset happens while irqbalance is disabled, this might cause undesirable IRQ distribution with multiple IRQs landing on the same CPU core."

Other customers that have hit issues with irqbalance were running very specific workloads and were aware of the need to turn it off, so we would prefer to keep irqbalance.

[1] https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/ENA_Linux_Best_Practices.rst#cpu-starvation
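
The "multiple IRQs landing on the same CPU core" situation from the quote can be looked for by reading each IRQ's allowed-CPU list from /proc/irq/<N>/smp_affinity_list. The following is a minimal sketch, not something taken from the AWS document; the function names are hypothetical, and the affinity list only says where an IRQ may be delivered, not where it actually fires.

```python
from collections import defaultdict
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return sorted(cpus)


def irqs_pinned_per_cpu(affinity_by_irq):
    """Group IRQs whose affinity is exactly one CPU by that CPU."""
    pinned = defaultdict(list)
    for irq, cpulist in affinity_by_irq.items():
        cpus = parse_cpulist(cpulist)
        if len(cpus) == 1:
            pinned[cpus[0]].append(irq)
    return dict(pinned)


def read_affinities(root="/proc/irq"):
    """Read {irq_number: cpulist_string} from the per-IRQ proc entries."""
    result = {}
    for entry in Path(root).glob("[0-9]*/smp_affinity_list"):
        try:
            result[int(entry.parent.name)] = entry.read_text().strip()
        except OSError:
            continue  # entry vanished or is not readable
    return result


if __name__ == "__main__":
    for cpu, irqs in sorted(irqs_pinned_per_cpu(read_affinities()).items()):
        if len(irqs) > 1:
            print(f"CPU {cpu}: {len(irqs)} IRQs pinned here: {sorted(irqs)}")
```

For example, `irqs_pinned_per_cpu({24: "1", 25: "1", 26: "0-3"})` groups IRQs 24 and 25 under CPU 1 and ignores IRQ 26, which is free to go to any of four CPUs.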

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Interesting, that is more towards irqbalance than I heard so far.
thanks Fabio!

So we might end up needing to go like "Generally disabled except this list of places [...] where it stays enabled".

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

While there was sadly neither enough time nor enough resources to do all the deep dive analysis that could have been done, we succeeded in reaching out to many more parties and got their input as well. Thank you all!

Since Noble feature freeze is coming we need to make a call either way.
I proposed the underlying seed change [1].
And even once accepted that has to be followed by an update to ubuntu-meta.
Furthermore we'd have more follow up, like enabling it in special cases like the AWS images for the reasons Fabio mentioned.

Of course this is just a proposal. There are many other options left, from not changing anything to more subtle counters to my proposal like only doing so in 24.10 to give things more time, to holding back until someone found time/resource to gather more data.

But for now, I feel "Not enabling it by default, but enabling selectively where identified to be wanted" seems to be the better choice - and that is what I proposed.

[1]: https://code.launchpad.net/~paelzer/ubuntu-seeds/+git/platform/+merge/460904

Revision history for this message
ethanay (ethan-y-us) wrote : Re: [Bug 1833322] Re: Please consider no more having irqbalance enabled by default (per image/use-case/TBD)

My personal thoughts are that the proposal is nothing if not carefully
considered...! Lots of great discussion and input. There is plenty of
opportunity for people to provide feedback on whether or how the change
impacts them in ways we were unable to foresee.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Steve was so kind reviewing and approving my proposal.
Doing that now is also helpful as it should make sure it still has quite some exposure and thereby chances for people to report issues (vs if we'd land it much later like after beta freeze).

Changes will:
- change the seeds in regard to irqbalance, but no change to irqbalance (the package)
- need an update of ubuntu-meta
- IMHO we also want a release notes entry.
- CPC might consider re-enabling it as image customization for some as shown in comment #39

I'm adjusting the bug tasks and state accordingly.

Changed in ubuntu-release-notes:
status: New → In Progress
Changed in ubuntu-z-systems:
status: Confirmed → Opinion
Changed in irqbalance (Ubuntu):
status: Confirmed → Opinion
Changed in ubuntu-meta (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: Seed change landed

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI: updated ubuntu-meta, now in noble-proposed as version 1.532

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-meta - 1.532

---------------
ubuntu-meta (1.532) noble; urgency=medium

  * Refreshed dependencies
  * Removed irqbalance from standard-recommends (LP: #1833322)

 -- Christian Ehrhardt <email address hidden> Thu, 22 Feb 2024 12:37:15 +0100

Changed in ubuntu-meta (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1833322] Re: Consider removing irqbalance from default install on desktop images

Belated response, but just for the record, Paride's recounting of upstream's
position in the context of the Debian decision was definitive for me:

On Wed, Jan 10, 2024 at 11:47:56AM -0000, Paride Legovini wrote:
> Back in the day I asked upstream their take on irqbalance usefulness
> with newer kernels, here is their reply:

In effect what this says is that: irqbalance is still useful, but unless
the admin configures it, the policy it provides is not a discernable
improvement over the in-kernel default policy.

Therefore I think it is the right path forward to unseed this and let users
install it in situations where they want to configure it.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I've added a section to the release notes summing this up and linking back here and to some of the past links.

Changed in ubuntu-release-notes:
status: In Progress → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Opinion → Fix Released
Revision history for this message
David Myers (demyers) wrote :

Since I saw that irqbalance was not going to be installed by default in Noble I removed it from my Raspberry Pi 5 running Mantic, and as a result the power button stopped working. The error is:

kernel: gpio-keys pwr_button: Unable to get irq number for GPIO 0, error -6

Reinstalling irqbalance made the power button work again.

Revision history for this message
Jeremy Bícha (jbicha) wrote :

David, please open a new bug against ubuntu-meta about that issue.

Revision history for this message
David Myers (demyers) wrote :

I filed Bug #2057822: "Removing irqbalance disables power button on Raspberry Pi 5" (https://bugs.launchpad.net/ubuntu/+source/ubuntu-meta/+bug/2057822).

Thanks.
