irqbalance degrades network performance on HP M800 server cartridge

Bug #1452875 reported by Manoj Iyer
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
irqbalance (Ubuntu)
Won't Fix
High
Manoj Iyer
Trusty
Fix Released
High
Manoj Iyer

Bug Description

On HP M800 server cartridge we want to prevent irqbalance from running because it causes degradation in network throughput.

[Development Fix]

Intentionally not done; see below.

[Stable Fix]

Patch to exit irqbalance if this particular hardware is detected.

[ SRU EXCEPTION REQUEST ]
A fix to this bug is limited in scope to 14.04 (Trusty) release. TI does not plan to support the Keystone II based Slayton platform beyond 14.04. TI has not upstreamed any kernel patches for this platform, instead, canonical is maintaining a 3.13 based linux-keystone flavor kernel to support Slayton. TI would be required to upstream all patches related to keystone II and Slayton if they need to extend support for their platform past 14.04. Therefore, the scope of this patch would also be limited to 14.04, and we request an exception from the SRU team to accept this patch in 14.04.

SRU TEMPLATE
============
[Impact]
* With irqbalance running on Slayton (HP M800 server) it degrades network performance.

[Testcase]
* To recreate this network performance degradation you need to execute the SRIO testcases provided by TI.
* After applying the patch run irqbalance command and it should emit a message to /var/log/syslog: May 7 19:56:13 ms10-18n3-slayton ./irqbalance: Balancing is ineffective on HP ProLiant m800 Server Cartridge. and exit gracefully.

[Regression Potential]
* None, this change is limited to HP M800 server cartridge

Manoj Iyer (manjo)
Changed in irqbalance (Ubuntu):
assignee: nobody → Manoj Iyer (manjo)
importance: Undecided → High
Manoj Iyer (manjo)
description: updated
Revision history for this message
Manoj Iyer (manjo) wrote :

Please accept this patch for SRU, patch disables irqbalance from running on HP M800 server cartridge.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "patch to disable irqbalance on HP M800 server cartridge" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Robie Basak (racb) wrote :

Hi Manoj,

Please can you contact the upstream irqbalance project and see how they think this should be implemented? We don't want to maintain this kind of change in Ubuntu indefinitely (it's a burden for the server team as well as poor for the community) so I'd prefer to be pulling in a newer upstream release with this issue fixed instead.

tags: added: needs-upstream-report
Changed in irqbalance (Ubuntu):
status: New → Triaged
summary: - Disable irqbalance on HP M800 server cartridge
+ irqbalance degrades network performance on HP M800 server cartridge
Revision history for this message
Robie Basak (racb) wrote :

We talked about the general approach on IRC and Manoj will update the bug. We'll get the general approach agreed first. But while I'm looking at the patch, here are some review comments:

+enum _bool {false = 0, true = 1};
+typedef enum _bool Bool;

Please don't do this unless it's convention in the existing source. Use an int instead as this is standard C convention unless there's some reason to do otherwise.

+ char *platform = "HP ProLiant m800 Server Cartridge";
+ char *sys_file = "/proc/device-tree/model";

These should be declared const.

+ while (fp != NULL && fread(tmp, 1, sizeof(tmp), fp) != 0) {

Won't this go forever if there is some kind of unexpected error? It would be clearer to just return immediately on error - once for the return value of fopen and separately for the fread. I also don't think there's any need to run fread in a loop - if it fails the first time, then just treat the file as unreadable (and presumably fail the test negative). Or, if the /proc/device-tree/model file might be bigger than the 256 bytes you've allocated in tmp, then the approach needs to be different from just strstr().

+ "Balancing is ineffective on %s.\n", platform);

Please explain in the error message that this is the reason that irqbalance is exiting - otherwise it could be confused to a sysadmin wondering why irqbalance is exiting despite this particular message. Eg. "Balancing is ineffective on %s: exiting.\n"

+ if (disable_on_platform())

As this code will never be generic, can you name this disable_on_hp_m800 or similar please, so it's clearer to someone reading the code?

Finally, please add "Forwarded: not-needed" to the dep3 header together with an explanation of why forwarding is not needed in the description.

Manoj Iyer (manjo)
description: updated
Revision history for this message
Manoj Iyer (manjo) wrote :

I have incorporated all the review comments and produced a new patch. Please review this patch and accept for trusty.

Robie Basak (racb)
Changed in irqbalance (Ubuntu Trusty):
status: New → Won't Fix
status: Won't Fix → Triaged
Changed in irqbalance (Ubuntu):
status: Triaged → Won't Fix
Changed in irqbalance (Ubuntu Trusty):
importance: Undecided → High
assignee: nobody → Manoj Iyer (manjo)
description: updated
Revision history for this message
Manoj Iyer (manjo) wrote :

Incorporated review comments call fclose() when fread() fails.

Revision history for this message
Robie Basak (racb) wrote :

Uploaded, with one minor change agreed with Manoj on IRC: on not finding /proc/device-tree/model, no warning is printed, as this is the common case on Intel (and I want to avoid risking alarming users via eg. logcheck).

I'll leave it to the SRU team to decide on whether it's OK to fix Trusty without fixing Wily, but Manoj's justification sounds OK to me (that the vendor has essentially EOL'd it beyond Trusty).

As well as checking on the affected hardware, please make sure that Intel hasn't regressed during SRU verification (eg. just check that irqbalance is running and functioning on Intel maybe?).

Changed in irqbalance (Ubuntu Trusty):
status: Triaged → In Progress
Revision history for this message
Chris J Arges (arges) wrote : Please test proposed package

Hello Manoj, or anyone else affected,

Accepted irqbalance into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/irqbalance/1.0.6-2ubuntu0.14.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in irqbalance (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Manoj Iyer (manjo) wrote :

ubuntu@ms10-18n3-slayton:~$ apt-cache policy irqbalance
irqbalance:
  Installed: 1.0.6-2ubuntu0.14.04.2
  Candidate: 1.0.6-2ubuntu0.14.04.2
  Version table:
 *** 1.0.6-2ubuntu0.14.04.2 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty-proposed/main armhf Packages
        100 /var/lib/dpkg/status
     1.0.6-2ubuntu0.14.04.1 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty-updates/main armhf Packages
     1.0.6-2 0
        500 http://ports.ubuntu.com/ubuntu-ports/ trusty/main armhf Packages
ubuntu@ms10-18n3-slayton:~$

ubuntu@ms10-18n3-slayton:~$ ps -ef | grep irqbalance
ubuntu 1769 1390 0 18:13 pts/0 00:00:00 grep --color=auto irqbalance
ubuntu@ms10-18n3-slayton:~$

ubuntu@ms10-18n3-slayton:~$ sudo service irqbalance start
irqbalance stop/waiting
ubuntu@ms10-18n3-slayton:~$

ubuntu@ms10-18n3-slayton:~$ tail -n 1 /var/log/syslog
Jul 1 18:14:54 ms10-18n3-slayton /usr/sbin/irqbalance: Balancing causes degradation of network performance on HP ProLiant m800 Server Cartridge. Disabling irqbalance.
ubuntu@ms10-18n3-slayton:~$

tags: added: verification-done-trusty
removed: verification-needed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package irqbalance - 1.0.6-2ubuntu0.14.04.2

---------------
irqbalance (1.0.6-2ubuntu0.14.04.2) trusty; urgency=medium

  * Disable irqbalance on HP M800 server cartridge (LP: #1452875). Thanks to
    Manoj Iyer.

 -- Robie Basak <email address hidden> Thu, 18 Jun 2015 14:48:21 +0000

Changed in irqbalance (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Chris J Arges (arges) wrote : Update Released

The verification of the Stable Release Update for irqbalance has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.