et131x causing ksoftirqd to eat up cpu

Bug #150515 reported by Thomas Liebetraut on 2007-10-08
This bug affects 1 person
Affects: linux (Ubuntu). Importance: Undecided. Assigned to: Unassigned. Nominated for Gutsy by Thomas Liebetraut.
Affects: linux-ubuntu-modules-2.6.22 (Ubuntu). Importance: Undecided. Assigned to: Unassigned. Nominated for Gutsy by Thomas Liebetraut.
Affects: linux-ubuntu-modules-2.6.24 (Ubuntu). Importance: Medium. Assigned to: Tim Gardner. Nominated for Gutsy by Thomas Liebetraut.

Bug Description

I have an LG T1 Express Dual laptop with an Agere et131x ethernet adapter. In Feisty, the driver (compiled from their SF.net project page, version 1.2.3) worked fine without any major problems.
Now, in Gutsy, after upgrading to linux-ubuntu-modules version 2.6.22-13.33, the driver is included in the Ubuntu distribution. When the et131x module is loaded, the ksoftirqd process is busy quite often, producing heat and draining my battery faster. The workaround is to unload the module when I don't need wired ethernet, but of course it would be nice to have ethernet without 25-20% of the CPU being eaten up by the kernel.
The exact same problem exists with a self-compiled kernel module with the patch found at http://sourceforge.net/tracker/index.php?func=detail&aid=1709009&group_id=179406&atid=889025 .
Unfortunately, I don't know how to debug this problem or narrow it down. I suspect the driver is not compatible with the new kernel's tickless interface, but obviously I'm not sure about this.

Soni (daniel-r) wrote :

I can confirm this bug. I also have an LG T1, and after upgrading to Gutsy the ksoftirqd process permanently caused 15-30% CPU load. After removing the et131x module, ksoftirqd causes no more CPU load.

Siva (sivasoft) wrote :

Folks,

I had the same issue with my laptop after I installed the restricted driver "Firmware for Broadcom 43xx chipset family" on Gutsy in an attempt to set up my wireless network.

I tried to follow all the instructions from the forum using ndiswrapper with the 64-bit Windows Vista driver but still could not get it working. However, this also led me to find a bug: ksoftirqd/1 eating up 100% CPU.

The moment I uninstalled the above driver, my machine started acting normally. Here is my machine configuration:

Compaq Presario V3414AU
AMD Turion(tm) 64 X2
1GB SD RAM
Nvidia family of Video card
Broadcom wireless
Altec Lansing sound card/speaker

I'm still not able to get my wireless working on Ubuntu 7.10, though my sound, video, Avant, Emerald and Compiz are working smoothly with the wired network interface.

I would appreciate hearing from anyone here who has a similar issue and a solution for it.

Prior to uninstalling the firmware, I saw these logs in my dmesg:
[ 686.251840] printk: 319404 messages suppressed.
[ 686.251847] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 691.249198] printk: 317752 messages suppressed.
[ 691.249204] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 696.246554] printk: 316879 messages suppressed.
[ 696.246560] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 701.243901] printk: 318933 messages suppressed.
[ 701.243907] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 706.241256] printk: 312110 messages suppressed.
[ 706.241265] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 711.238621] printk: 311656 messages suppressed.
[ 711.238630] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR

And the output of lsof looked like this:

siva@siva-laptop:~$ lsof|more
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
init 1 root cwd unknown /proc/1/cwd (readlink: Permission denied)
init 1 root rtd unknown /proc/1/root (readlink: Permission denied)
init 1 root txt unknown /proc/1/exe (readlink: Permission denied)
init 1 root NOFD /proc/1/fd (opendir: Permission denied)
kthreadd 2 root cwd unknown /proc/2/cwd (readlink: Permission denied)
kthreadd 2 root rtd unknown /proc/2/root (readlink: Permission denied)
kthreadd 2 root txt unknown /proc/2/exe (readlink: Permission denied)
kthreadd 2 root NOFD /proc/2/fd (opendir: Permission denied)
migration 3 root cwd unknown /proc/3/cwd (readlink: Permission denied)
migration 3 root rtd unknown /proc/3/root (readlink: Permission denied)
migration 3 root txt unknown /proc/3/exe (readlink: Permission denied)
migration 3 root NOFD /proc/3/fd (opendir...

Chan (cgjeong-gmail) wrote :

I had the same problem with my LG P1 laptop after upgrading to Gutsy.
Reverting the kernel 2.6.22 to 2.6.20-16 of Feisty works well and there is no problem any more.
Unloading the et131x module on 2.6.22 works fine as well but it's very annoying for me.

joyrider (joyrider3774) wrote :

I'm having a similar problem with ksoftirqd but with other modules; perhaps it's related to this bug. The modules in question are CX88_ALSA and CX8800, although I believe it's the latter one. I have a constant 20-30% CPU usage from the ksoftirqd/0 process; only when I unload the CX88_ALSA and CX8800 modules does the load go away. But this prevents me from using the TV card in Ubuntu.

Excerpt from top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4 root 34 19 0 0 0 S 24 0.0 57:27.95 ksoftirqd/0

Bernhard Gehl (bernhard-gehl) wrote :

I am not quite sure if this bug is a duplicate - it seems to me that something is wrong with this specific driver or its interaction with the (tickless?) kernel:

My setup is similar to the one the bug was originally reported for: LG S1 Pro Express Dual with an Agere et131x wired network card. As reported, ksoftirqd/1 keeps running at 15-25% of CPU power on one of the CPUs (currently #1). Switching off the network via Network Manager makes the IRQ load vanish instantly, as does removing the et131x module by hand.

Running the 'powertop' tool from Intel, the top of the list of processes waking up the processors is this:
    31,9% (250,1) NetworkManager : et131x_open (et131x_error_timer_handler)

Furthermore, when I call 'dmesg' I get a seemingly endless list of error messages in the style of:
    [ 844.008000] et131x.ko:WARNING:et131x_ioctl Unhandled IOCTL Code: 0x8b01

I had noticed the dmesg errors earlier and searched a little (wired network is working fine so my motivation was limited), and I came across some other threads (most about Ubuntu Gutsy) where users got this error message. In the sourceforge forum (Google doesn't search these!), someone said that this error message was due to chatter over the wireless network that the et131x doesn't recognize and logs as this error code.
(Thread here: http://sourceforge.net/forum/forum.php?thread_id=1739227&forum_id=621136 )

Could there be a connection (pun not intended)?

Of course I'll gladly run any diagnostics/tests necessary to troubleshoot the problem!

Regards,
 Bernhard

Henrik Nilsen Omma (henrik) wrote :

This will be retargeted towards the Hardy kernel once it is released. I've tagged this as "hardy-kernel-candidate" so that we make sure to retarget this report once the new release is out. However against the linux-source-2.6.22 package this is being marked as "Won't Fix" as it does not meet the criteria for a stable release update. To learn more about the stable release update process please refer to https://wiki.ubuntu.com/StableReleaseUpdates . Thanks!

Changed in linux-ubuntu-modules-2.6.22:
status: New → Won't Fix

Hardy Heron Alpha2 was recently released. It contains an updated version of the kernel. You can download and try the new Hardy Heron Alpha2 release from http://cdimage.ubuntu.com/releases/hardy/alpha-2/ . You should be able to then test the new kernel via the LiveCD. If you can, please verify if this bug still exists or not and report back your results. General information regarding the release can also be found here: http://www.ubuntu.com/testing/hardy/alpha2 . Thanks!

Changed in linux-ubuntu-modules-2.6.24:
status: New → Incomplete
Thomas Liebetraut (tommie-lie) wrote :

I just checked the Hardy Heron live CD with kernel 2.6.24-2 and the problem with the et131x module and ksoftirqd still exists. Also, there still is no news from upstream.

morgenbart (morgenbart) wrote :

The problem does indeed seem to be that the et131x driver doesn't work well with a tickless kernel. I boot with "nohz=off" now and ksoftirqd's CPU consumption is back to normal. To turn off the messages to the kernel log about the unhandled IOCTL, I load the module with the option "et131x_debug_flags=0".

LeDechaine (ledechaine) wrote :

I just experienced the exact same bug that joyrider had (see above or follow the link).
https://bugs.launchpad.net/ubuntu/+source/linux-ubuntu-modules-2.6.24/+bug/150515/comments/6

My ksoftirqd/0 was using ~20% of my CPU. Found this page via google, unloaded the cx88 modules (that i'm using for my TV Card), and ksoftirqd is now back to normal, using.. ~0.0% CPU. ;)

I'm under Ubuntu 7.10
Pentium 3 866
TV Card: Leadtek Winfast TV2000XP Expert

Haven't touched the grub config since my ubuntu install so the "nohz" option is at the default, whatever the default is.

LeDechaine (ledechaine) wrote :

Sorry, forgot to mention: Kernel 2.6.22-14-generic

MatB (matteo-brusa) wrote :

Same problem here; the issue is triggered by a high CPU load. ksoftirqd eats around 27% of CPU time.
Running hardy kernel linux-image-2.6.24-5-server.
powertop shows "et131x_open (et131x_error_timer_handler)" at the top of the list.
Taking the network interface down brings the load back to normal; I tried to rmmod and modprobe et131x, but as soon as the network is up the load rises again.
Unfortunately I can't use the kernel boot param "nohz=off" since it degrades the quality of video playback.

Ilya Krets (nl4m) wrote :

Probably it can be fixed by changing
    add_timer( &pAdapter->ErrorTimer );
to
    mod_timer( &pAdapter->ErrorTimer, jiffies+ 5*HZ );
in the error_timer_handler function in the file et131x_initpci.c (learned from http://sourceforge.net/forum/forum.php?thread_id=1876109&forum_id=621136 ).

Worked for me :)
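
For anyone wondering why such a small change could matter, here is a rough userspace model in C, not driver code. It assumes (this is an assumption, not something checked against the driver source) that the original handler re-queued the timer without moving its expiry time forward, so the timer was already due the moment it was re-added. The struct and the function count_fires are made up for illustration, and max_per_tick is an arbitrary stand-in for how often an already-expired timer can be re-run within one tick.

```c
#include <stdbool.h>

/* Userspace model only: these names are hypothetical, not kernel API. */
struct model_timer {
    unsigned long expires;  /* absolute tick at which the timer is due */
};

/*
 * Count how often the handler runs for tick values 0 .. ticks-1.
 * max_per_tick caps how many times an already-expired timer may fire
 * within one tick. If refresh is true the handler re-arms the timer
 * with now + period (the mod_timer() style fix); otherwise it leaves
 * the stale expires value in place, as re-adding the timer unchanged
 * would.
 */
unsigned long count_fires(unsigned long ticks, unsigned long period,
                          unsigned long max_per_tick, bool refresh)
{
    struct model_timer t = { .expires = period };
    unsigned long fires = 0;

    for (unsigned long now = 0; now < ticks; now++) {
        unsigned long budget = max_per_tick;

        while (budget > 0 && t.expires <= now) {
            fires++;
            budget--;
            if (refresh)
                t.expires = now + period;
            /* else: expires stays in the past, so the timer is due again */
        }
    }
    return fires;
}
```

With ticks = 1000, period = 250 and max_per_tick = 1, the refreshed variant fires 3 times and the stale variant 750 times. Raising max_per_tick to 10 gives 7500 for the stale variant, which is the kind of jump you would expect if an expired timer can be re-run many times per tick.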

Per the kernel team's bug policy, can you please attach the following information for the most recent Hardy Alpha kernel (2.6.24-8 as of this post). Please be sure to attach each file as a separate attachment.

* uname -a > uname-a.log
* cat /proc/version_signature > version.log
* dmesg > dmesg.log
* sudo lspci -vvnn > lspci-vvnn.log

For more information regarding the kernel team bug policy, please refer to https://wiki.ubuntu.com/KernelTeamBugPolicies . Thanks again and we appreciate your help and feedback.

Tim Gardner (timg-tpi) wrote :

Implemented the mod_timer() patch.

Changed in linux-ubuntu-modules-2.6.24:
assignee: nobody → timg-tpi
importance: Undecided → Medium
milestone: none → hardy-alpha-6
status: Incomplete → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-ubuntu-modules-2.6.24 - 2.6.24-10.14

---------------
linux-ubuntu-modules-2.6.24 (2.6.24-10.14) hardy; urgency=low

  [Jay Chetty]

  * poulsbo: update to beta 6 version of the driver

  [Stefan Bader]

  * Pull in the RAID4/5 target for device-mapper.
  * Remove exports of dm-log to solve duplicate symbols.
  * Update hdaps_ec to use vendor IBM for X40 and X41.
    - LP: #33950

  [Tim Gardner]

  * Adjust quirk for snd-hda-intel on Dell laptops
    - LP: #191859
  * Added acer-wmi
    - LP: #190677
  * Enabled iwlwifi LEDS
    - LP: #176090
  * iwlwifi spams syslog
    - LP: #191388
  * Updated to et131x-1.2.3-1
    - LP: #150515
  * et131x: Slow down error timer
    - LP: #150515

 -- Tim Gardner <email address hidden> Sun, 17 Feb 2008 21:52:02 -0700

Changed in linux-ubuntu-modules-2.6.24:
status: Fix Committed → Fix Released

I patched a few files and put up a new release of the driver on sourceforge. I replaced add_timer with mod_timer and the CPU no longer seems to be getting eaten up. I commented out the logging of unhandled IOCTL messages as it was polluting the message buffer.

Ilya Krets (nl4m) wrote :

that's good, thanks :)

FoxCtrl (foxctrl) wrote :

I had the same problem with ksoftirqd using about 25% of my CPU, and only when the et131x module was loaded. So I downloaded the et131x...tar.gz from Richard (above), compiled and installed it.

But "make modules_install" copied the new et131x.ko to the wrong directory, /modules/version/extra/et131x.ko. So I moved it to the right place

/modules/version/ubuntu/net/et131x/et131x.ko

and ran depmod -a -q again and rebooted my system, and then the CPU load was down.

Great & Thank you !

But the file size of the new module is about 10 times bigger than the old one. On my system:
-rw-r--r-- 1 root root 1866614 2008-03-05 12:41 et131x.ko // new driver
-rw-r--r-- 1 root root 116692 2007-10-13 07:43 et131x.ko.save // old version

Is that OK? I'm worried that the new module is more than 10 times bigger than the old one...

W Unruh (unruh) wrote :

The problem, I think, was that the adapter->ErrorTimer.expires variable (which says when the timer is supposed to expire) was set only once. Thus after the first return, the expires time was in the past, so the timer returns immediately, which resets the timer, which returns immediately, and so on. This should always have caused trouble, not just on the tickless kernel. I suspect that originally "immediately" meant once per jiffy, while on the tickless kernel it means "as fast as possible". I.e. it originally churned at 300 times per second and now was going at 10000 times per second.

Note that originally it was set ( in et131x_initpci.c) as
adapter->ErrorTimer.expires = jiffies + TX_ERROR_PERIOD * HZ / 1000;
which is supposed to be once a second since TX_ERROR_PERIOD is 1000. The above fix does it once every 30 sec. (there are HZ jiffies /sec) I have no idea if this makes any difference.
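
To make the arithmetic explicit, here is a small userspace C sketch, not kernel code. HZ is passed in as a parameter because its value depends on the kernel configuration, and the two helper names are made up for this example (the kernel has its own msecs_to_jiffies() and jiffies_to_msecs() for real code).

```c
/* Userspace arithmetic only; hz is an argument, not the kernel's HZ macro. */

/*
 * Jiffies covered by period_ms milliseconds at a given hz, using the
 * same integer expression as the driver line quoted above
 * (TX_ERROR_PERIOD * HZ / 1000).
 */
unsigned long period_to_jiffies(unsigned long period_ms, unsigned long hz)
{
    return period_ms * hz / 1000;
}

/* Length in milliseconds of a delay of delta jiffies at a given hz. */
unsigned long jiffies_to_millis(unsigned long delta, unsigned long hz)
{
    return delta * 1000 / hz;
}
```

For HZ = 250, period_to_jiffies(1000, 250) is 250 jiffies, and jiffies_to_millis(250, 250) turns that back into 1000 ms, so the original expression is one second whatever HZ is. By the same arithmetic, the jiffies + 5*HZ suggested earlier in this thread is a 5 second interval. The interval that ended up in the Ubuntu package is not stated here.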

This fix has not made it to sourceforge. The latest sourceforge version is 1.2.3-3 which contains a lot of the fixes needed to run and compile this driver on later kernels, but it does not contain the add_timer fix.

psyke777 (spam-psyke) wrote :

On my LG X100 using Intrepid, ksoftirqd uses about 10-15% CPU when et131x is loaded. Doing the change mentioned here: http://sourceforge.net/tracker/index.php?func=detail&aid=2045610&group_id=179406&atid=889023 where

pAdapter->ErrorTimer.expires=jiffies+HZ;

is added before:

add_timer( &pAdapter->ErrorTimer );

solved the problem.

Alternatively, using nohz=off kernel parameter to make the kernel no longer tickless solves it too.

Zeus (jason-engelsman) wrote :

Thanks Psyke777.

Quick question: where exactly does one make these changes?

Sorry, I'm not quite an Ubuntu newbie, but kernel settings are something pretty new to me...

Thanks

On 9.04 Alpha6 this problem seems to be back (10-20 % load from ksoftirqd/x with et131x loaded). Is this a regression or did I just fail to notice something?

This is definitely some strange regression: compiling and installing the module from the "et131x-source" package solves it.

Please update the code for building the kernel images to that from the source package...

Hi tommie-lie,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/releases/ . Please then run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-`uname -r` 150515

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
dino99 (9d9) wrote :

Closing this old report, as it has not received any recent comments.

Changed in linux (Ubuntu):
status: Incomplete → Invalid