ksoftirqd/1 using 50% CPU or above

Bug #183461 reported by bobslaede
This bug affects 4 people
Affects         Status     Importance  Assigned to  Milestone
openSUSE        Unknown    Critical
linux (Ubuntu)  Won't Fix  Undecided   Unassigned

Bug Description

ksoftirqd (ksoftirqd/1) is using 50% CPU or above when idle.
As others have already found out, this is a bug with a module for a TV tuner card. But I don't have such a card, and no such module is loaded.

System info: Ubuntu 7.10

$ uname -a
Linux bob-laptop 2.6.22-14-generic #1 SMP Tue Dec 18 08:02:57 UTC 2007 i686 GNU/Linux

----------------------

$ top
top - 10:39:01 up 24 min, 4 users, load average: 1.02, 1.57, 1.25
Tasks: 129 total, 3 running, 126 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.0%sy, 0.0%ni, 46.8%id, 0.0%wa, 25.8%hi, 27.1%si, 0.0%st
Mem: 2075360k total, 1055600k used, 1019760k free, 24936k buffers
Swap: 2000084k total, 0k used, 2000084k free, 465784k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    7 root 35 19 0 0 0 R 80 0.0 15:07.14 ksoftirqd/1
 6013 root 15 0 873m 46m 8040 S 2 2.3 2:25.68 Xorg
 6514 bob 16 0 44912 11m 9608 S 1 0.6 0:02.33 nm-applet
 5448 mysql 15 0 124m 16m 4468 S 1 0.8 0:02.42 mysqld
 6481 bob 15 0 87180 41m 17m S 0 2.1 0:26.95 compiz.real
 6574 bob 15 0 242m 81m 23m R 0 4.0 3:04.18 firefox-bin
    1 root 18 0 2948 1852 532 S 0 0.1 0:01.29 init
    2 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
    4 root 34 19 0 0 0 S 0 0.0 0:00.61 ksoftirqd/0

----------------------

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 104
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-58
stepping : 1
cpu MHz : 1900.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8legacy 3dnowprefetch ts fid vid ttp tm stc 100mhzsteps
bogomips : 3821.15
clflush size : 64

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 104
model name : AMD Turion(tm) 64 X2 Mobile Technology TL-58
stepping : 1
cpu MHz : 1900.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8legacy 3dnowprefetch ts fid vid ttp tm stc 100mhzsteps
bogomips : 3821.15
clflush size : 64

----------------------

$ cat /proc/interrupts
           CPU0 CPU1
  0: 344633 56518 IO-APIC-edge timer
  1: 1 11 IO-APIC-edge i8042
  5: 0 2 IO-APIC-fasteoi ohci1394
  7: 1 0 IO-APIC-fasteoi sdhci:slot0
  8: 0 19 IO-APIC-edge rtc
  9: 546 327 IO-APIC-fasteoi acpi
 12: 3 120 IO-APIC-edge i8042
 14: 12529 1786 IO-APIC-edge ide0
 16: 0 33 IO-APIC-fasteoi ehci_hcd:usb1
 17: 9483 10481 IO-APIC-fasteoi ahci
 18: 139998 23392 IO-APIC-fasteoi eth0
 19: 279499 49594 IO-APIC-fasteoi ohci_hcd:usb2
 20: 163559 25038 IO-APIC-fasteoi nvidia
 21: 113556 91832347 IO-APIC-fasteoi bcm43xx
 22: 34 431 IO-APIC-fasteoi HDA Intel
NMI: 0 0
LOC: 56397 344037
ERR: 1
MIS: 0

----------------------

$ lsmod
Module Size Used by
ipv6 273892 23
af_packet 24840 2
rfcomm 42136 2
l2cap 26240 11 rfcomm
ppdev 10244 0
powernow_k8 16960 1
cpufreq_conservative 8072 0
cpufreq_stats 7232 0
cpufreq_userspace 5280 0
cpufreq_ondemand 9612 0
freq_table 5792 3 powernow_k8,cpufreq_stats,cpufreq_ondemand
cpufreq_powersave 2688 0
sbs 19592 0
button 8976 0
video 18060 11
dock 10656 0
ac 6148 0
battery 11012 0
container 5504 0
sbp2 24072 0
parport_pc 37412 0
lp 12580 0
parport 37448 3 ppdev,parport_pc,lp
snd_hda_intel 263712 1
snd_pcm_oss 44672 0
snd_mixer_oss 17664 1 snd_pcm_oss
snd_pcm 80388 2 snd_hda_intel,snd_pcm_oss
snd_seq_dummy 4740 0
snd_seq_oss 33152 0
snd_seq_midi 9600 0
joydev 11328 0
snd_rawmidi 25728 1 snd_seq_midi
snd_seq_midi_event 8448 2 snd_seq_oss,snd_seq_midi
uvcvideo 48644 0
snd_seq 53232 6 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_seq_midi_event
snd_timer 24324 2 snd_pcm,snd_seq
snd_seq_device 9228 5 snd_seq_dummy,snd_seq_oss,snd_seq_midi,snd_rawmidi,snd_seq
bcm43xx 127336 0
snd 54660 11 snd_hda_intel,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_seq_oss,snd_rawmidi,snd_seq,snd_timer,snd_seq_device
soundcore 8800 1 snd
compat_ioctl32 2304 1 uvcvideo
ieee80211softmac 31360 1 bcm43xx
usbhid 29536 0
ide_cd 32672 0
cdrom 37536 1 ide_cd
hci_usb 18332 2
videodev 29312 1 uvcvideo
v4l1_compat 15364 2 uvcvideo,videodev
v4l2_common 18432 2 uvcvideo,videodev
ieee80211 35656 2 bcm43xx,ieee80211softmac
ieee80211_crypt 7040 1 ieee80211
bluetooth 57060 7 rfcomm,l2cap,hci_usb
hid 28928 1 usbhid
snd_page_alloc 11400 2 snd_hda_intel,snd_pcm
nvidia 7859584 38
i2c_nforce2 7040 0
psmouse 39952 0
agpgart 35016 1 nvidia
serio_raw 8068 0
k8temp 6656 0
sdhci 18828 0
mmc_core 28420 1 sdhci
pcspkr 4224 0
i2c_core 26112 2 nvidia,i2c_nforce2
shpchp 34580 0
pci_hotplug 32704 1 shpchp
evdev 11136 7
ext3 133896 2
jbd 60456 1 ext3
mbcache 9732 1 ext3
sg 36764 0
sd_mod 30336 6
ata_generic 8452 0
ohci1394 36528 0
ieee1394 96312 2 sbp2,ohci1394
amd74xx 15260 0 [permanent]
ide_core 116804 2 ide_cd,amd74xx
forcedeth 51592 0
ohci_hcd 22916 0
ahci 23300 4
libata 125168 2 ata_generic,ahci
scsi_mod 147084 4 sbp2,sg,sd_mod,libata
ehci_hcd 36492 0
usbcore 138632 6 uvcvideo,usbhid,hci_usb,ohci_hcd,ehci_hcd
thermal 14344 0
processor 32072 2 powernow_k8,thermal
fan 5764 0
fuse 47124 3
apparmor 40728 0
commoncap 8320 1 apparmor

----------------------

$ lspci
00:00.0 RAM memory: nVidia Corporation MCP65 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation MCP65 LPC Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation MCP65 SMBus (rev a1)
00:01.3 Co-processor: nVidia Corporation MCP65 SMU (rev a1)
00:02.0 USB Controller: nVidia Corporation MCP65 USB Controller (rev a3)
00:02.1 USB Controller: nVidia Corporation MCP65 USB Controller (rev a3)
00:06.0 Ethernet controller: nVidia Corporation MCP65 Ethernet (rev a3)
00:07.0 Audio device: nVidia Corporation MCP65 High Definition Audio (rev a1)
00:08.0 PCI bridge: nVidia Corporation MCP65 PCI bridge (rev a1)
00:09.0 IDE interface: nVidia Corporation MCP65 IDE (rev a1)
00:0a.0 IDE interface: nVidia Corporation MCP65 SATA Controller (rev a3)
00:0b.0 PCI bridge: nVidia Corporation Unknown device 045b (rev a1)
00:0c.0 PCI bridge: nVidia Corporation MCP65 PCI Express bridge (rev a1)
00:0d.0 PCI bridge: nVidia Corporation MCP65 PCI Express bridge (rev a1)
00:0e.0 PCI bridge: nVidia Corporation MCP65 PCI Express bridge (rev a1)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
03:00.0 Network controller: Broadcom Corporation BCM4312 802.11a/b/g (rev 02)
05:00.0 VGA compatible controller: nVidia Corporation GeForce 8400M GS (rev a1)
07:05.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
07:05.1 Generic system peripheral [0805]: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
07:05.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
07:05.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
07:05.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 12)

----------------------

bobslaede (jeppe-dyrby) wrote :

Running 'dmesg' shows a flood of errors:

[ 1731.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1736.868000] printk: 299004 messages suppressed.
[ 1736.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1741.868000] printk: 298726 messages suppressed.
[ 1741.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1746.868000] printk: 289806 messages suppressed.
[ 1746.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1751.868000] printk: 287231 messages suppressed.
[ 1751.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1756.868000] printk: 288652 messages suppressed.
[ 1756.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1761.868000] printk: 298387 messages suppressed.
[ 1761.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1766.868000] printk: 284399 messages suppressed.
[ 1766.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1771.868000] printk: 281344 messages suppressed.
[ 1771.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1776.868000] printk: 296089 messages suppressed.
[ 1776.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1781.868000] printk: 298391 messages suppressed.
[ 1781.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1786.868000] printk: 300005 messages suppressed.
[ 1786.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
[ 1791.868000] printk: 272297 messages suppressed.
[ 1791.868000] bcm43xx: FATAL ERROR: BCM43xx_IRQ_XMIT_ERROR
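
To put a rough number on the flood above, the "messages suppressed" counters can be divided by the time between them. This is only a minimal Python sketch: it assumes the dmesg output has been saved as text and that each notice counts the messages dropped since the previous notice. The name `suppressed_rate` is made up for illustration.

```python
import re

# Matches rate-limit notices such as:
# [ 1736.868000] printk: 299004 messages suppressed.
SUPPRESSED = re.compile(r"\[\s*(\d+\.\d+)\]\s*printk: (\d+) messages suppressed")


def suppressed_rate(dmesg_text):
    """Estimate suppressed kernel messages per second.

    Assumption: each notice reports the messages dropped since the previous
    notice, so the first count is skipped (its interval is unknown).
    Returns None if fewer than two notices are found.
    """
    samples = [(float(ts), int(n)) for ts, n in SUPPRESSED.findall(dmesg_text)]
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    dropped = sum(n for _, n in samples[1:])
    return dropped / elapsed
```

Each 5-second window in the log above holds roughly 270,000 to 300,000 suppressed messages, so the estimate should land on the order of 50,000 to 60,000 messages per second.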

--------

When I unload bcm43xx (the module for my wireless card), ksoftirqd/1 jumps to 100% CPU load.

I can't imagine this is good.

bobslaede (jeppe-dyrby) wrote :

About 2 minutes after removing the module (the wifi card is turned off on my laptop), the laptop freezes and the Caps Lock light on the laptop blinks rapidly.
It's not possible to send keystrokes to the kernel or anything, like ctrl+alt+prntscreen+REISUB.

bobslaede (jeppe-dyrby) wrote :

Apparently, when I'm not using a dual screen setup, none of this happens...

I'm not sure how to clarify the problem or how to make it reproducible for others.

David de Beer (daviddebeer) wrote :

I can confirm this bug; it is also present on my machine:

$ uname -a
Linux unix 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

$ top

top - 21:01:54 up 1 day, 1:15, 5 users, load average: 0.74, 0.78, 0.71
Tasks: 173 total, 2 running, 171 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.2%us, 3.5%sy, 0.1%ni, 94.0%id, 0.2%wa, 0.3%hi, 0.7%si, 0.0%st
Mem: 2075944k total, 2022020k used, 53924k free, 9652k buffers
Swap: 3911788k total, 360k used, 3911428k free, 1621568k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    7 root 34 19 0 0 0 S 10 0.0 110:01.04 ksoftirqd/1
 8752 david 15 0 172m 55m 30m S 2 2.7 2:01.76 amarokapp
20844 david 15 0 2488 1100 792 R 2 0.1 0:00.01 top
    1 root 18 0 2952 1852 532 S 0 0.1 0:01.31 init
    2 root 12 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
    4 root 34 19 0 0 0 S 0 0.0 0:01.88 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
    8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
    9 root 10 -5 0 0 0 S 0 0.0 0:00.02 events/0
   10 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1
   11 root 20 -5 0 0 0 S 0 0.0 0:00.00 khelper
   31 root 10 -5 0 0 0 S 0 0.0 0:00.02 kblockd/0
   32 root 11 -5 0 0 0 S 0 0.0 0:00.00 kblockd/1

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
stepping : 2
cpu MHz : 2310.556
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 4623.99
clflush size : 64

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
stepping : 2
cpu MHz : 2310.556
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt l...

bobslaede (jeppe-dyrby) wrote :

@david
All the stuff I said before was kind of fumbling in the dark.
The real problem is with the bcm43xx module, for your wireless card.

Everything runs fine here after I put the module in /etc/modprobe.d/blacklist.
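
For anyone who wants to double-check which modules their blacklist file actually names, here is a small Python sketch. It assumes the simple "blacklist <module>" line format seen in /etc/modprobe.d/blacklist, and the function name `blacklisted_modules` is made up for illustration.

```python
def blacklisted_modules(conf_text):
    """Return the set of module names found on 'blacklist <module>' lines."""
    modules = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        parts = line.split()
        if len(parts) == 2 and parts[0] == "blacklist":
            modules.add(parts[1])
    return modules
```

Something like `"bcm43xx" in blacklisted_modules(open("/etc/modprobe.d/blacklist").read())` would then confirm the entry is there.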

David de Beer (daviddebeer) wrote :

I can now confirm that this bug is caused by the TV Card. Once I blacklisted the cx8800 and cx88xx modules, the problem went away:

$ cat /etc/modprobe.d/blacklist | tail -2
blacklist cx8800
blacklist cx88xx

$ uname -a
Linux unix 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

$ top

top - 10:07:00 up 1:05, 3 users, load average: 0.00, 0.00, 0.00
Tasks: 151 total, 1 running, 150 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.1%id, 0.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2075944k total, 682880k used, 1393064k free, 16736k buffers
Swap: 3911788k total, 0k used, 3911788k free, 446616k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    1 root 18 0 2948 1856 532 S 0 0.1 0:01.31 init
    2 root 12 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
    4 root 34 19 0 0 0 S 0 0.0 0:00.13 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    6 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
    7 root 39 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
    8 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/1
    9 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/0
   10 root 10 -5 0 0 0 S 0 0.0 0:00.00 events/1
   11 root 20 -5 0 0 0 S 0 0.0 0:00.00 khelper
   31 root 10 -5 0 0 0 S 0 0.0 0:00.00 kblockd/0
   32 root 10 -5 0 0 0 S 0 0.0 0:00.00 kblockd/1

David de Beer (daviddebeer) wrote : Re: [Bug 183461] Re: ksoftirqd/1 using 50% CPU or above

Hey Bob, that's strange.. perhaps there is more than one bug. My machine now
runs fine after I've blacklisted the TV card modules.

Strange!

David.

On Fri, Mar 7, 2008 at 10:10 AM, bobslaede <email address hidden> wrote:

> @david
> All the stuff i said before was kinda fumbling in the dark.
> The real problem is with the bcm43xx module, for your wireless card.
>
> Everything runs fine here, after i put the module in
> /etc/modprobe.d/blacklist.

--
recovery is forever...
..... if you can take it that long.

Gareth Fitzworthington (mapping-gp-deactivatedaccount) wrote :

Can those experiencing this issue try the following kernel boot option (& un-blacklist any previously blacklisted drivers):
nohz=off

This disables the 'tickless' aspect of the tickless kernel (enabled by default in Gutsy and perhaps other releases). ksoftirqd appears to be getting bombarded by requests (probably from not quite fully polished hardware drivers). This may solve the issue.
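
To confirm after a reboot that the option really reached the kernel, the command line in /proc/cmdline can be inspected. This is a minimal Python sketch; the helper name `boot_option` is made up for illustration and it only does simple whitespace splitting (no quoting support).

```python
def boot_option(cmdline, name):
    """Look up `name` on a kernel command line.

    Returns the value for 'name=value', True for a bare flag,
    or None if the option is absent. The first match is returned.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            return value if sep else True
    return None
```

With the option in place, `boot_option(open("/proc/cmdline").read(), "nohz")` should give "off".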

A. Villaveces (avillavecesn) wrote :

Gareth: thank you very much for posting the

nohz=off

kernel boot option. It really worked for me - I was desperate with ksoftirqd/1 taking up lots of resources and heating the CPU.

(I am on Linux Mint 4 - Daryna - at this point)

Andrés Villaveces

James Collier (james-collier412) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

In , Vmicho (vmicho) wrote :

After I have used my notebook for a while, ksoftirqd begins to use all my remaining CPU time.
The only way to return to a normal state is to reboot. But the reboot also fails (it hangs somewhere; I cannot see the console. I'm using nvidia's video driver and get a black console as soon as X starts. I could set the nv or vesa driver in xorg.conf and then see it).
The worst part is that it hangs during shutdown before unmounting the HDDs. Neither alt+ctrl+del nor sysrq works, so I need to do a hard reboot (power off). (That is why I set this bug as critical; otherwise it could be major.)

I have no idea what causes it. Maybe the network? Around that time /var/log/messages shows only dhcpd doing its stuff, and knetworkmanager shows up somewhat in the top list.
It hangs on both wlan and lan.

I searched for a while. Someone has a similar problem with a Clevo notebook with very similar specs.
I had no such problem with the previous SUSE (11.0).

Any ideas?
My ideas for now are to try these:
- use for a while only the vesa/nv driver
- install vanilla kernel
- download & install
- disable dhcp daemon (easiest I think)

Here is the top of the process list from top:

top - 22:35:21 up 1 day, 4:05, 7 users, load average: 1.66, 1.76, 1.52
Tasks: 142 total, 3 running, 139 sleeping, 0 stopped, 0 zombie
Cpu0 : 2.0%us, 6.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 92.0%si, 0.0%st
Cpu1 : 4.0%us, 0.0%sy, 0.3%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4088564k total, 3940616k used, 147948k free, 66132k buffers
Swap: 4409800k total, 28k used, 4409772k free, 3204660k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4 root 15 -5 0 0 0 R 99 0.0 24:24.67 ksoftirqd/0
 3049 root 20 0 360m 69m 9772 S 4 1.7 31:08.22 X
 9720 micho 39 19 128m 47m 14m S 4 1.2 14:14.83 operapluginwrap
 9628 micho 20 0 511m 382m 17m S 2 9.6 12:52.75 opera
10710 micho 20 0 156m 56m 29m S 2 1.4 11:58.14 amarokapp
 3853 micho 20 0 35884 13m 8840 S 1 0.3 2:55.25 knetworkmanager
 8838 root 15 -5 0 0 0 S 1 0.0 0:00.96 events/1
 3984 micho 20 0 37716 16m 10m R 0 0.4 1:51.73 konsole
11588 root 20 0 99112 57m 24m S 0 1.4 0:51.00 y2base
    1 root 20 0 1008 356 308 S 0 0.0 0:01.20 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.14 migration/0
    7 root 15 -5 0 0 0 S 0 0.0 0:05.14 events/0

/var/log/messages around the fatal time (I think it occurred somewhere after the "MARK"):
Jan 9 21:46:30 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 21:46:31 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 21:46:31 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1625 seconds.
Jan 9 22:06:31 linux-6vsc -- MARK --
Jan 9 22:13:36 linux-6vsc dhclient: DHCPREQUEST on eth0 to 192.168.1.1 port 67
Jan 9 22:13:36 linux-6vsc dhclient: DHCPACK from 192.168.1.1
Jan 9 22:13:36 linux-6vsc dhclient: bound to 192.168.1.3 -- renewal in 1726 seconds.

My system:

Linux linux-6vsc 2.6.27.7-9-pae #1 SMP 2008...


In , Meissner-novell (meissner-novell) wrote :

can you get output of:
cat /proc/interrupts

to see if an interrupt is triggered very often
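
For anyone who wants to rank the sources rather than eyeball the listing, here is a minimal Python sketch that sorts the rows of a saved /proc/interrupts dump by total count. The function names `parse_interrupts` and `busiest` are made up for illustration, and the parser assumes the usual layout: a header row of CPU names, then "label: counts... description".

```python
def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {label: (per_cpu_counts, description)}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for token in fields[:ncpus]:       # leading numeric columns, one per CPU
            if not token.isdigit():
                break
            counts.append(int(token))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def busiest(text, n=3):
    """Return the n rows with the highest count summed over all CPUs."""
    rows = [(label, sum(counts), desc)
            for label, (counts, desc) in parse_interrupts(text).items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]
```

Totals since boot are only a first hint (the timer is always high). Still, on the listing in the original Ubuntu report this would put IRQ 21 (bcm43xx) at the top, with 113556 + 91832347 = 91945903 interrupts.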

In , Vmicho (vmicho) wrote :

Hi, here it is (I've been running nearly the entire day and ksoftirqd is normal for now).
I'll provide another one (maybe also a graph) when the ksoftirqd problem occurs (I suppose that is what is really needed).

> cat /proc/interrupts
           CPU0 CPU1
  0: 26863620 0 IO-APIC-edge timer
  1: 254 0 IO-APIC-edge i8042
  6: 0 0 IO-APIC-edge lirc_ite8709
  8: 1 0 IO-APIC-edge rtc0
  9: 8972 0 IO-APIC-fasteoi acpi
 12: 564 0 IO-APIC-edge i8042
 16: 224750 0 IO-APIC-fasteoi uhci_hcd:usb2, nvidia
 18: 1191186 0 IO-APIC-fasteoi uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
 19: 1121921 0 IO-APIC-fasteoi ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
 21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
 22: 3042821 0 IO-APIC-fasteoi HDA Intel
 23: 2 0 IO-APIC-fasteoi ehci_hcd:usb5, uhci_hcd:usb6
216: 2369317 0 PCI-MSI-edge iwl3945
217: 6094614 0 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 4722382 16643134 Local timer interrupts
RES: 1373132 3216824 Rescheduling interrupts
CAL: 1511391 2413558 function call interrupts
TLB: 32209 34157 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

In , Vmicho (vmicho) wrote :

Hi again.
So here are the latest interrupts from before I rebooted.

   0: 37799577 0 IO-APIC-edge timer
   1: 309 0 IO-APIC-edge i8042
   6: 47784 0 IO-APIC-edge lirc_ite8709
   8: 1 0 IO-APIC-edge rtc0
   9: 11984 0 IO-APIC-fasteoi acpi
  12: 1074 0 IO-APIC-edge i8042
  16: 276919 0 IO-APIC-fasteoi uhci_hcd:usb2, nvidia
  18: 1315728 0 IO-APIC-fasteoi uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
  19: 1579638 0 IO-APIC-fasteoi ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb4, uhci_hcd:usb7
  21: 0 0 IO-APIC-fasteoi uhci_hcd:usb3
  22: 3305783 0 IO-APIC-fasteoi HDA Intel
  23: 2 0 IO-APIC-fasteoi ehci_hcd:usb5, uhci_hcd:usb6
 216: 3033526 0 PCI-MSI-edge iwl3945
 217: 9101163 0 PCI-MSI-edge eth0
 NMI: 0 0 Non-maskable interrupts
 LOC: 6434802 22944923 Local timer interrupts
 RES: 1790955 4134157 Rescheduling interrupts
 CAL: 1564485 2469460 function call interrupts
 TLB: 43600 45062 TLB shootdowns
 TRM: 0 0 Thermal event interrupts
 SPU: 0 0 Spurious interrupts
 ERR: 0
 MIS: 0

I'm also attaching a chart of /proc/interrupts, collected once a minute for ~870 minutes (with a simple script) and plotted with OOo.
Basically, I see nothing very unusual in it.

Just to note: the 870th minute is around 18:30. Resume from s2ram is at the 375th minute (9:15 in the morning). I closed all browsers and went out skiing at around 12:30 (I left only Azureus open), so the ksoftirqd problem appeared while I was out.
There is little change in slope at minute 450 (11:30), except for the "eth0" curve (3rd from top). In /var/log/messages there is nothing unusual, only the usual DHCPREQUEST entries (same as in the previous post). I also realized just now that the minutes aren't very accurate (the script just does a sleep 60), which can add some error.
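
The per-minute collection described above boils down to differencing successive snapshots. The following is only a Python sketch of that idea, not the script that was actually used; the names `totals` and `growth` are made up for illustration.

```python
import re

# A label, a colon, then one or more whole-number columns (one per CPU).
ROW = re.compile(r"^\s*(\w+):((?:\s+\d+\b)+)")


def totals(snapshot):
    """Map each interrupt label to its count summed over all CPUs."""
    result = {}
    for line in snapshot.splitlines():
        match = ROW.match(line)
        if match:
            result[match.group(1)] = sum(int(x) for x in match.group(2).split())
    return result


def growth(before, after):
    """Return (label, increase) pairs between two snapshots, largest first."""
    old, new = totals(before), totals(after)
    deltas = [(label, new[label] - old[label]) for label in new if label in old]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)
```

Between the two listings posted in this report, for example, eth0 (217) grew by 3006549 and lirc_ite8709 (6) went from 0 to 47784. That alone does not show which source, if any, is misbehaving.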

regards.

In , Vmicho (vmicho) wrote :

Created an attachment (id=264360)
chart of /proc/interrupts

Temporal chart of /proc/interrupts

In , Vmicho (vmicho) wrote :

Hello.
I have some good news.
I downloaded, compiled and installed the newest kernel from kernel.org (2.6.28.2) and I have been running it for about a week now without any problem. In fact, there were lots of changes concerning softirqd in the 2.6.28 release.
Maybe an upgrade of the current openSUSE kernel (2.6.27.7-9.1) in the repositories would fix this for everybody else.

Regards.

In , estellnb (estellnb) wrote :

Created an attachment (id=279389)
proc.interrupts

In , estellnb (estellnb) wrote :

Created an attachment (id=279390)
var.log.messages

In , szotsaki (szotsaki) wrote :

Created an attachment (id=279486)
/var/log/messages file

I also suffer from this bug.

uname -ir:
2.6.27.19-3.2-default x86_64

/proc/interrupts:
           CPU0 CPU1
  0: 71832 72828 IO-APIC-edge timer
  1: 5 7 IO-APIC-edge i8042
  8: 1 0 IO-APIC-edge rtc0
  9: 0 1 IO-APIC-fasteoi acpi
 12: 72 64 IO-APIC-edge i8042
 14: 1690 1616 IO-APIC-edge ata_piix
 15: 0 0 IO-APIC-edge ata_piix
 16: 581 168 IO-APIC-fasteoi nvidia
 17: 26663 10586 IO-APIC-fasteoi ata_piix, eth0, b43
 18: 0 0 IO-APIC-fasteoi mmc0
 19: 1 1 IO-APIC-fasteoi ohci1394
 20: 3236 2221 IO-APIC-fasteoi uhci_hcd:usb1, uhci_hcd:usb4, ehci_hcd:usb7
 21: 8821 2932 IO-APIC-fasteoi uhci_hcd:usb2, uhci_hcd:usb5, HDA Intel
 22: 0 0 IO-APIC-fasteoi ehci_hcd:usb3, uhci_hcd:usb6
NMI: 0 0 Non-maskable interrupts
LOC: 53608 57620 Local timer interrupts
RES: 13893 16887 Rescheduling interrupts
CAL: 1241 296 function call interrupts
TLB: 216 229 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
SPU: 0 0 Spurious interrupts
ERR: 0

lspci:
00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:01.0 PCI bridge: Intel Corporation Mobile PM965/GM965/GL960 PCI Express Root Port (rev 0c)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f2)
00:1f.0 ISA bridge: Intel Corporation 82801HEM (ICH8M) LPC Interface Controller (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation GeForce 8400M...


In , szotsaki (szotsaki) wrote :

I can reproduce this bug with Eclipse and Aptana (and probably with Last.fm).

I have Eclipse installed with Aptana, and the latter wants to download a few MB of updates. At 21% it stalls, as Last.fm also does.

After closing Eclipse and Last.fm, the CPU usage caused by ksoftirqd/1 drops to 0-1%.

In , Vmicho (vmicho) wrote :

Hi. Did you try to update to the newest kernel (2.6.28.x)? It solved this for me (at least I didn't have any problems so far).

In , estellnb (estellnb) wrote :

Indeed, kernel 2.6.28-next-20090107-20090107.18-default from http://download.opensuse.org/repositories/Kernel:/linux-next/openSUSE_11.1/ seems to resolve the issue. If I start the self-compiled partgui (/usr/sbin/piguicqt), it simply returns an error with the new kernel, whereas with the old kernel it always caused a light variant of the 100%-CPU ksoftirqd bug, fortunately without triggering any disk access (which makes things much worse). However, linux-next is not an option for me, since it does not wake from s2ram on my machine, as pm-suspend.log revealed.

In , estellnb (estellnb) wrote :

Created an attachment (id=279654)
erroneous partgui that can trigger the ksoftirqd bug

  Here I have uploaded an erroneously self-compiled version of partgui that can trigger a light version of the ksoftirqd bug featuring 100% cpu load but no disk access. Note that the cause for the ksoftirqd overload during normal operation will be different from that kind of artificially triggered one (and more severe because of hdd-access overload). To test with it type:
> make install (on a 64bit machine)
> /usr/sbin/piguicqt (as root)

In , estellnb (estellnb) wrote :

Created an attachment (id=279655)
proc.interrupts for partgui triggered overhang

In , estellnb (estellnb) wrote :

Created an attachment (id=287170)
still far from being resolved

In , estellnb (estellnb) wrote :

This time it is a permanent hang (sometimes it goes away by itself).
It occurs on both platforms: i586 and x86_64.
It should perhaps have been a shipment blocker.
Why don't they offer us a downgrade?

In , estellnb (estellnb) wrote :

Of what use will the 'next' version be if it comes with its very own set of unacceptable bugs? I believe it should be resolved for OpenSuse11.1.

In , Gregkh (gregkh) wrote :

Only Novell is allowed to set the priority.

In , Gregkh (gregkh) wrote :

If no one can duplicate this without the nvidia driver loaded, there is not going to be anything that we can do about this.

So, can someone run without the nvidia driver and still see this?

In , Vmicho (vmicho) wrote :

Hi,
I have now been using the vanilla 2.6.28.2 for 2 months without problems.

I could try starting with 2.6.27 (the original kernel from openSUSE 11.1; I still have it in grub) without the nvidia driver.
That is fairly easy, but it will take me some time (the bug sometimes appears after a few minutes, but sometimes only after several hours or a day).

In , estellnb (estellnb) wrote :

  Perhaps I forgot to mention that this occurs with the ati radeonhd driver for x86_64 platforms as well. Unfortunately linux-next(2.6.28) is not an option as long as the s2ram problems are not resolved there (Bug 496954) though the issue does not seem to apply to linux-next(2.6.28).
  Has anyone tried to trigger the overload with 2.6.27 kernel and my partgui test compilation?

In , estellnb (estellnb) wrote :

.

In , Jeffm-novell (jeffm-novell) wrote :

(In reply to comment #16)
> Of what use will the 'next' version be if it comes with its very own set of
> unacceptable bugs? I believe it should be resolved for OpenSuse11.1.

The linux-next kernel isn't an official openSUSE release. The description itself indicates where to report bugs while using it.

If you're comfortable building and testing kernels, I can give you some tips on how to track down the bug more quickly. Once you've identified the upstream fix, then we can backport it to the openSUSE 11.1 kernel.

In , estellnb (estellnb) wrote :

Could you give me some advice on how to activate Apparmor for the 2.6.30 kernel provided at ftp.suse.com/pub/projects/kernel/kotd/master? There is still no replacement for the 2.6.27 kernel series which keeps suffering from the ksoftirq-bug! For me 2.6.30 is now working best, better than linux-next (no s2ram) and of course better than 2.6.27.

In , Jeffm-novell (jeffm-novell) wrote :

AppArmor hasn't been forward-ported to 2.6.30 yet.

In , Jeffm-novell (jeffm-novell) wrote :

AppArmor has since been forward ported to the master kernel and has been available since 11.2 M4. I still haven't been able to reproduce on 11.1.

In , Coolo (coolo) wrote :

*** Bug 540550 has been marked as a duplicate of this bug. ***

In , Coolo (coolo) wrote :

moving to 11.2 as Elmar sees it there too.

In , Camaleón (noelamac) wrote :

*** Bug 543235 has been marked as a duplicate of this bug. ***

Arthur (iegik) wrote :

I'm running Ubuntu 9.10 with GNOME; why does it use some programs starting with 'k'?
ksoftirqd
kthreadd
khelper

$ top

top - 09:54:00 up 1 day, 22:26, 2 users, load average: 3.16, 2.13, 1.58
Tasks: 182 total, 6 running, 175 sleeping, 0 stopped, 1 zombie
Cpu(s): 61.2%us, 4.1%sy, 0.0%ni, 3.6%id, 30.8%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 1024988k total, 1006088k used, 18900k free, 44980k buffers
Swap: 3004112k total, 131216k used, 2872896k free, 432352k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 6535 arturs 20 0 20912 2180 2164 R 99 0.2 1160:50 vino-server
 1380 root 20 0 164m 87m 9.9m R 10 8.8 36:12.31 Xorg
 5063 root 20 0 39192 35m 1172 D 9 3.5 0:02.41 dpkg
 4928 arturs 20 0 219m 36m 23m S 3 3.7 0:02.10 chromium-browse
 3941 root 20 0 62316 36m 23m R 2 3.7 0:17.16 synaptic
 4000 arturs 20 0 317m 100m 17m S 1 10.0 37:44.52 skype
 5101 arturs 20 0 119m 35m 12m R 1 3.6 0:01.82 chromium-browse
 4032 arturs 20 0 2468 1204 884 R 1 0.1 0:00.47 top
 1828 arturs 20 0 163m 17m 7680 S 0 1.8 3:30.72 gnome-panel
    1 root 20 0 2564 1248 912 S 0 0.1 0:01.20 init
    2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
    3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
    4 root 15 -5 0 0 0 S 0 0.0 0:06.23 ksoftirqd/0
    5 root RT -5 0 0 0 S 0 0.0 0:00.00 watchdog/0
    9 root 15 -5 0 0 0 S 0 0.0 0:06.97 events/0
   11 root 15 -5 0 0 0 S 0 0.0 0:00.00 cpuset
   12 root 15 -5 0 0 0 S 0 0.0 0:00.00 khelper
   13 root 15 -5 0 0 0 S 0 0.0 0:00.00 netns

Revision history for this message
szemy (sz-tomika) wrote :

maybe these files could help

Revision history for this message
szemy (sz-tomika) wrote :
Changed in ubuntu:
status: Invalid → New
Changed in opensuse:
status: Unknown → In Progress
Revision history for this message
In , estellnb (estellnb) wrote :

Better with 2.6.32.1. However, only time will tell whether the problem has gone completely. Perhaps we should mark this as resolved and re-open it if it is seen again.

Revision history for this message
David Tombs (dgtombs) wrote :

Probably a kernel issue. If not, kernel folks will be able to tell us better what it really is.

affects: ubuntu → linux (Ubuntu)
Revision history for this message
In , estellnb (estellnb) wrote :

I'd like to mark this as resolved since it has not occurred for a while now.
However, please do have a look at another nasty property of current kernels:
Bug 566391, s2disk fails.
I am using 2.6.32.3-0.0.15.68cba77-desktop in the meantime.

Revision history for this message
Michael Helmling (supermihi) wrote :

I have also been investigating ksoftirqd eating my CPU time, which has happened often recently, using the current development version of Lucid Lynx, 2.6.32-10-generic. dmesg shows messages like this:

[ 177.540004] sr 0:0:0:0: timing out command, waited 60s
[ 297.550005] sr 0:0:0:0: timing out command, waited 120s
[ 387.560006] sr 0:0:0:0: timing out command, waited 90s
[ 447.570015] sr 0:0:0:0: timing out command, waited 60s
[ 507.580005] sr 0:0:0:0: timing out command, waited 60s

After I removed a SATA DVD drive that had also shown some buggy behavior during the past few weeks (probably a hardware failure), ksoftirqd started to behave normally again.
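
To see at a glance which device keeps timing out, log lines like the ones above can be tallied per device. The following Python sketch is only an illustration: the name summarize_timeouts is made up here, and the pattern assumes the exact "timing out command, waited Ns" wording shown in this comment.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [ 177.540004] sr 0:0:0:0: timing out command, waited 60s
# The wording is taken from the dmesg excerpt in this comment; other
# kernels may phrase the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+(sr\s+\S+):\s+timing out command, waited (\d+)s"
)


def summarize_timeouts(log_text):
    """Return {device: (number_of_timeouts, total_seconds_waited)}."""
    summary = defaultdict(lambda: [0, 0])
    for match in TIMEOUT_RE.finditer(log_text):
        device = match.group(2)
        summary[device][0] += 1
        summary[device][1] += int(match.group(3))
    return {device: tuple(values) for device, values in summary.items()}
```

Feeding it the saved output of dmesg gives one entry per SCSI device; a single optical drive accounting for every timeout would point at that drive, its cable, or its controller.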

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Bobslaede: Is the problem still present in latest development release?

Revision history for this message
Gergely Filip (gfilip) wrote :

On my system (Asus X59SL, Ubuntu stable) the problem is still present.
xxx@xxx:~$ uname -a
Linux xxx 2.6.31-19-generic #56-Ubuntu SMP Thu Jan 28 01:26:53 UTC 2010 i686 GNU/Linux

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Did you try adding the nohz=off workaround?

Revision history for this message
Gergely Filip (gfilip) wrote :

Andreas: Thank you for the advice. I have tried it, but unfortunately the workaround didn't work. The only change I noticed was that lines like these no longer appear in dmesg:
[ 1729.565018] sr 2:0:0:0: timing out command, waited 60s
[ 1849.569021] sr 2:0:0:0: timing out command, waited 120s
[ 1849.569047] sr 2:0:0:0: timing out command, waited 120s

Could you suggest a way to isolate or track down the problem (e.g. by removing unnecessary modules)?

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Can you please try the same thing in lucid? Use a livecd if you don't want to install development release.
Alternatively you can try the latest mainline kernel: http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33-rc7/

Looks like a kernel issue, since the above commenters mention different modules as the source of the error.

If changing kernels or upgrading to lucid does not help, please post /var/log/messages and output of lspci, lsusb and lsmod.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Gergely Filip (gfilip) wrote :

Andreas, first of all, thank you for the advice and your time.
1st trial: Lucid live CD. After resuming from suspend, the whole X session locked up. So I switched to a console, which behaved very strangely (it didn't accept commands at all). Hence I couldn't get usable information from the logs.
2nd trial: I installed the linux-image-2.6.33-020633rc8-generic_2.6.33-020633rc8_i386 kernel and tried the same steps, which resulted in a huge CPU load. ksoftirqd used up 100% of CPU time. Logs attached to the post.

Revision history for this message
Gergely Filip (gfilip) wrote :
Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

I'm guessing this is either broken hw or broken hw driver. When this happens can you try to keep an eye on /proc/interrupts and see if you can identify an interrupt running wild?
(e.g. watch -d cat /proc/interrupts)
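
The suggestion above (watching /proc/interrupts for a counter that runs wild) can also be sketched as a small Python script that diffs two snapshots. It is only an illustration: the names parse_interrupts and busiest are made up here, and it assumes the usual layout of /proc/interrupts (a CPU header row, then one row per interrupt with per-CPU counts followed by a description).

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Some rows (e.g. ERR) carry fewer counters than there are CPUs,
        # so stop at the first field that is not a number.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def busiest(before, after, top=5):
    """Return the `top` rows with the largest increase between snapshots."""
    deltas = []
    for label, (count, description) in after.items():
        previous = before.get(label, (0, ""))[0]
        deltas.append((count - previous, label, description))
    deltas.sort(reverse=True)
    return deltas[:top]
```

To use it on a live system, read /proc/interrupts twice, about a second apart, pass both texts through parse_interrupts, and print busiest(first, second). A row whose delta dwarfs the others is the one to look at; if that line is shared by several drivers, it still has to be narrowed down, for example by unloading modules one at a time.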

Revision history for this message
Henrik (glas) wrote :

I also have an Asus X59SL and am experiencing the very same problem: the fan runs at full speed and the system responds slowly.
I tried the workaround and it didn't help.

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Gergely Filip: Can you please try again with Lucid Alpha 3, or even better the latest daily? There are reports in the SUSE bug about this being fixed in 2.6.32.
Thanks

Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Cam Cope (ccope) wrote :

I've got the following in top under lucid latest:
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    4 root 20 0 0 0 0 R 98 0.0 1803:21 ksoftirqd/0

Watching /proc/interrupts isn't very informative. Here's what it shows:
# cat /proc/interrupts
           CPU0 CPU1
  0: 1846 624 IO-APIC-edge timer
  1: 103937 21184 IO-APIC-edge i8042
  6: 1009980 1094791 IO-APIC-edge
  8: 1 0 IO-APIC-edge rtc0
  9: 243889 51 IO-APIC-fasteoi acpi
 12: 10631504 73 IO-APIC-edge i8042
 16: 16935518 15340624 IO-APIC-fasteoi uhci_hcd:usb3, nvidia
 18: 30 0 IO-APIC-fasteoi uhci_hcd:usb8, jmb38x_ms:slot0, ohci1394, mmc0
 19: 6179955 3153 IO-APIC-fasteoi ata_piix, ata_piix, ehci_hcd:usb1, uhci_hcd:usb5, uhci_hcd:usb7
 21: 0 0 IO-APIC-fasteoi uhci_hcd:usb4
 22: 336170 149 IO-APIC-fasteoi HDA Intel
 23: 1 7571 IO-APIC-fasteoi ehci_hcd:usb2, uhci_hcd:usb6
 30: 1505 4805 PCI-MSI-edge eth0
NMI: 0 0 Non-maskable interrupts
LOC: 212005467 154165209 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 0 0 Performance monitoring interrupts
PND: 0 0 Performance pending work
RES: 86606397 101594398 Rescheduling interrupts
CAL: 14578 7770 Function call interrupts
TLB: 1927732 1460655 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 1158 1151 Machine check polls

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Can anybody experiencing this problem please test the latest mainline kernel to check whether this is fixed upstream, as indicated in the linked SUSE bug?
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/

Revision history for this message
piedramania (piedramania) wrote :

I'm having the same problem with the live CD of Kubuntu 10.04 beta 2:
ksoftirqd/0 uses up to 100% of one CPU core.

Moreover, ubiquity hangs during disk scanning. The progress bar stops at 47% and never gets to the next step.

Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

piedramania: Have you experienced this problem in any older versions of Ubuntu? Can you please test the latest mainline kernel? See comment #84.

tags: added: needs-upstream-testing
Revision history for this message
Andreas Noteng (andreas-noteng) wrote :

Can everybody still experiencing this bug please file a separate report with the following command: "apport-bug -p linux" Also, please test the latest mainline kernel: http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/
Thank you all for your help making Ubuntu better by reporting bugs.

Revision history for this message
piedramania (piedramania) wrote :

Actually it only happens when using the live CD. I remember the same thing happening when I installed 9.10.
It only happens when I install Kubuntu using the hard disk method described in https://help.ubuntu.com/community/Installation/FromLinux, and when the SATA CD-ROM drive is connected, even though there is no CD/DVD in it. If I disconnect the CD/DVD drive, the ksoftirqd process returns to normal.

Since it happens when booting from the live CD, I can't try another kernel, since that would require installing or rebooting.
Once Kubuntu is installed, the problem disappears.

Revision history for this message
In , Vmicho (vmicho) wrote :

Well, it is really embarrassing, but this bug persists in the newest openSUSE 11.3 with kernel 2.6.34.7 as well.

My current uname -a:
Linux linux-wew7.site 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200 i686 i686 i386 GNU/Linux

Revision history for this message
In , estellnb (estellnb) wrote :

Things have actually already improved for many users!
Luckily, the problem hasn't plagued me lately.

Michael, what kind of ksoftirqd problem was there?
CPU usage only, or with massive disk access and a totally unresponsive system?
100% CPU usage on both CPUs or only on one?
Did the problem go away by itself, or was a reboot the only escape?
How often and at what intervals has it occurred so far?
What kind of system are you using: hardware, modules? Perhaps someone can tell us what to look at.

Revision history for this message
In , estellnb (estellnb) wrote :

Oops! The problem just hasn't occurred for me because I was using the clocksource=jiffies boot option. However, this isn't ideal.

Changed in opensuse:
importance: Unknown → Critical
status: In Progress → Confirmed
Revision history for this message
In , Jeffm-novell (jeffm-novell) wrote :

Bumping product to 11.3 since it still exists. I'm tossing this one back into the open bug queue because it's not my area of expertise.

Revision history for this message
In , estellnb (estellnb) wrote :

Created an attachment (id=409384)
clocksource=jiffies, 2.6.37-8.99.14-desktop, 10x a 1s + stacktraces

Help! Now not even clocksource=jiffies helps. I just got 100% CPU usage on both cores with a 2.6.37-8.99.14.138eeaa-desktop kernel. Shortly after the snapshots (/proc/interrupts + stack dumps) were taken, massive disk access followed.

** novelty ** The first time for the ksoftirqd 100% cpu usage problem several stack dumps were taken (by Alt-PrnScr-L) to let you see in which execution state the CPU was. So just have a look at this.

Revision history for this message
In , Bphilips (bphilips) wrote :

Can you please try the Kernel of the Day?
 http://en.opensuse.org/openSUSE:Kernel_of_the_day

If it still happens we should report it upstream so that it can get upstream attention. Also can you attach the output from `hwinfo --all` to this bug?

Thanks, Brandon

Changed in opensuse:
status: Confirmed → Incomplete
Revision history for this message
In , Jslaby (jslaby) wrote :

Created an attachment (id=415179)
tasklet debug patch

(In reply to comment #54)
> ** novelty ** The first time for the ksoftirqd 100% cpu usage problem several
> stack dumps were taken (by Alt-PrnScr-L) to let you see in which execution
> state the CPU was. So just have a look at this.

The 2.6.37 traces are useless. The 2.6.34.7 ones are helpful though. Also /proc/softirqs clearly shows that some kind of shit schedules a tasklet way too often.

I'm attaching a patch to track that down. Also I'm building a kernel to test and it will appear at:
http://labs.suse.cz/jslaby/bug-465039

Watch for tasklet_action in the logs when this happens. Maybe there will be false positives. Then I'll increase the limit. Let's see.
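
As a rough illustration of how a runaway tasklet shows up in /proc/softirqs, the following Python sketch turns two snapshots of that file into per-CPU rates. The function names parse_softirqs and softirq_rates are invented for this example and are not part of the debug patch; it assumes the standard layout (a CPU header row followed by rows such as TIMER, NET_RX and TASKLET).

```python
def parse_softirqs(text):
    """Parse /proc/softirqs text into {softirq_name: [count_per_cpu, ...]}."""
    rows = {}
    for line in text.strip().splitlines()[1:]:  # skip the CPU header row
        name, _, counts = line.partition(":")
        rows[name.strip()] = [int(c) for c in counts.split()]
    return rows


def softirq_rates(before, after, seconds):
    """Return {softirq_name: [events_per_second_per_cpu, ...]}."""
    rates = {}
    for name, counts in after.items():
        # Rows missing from the first snapshot are treated as starting at 0.
        previous = before.get(name, [0] * len(counts))
        rates[name] = [(c - p) / seconds for c, p in zip(counts, previous)]
    return rates
```

Taking the two snapshots a few seconds apart while ksoftirqd is busy and comparing the TASKLET row with the others should show whether tasklets are the softirq being raised far more often than the rest. It does not say which driver schedules them.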

Revision history for this message
In , Jslaby (jslaby) wrote :

Created an attachment (id=415180)
tasklet debug patch

s/time_after/time_before/ indeed. Rebuilding.

Revision history for this message
In , Bphilips (bphilips) wrote :

Michal- Can you please test Jiri's Kernel?

Revision history for this message
In , Jslaby (jslaby) wrote :

(In reply to comment #58)
> Michal- Can you please test Jiri's Kernel?

Or maybe Elmar?

Revision history for this message
In , estellnb (estellnb) wrote :

Well, this is increasingly hard to test nowadays. I could run the patched kernel for three months without being able to tell whether the ksoftirqd bug has vanished, because it occurs so rarely and irregularly. Unfortunately I have been away and thus was not able to test. What we need is something that can trigger the ksoftirqd bug.
Michael, could you try to run partgui as provided by attachment 5, "erroneous partgui that can trigger the ksoftirqd bug"? Then let us see if we can still trigger it.

Revision history for this message
In , Vmicho (vmicho) wrote :

(In reply to comment #59)
> (In reply to comment #58)
> > Michal- Can you please test Jiri's Kernel?
>
> Or maybe Elmar?

I'll try to find some time for it.
Even for me it was hard to reproduce. But it has occurred for me at least once on openSUSE 11.3.

From my observations, it can happen under heavy, long-lasting network load. More precisely, it happened when I left Azureus running for several hours (with the desktop locked), but it also happened while I was working on the computer.

Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Revision history for this message
In , Gregkh-n (gregkh-n) wrote :

Closing due to lack of response. If this is still an issue, please reopen with the requested information.

Changed in opensuse:
status: Incomplete → Unknown