When running network and I/O traffic on SM15000-XE, "BUG: soft lockup - CPU#0 stuck for 22s!" is seen in dmesg
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
irqbalance (Ubuntu) | Fix Released | High | Unassigned |
Precise | Won't Fix | High | Unassigned |
Quantal | Fix Released | Medium | Unassigned |
Bug Description
Ubuntu 12.04 LTS was installed on two servers in an SM15000-XE system. Bidirectional iperf traffic was started on seven NICs between the two servers. After a few hours, the following stack trace was observed in dmesg. Network performance was sluggish after that.
[15640.958682] sched: RT throttling activated
[16924.585541] ------------[ cut here ]------------
[16924.585549] WARNING: at /build/
[16924.585567] <IRQ> [<ffffffff81067
[16924.585625] [<ffffffff81665
---[ end trace 3c74a3d373267b03 ]---
[16941.890574] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
[16941.963686] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963697] CPU 0
[16941.963698] Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963704]
[16941.963706] Pid: 0, comm: swapper/0 Tainted: G W 3.2.0-23-generic #36-Ubuntu SeaMicro Sabre2/Type2 - Board Product Name1
[16941.963710] RIP: 0010:[<
[16941.963718] RSP: 0018:ffff88043f
[16941.963719] RAX: 0000000000000000 RBX: ffff88043fc03c60 RCX: 000000000000000a
[16941.963721] RDX: ffff88040f876680 RSI: 00000000000000cc RDI: ffff880420dab478
[16941.963722] RBP: ffff88043fc03cb0 R08: 0000000000000680 R09: 0000000000000800
[16941.963724] R10: ffff880420dab500 R11: 0000000000000001 R12: ffff88043fc03be8
[16941.963725] R13: ffffffff8166555e R14: ffff88043fc03cb0 R15: ffff88040f876000
[16941.963727] FS: 000000000000000
[16941.963729] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[16941.963730] CR2: 00007f19afb0ad50 CR3: 0000000210999000 CR4: 00000000000406f0
[16941.963732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16941.963734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[16941.963735] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0d020)
[16941.963737] Stack:
[16941.987830] ffff88043fc03ca0 00000632813265a4 ffff880424796090 ffff880421200000
[16941.987834] 0000000000000000 00000000000000ff ffff88040f875840 ffff8804218a7a80
[16941.987837] ffff88043fc03cd0 ffffffff81532424 00000000000000fe ffffc90001877fd0
[16941.987840] Call Trace:
[16942.017153] <IRQ>
[16942.042416] [<ffffffff81532
[16942.042423] [<ffffffffa0003
[16942.042427] [<ffffffff81326
[16942.042430] [<ffffffffa0002
[16942.042434] [<ffffffffa0003
[16942.042438] [<ffffffff81540
[16942.042442] [<ffffffff8106e
[16942.042446] [<ffffffff81033
[16942.042450] [<ffffffff81666
[16942.042453] [<ffffffff81015
[16942.042456] [<ffffffff8106e
[16942.042458] [<ffffffff81667
[16942.042462] [<ffffffff8165c
[16942.042463] <EOI>
[16942.067708] [<ffffffff8108e
[16942.067712] [<ffffffff8109c
[16942.067715] [<ffffffff8109c
[16942.067719] [<ffffffff81012
[16942.067723] [<ffffffff81623
[16942.067728] [<ffffffff81cfb
[16942.067731] [<ffffffff81cfb
[16942.067734] [<ffffffff81cfb
[16942.067737] [<ffffffff81cfb
[16942.067739] Code: df be cc 00 00 00 0f 85 48 01 00 00 40 f6 c7 02 0f 85 56 01 00 00 40 f6 c7 04 0f 85 1c 01 00 00 89 f1 31 c0 c1 e9 03 40 f6 c6 04 <f3> 48 ab 0f 85 f0 00 00 00 40 f6 c6 02 0f 85 ce 00 00 00 83 e6
[16942.301153] Call Trace:
[16942.301156] <IRQ> [<ffffffff81532
[16942.301168] [<ffffffffa0003
[16942.301172] [<ffffffff81326
[16942.301175] [<ffffffffa0002
[16942.301178] [<ffffffffa0003
[16942.301181] [<ffffffff81540
[16942.301186] [<ffffffff8106e
[16942.301189] [<ffffffff81033
[16942.301193] [<ffffffff81666
[16942.301196] [<ffffffff81015
[16942.301199] [<ffffffff8106e
[16942.301201] [<ffffffff81667
[16942.301204] [<ffffffff8165c
[16942.301205] <EOI> [<ffffffff8108e
[16942.301211] [<ffffffff8109c
[16942.301214] [<ffffffff8109c
[16942.301216] [<ffffffff81012
[16942.301220] [<ffffffff81623
[16942.301233] [<ffffffff81cfb
[16942.301236] [<ffffffff81cfb
[16942.301239] [<ffffffff81cfb
[16942.301242] [<ffffffff81cfb
[17102.391716] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17102.391717] Tx Queue <0>
[17102.391718] TDH <eb>
[17102.391719] TDT <d3>
[17102.391719] next_to_use <d3>
[17102.391720] next_to_clean <e5>
[17102.391720] buffer_
[17102.391721] time_stamp <100402f9f>
[17102.391722] next_to_watch <f8>
[17102.391722] jiffies <100403264>
[17102.391723] next_to_
[17134.426481] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17134.509027] "echo 0 > /proc/sys/
[17134.603042] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000
[17134.603047] ffff880413843af0 0000000000000046 0000000000000000 0000000000000000
[17134.603061] ffff880413843fd8 ffff880413843fd8 ffff880413843fd8 0000000000013780
[17134.603066] ffff880420e75bc0 ffff880416ecdbc0 0000000000000000 7fffffffffffffff
[17134.603070] Call Trace:
[17134.603080] [<ffffffff8165a
[17134.603084] [<ffffffff8165a
[17134.603088] [<ffffffff8165a
[17134.603094] [<ffffffff81055
[17134.603099] [<ffffffff8105f
[17134.603102] [<ffffffff8165a
[17134.603108] [<ffffffff81084
[17134.603112] [<ffffffff81082
[17134.603119] [<ffffffff81084
[17134.603123] [<ffffffff81085
[17134.603127] [<ffffffff81085
[17134.603136] [<ffffffffa0000
[17134.603150] [<ffffffffa0004
[17134.603155] [<ffffffffa0006
[17134.603160] [<ffffffffa0006
[17134.603162] [<ffffffff81084
[17134.603165] [<ffffffff81085
[17134.603168] [<ffffffff81085
[17134.603170] [<ffffffff8108a
[17134.603173] [<ffffffff81666
[17134.603176] [<ffffffff8108a
[17134.603178] [<ffffffff81666
[17134.603184] INFO: task kworker/7:1:5691 blocked for more than 120 seconds.
[17134.685722] "echo 0 > /proc/sys/
[17134.779732] kworker/7:1 D 0000000000000007 0 5691 2 0x00000000
[17134.779738] ffff8801d0e29d10 0000000000000046 0000000000000000 0000000000000000
[17134.779745] ffff8801d0e29fd8 ffff8801d0e29fd8 ffff8801d0e29fd8 0000000000013780
[17134.779750] ffff880425775bc0 ffff88020f8d96f0 ffff880425794600 ffff880421205040
[17134.779754] Call Trace:
[17134.779762] [<ffffffff8165a
[17134.779764] [<ffffffff8165b
[17134.779767] [<ffffffff8165a
[17134.779786] [<ffffffffa0005
[17134.779792] [<ffffffffa0005
[17134.779798] [<ffffffff81084
[17134.779802] [<ffffffff81085
[17134.779808] [<ffffffff81085
[17134.779812] [<ffffffff8108a
[17134.779815] [<ffffffff81666
[17134.779817] [<ffffffff8108a
[17134.779822] [<ffffffff81666
[17146.285107] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17146.285108] Tx Queue <0>
[17146.285109] TDH <ed>
[17146.285109] TDT <37>
[17146.285110] next_to_use <37>
[17146.285111] next_to_clean <eb>
[17146.285111] buffer_
[17146.285112] time_stamp <100405c55>
[17146.285112] next_to_watch <ee>
[17146.285113] jiffies <100405d53>
[17146.285113] next_to_
[17148.215140] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17148.215141] Tx Queue <0>
[17148.215142] TDH <a>
[17148.215142] TDT <f2>
[17148.215143] next_to_use <f2>
[17148.215143] next_to_clean <5>
[17148.215144] buffer_
[17148.215144] time_stamp <100405cb2>
[17148.215145] next_to_watch <10>
[17148.215145] jiffies <100405f36>
[17148.215146] next_to_
[17223.041709] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17223.041710] Tx Queue <0>
[17223.041711] TDH <bd>
[17223.041711] TDT <20>
[17223.041712] next_to_use <20>
[17223.041712] next_to_clean <b4>
[17223.041713] buffer_
[17223.041713] time_stamp <10040a73e>
[17223.041714] next_to_watch <be>
[17223.041714] jiffies <10040a866>
[17223.041715] next_to_
[17231.053339] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17231.053340] Tx Queue <0>
[17231.053341] TDH <f5>
[17231.053341] TDT <75>
[17231.053342] next_to_use <75>
[17231.053342] next_to_clean <f1>
[17231.053343] buffer_
[17231.053343] time_stamp <10040af0d>
[17231.053344] next_to_watch <f9>
[17231.053344] jiffies <10040b03b>
[17231.053345] next_to_
[17254.591185] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17254.673724] "echo 0 > /proc/sys/
[17254.767714] kworker/7:2 D 0000000000000007 0 5025 2 0x00000000
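The e1000 "Detected Tx Unit Hang" dumps above print ring indices and two jiffies counters in hex, which are easier to compare once decoded. The Python sketch below is a minimal illustration, not part of the e1000 driver or any Ubuntu tool; `decode_tx_hang` and `estimate_hz` are hypothetical helper names. It assumes HZ=250 and a 256-entry Tx ring. HZ=250 is not stated in the report, but it is consistent with the log itself: between the first dump (17102.39 s, jiffies 0x100403264) and the fourth (17223.04 s, jiffies 0x10040a866) the counter advances 0x7602 = 30210 ticks in about 120.65 s, roughly 250 per second. The ring size is an assumption; every index printed fits in 0x00-0xff. Truncated fields such as `buffer_` and `next_to_` are skipped.

```python
import re

# Assumed values, not stated in the bug report:
HZ = 250         # tick rate; cross-checked against the dmesg clock with estimate_hz()
RING_SIZE = 256  # assumed e1000 Tx ring size; every index in the dumps fits in 0x00-0xff

# Matches complete "[timestamp] name <hex>" fields. Truncated fields such as
# "buffer_" and "next_to_" carry no <value> and are skipped, as is "Tx Queue <0>".
FIELD_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\w+)\s+<([0-9a-f]+)>$")


def decode_tx_hang(lines):
    """Decode one 'Detected Tx Unit Hang' block (hypothetical helper)."""
    fields = {}
    for line in lines:
        m = FIELD_RE.match(line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    stalled = fields["jiffies"] - fields["time_stamp"]
    return {
        # Jiffies elapsed since the watched descriptor was time-stamped.
        "stalled_jiffies": stalled,
        "stalled_seconds": stalled / HZ,
        # Descriptors queued by the driver but not yet cleaned (index wraps at RING_SIZE).
        "driver_backlog": (fields["next_to_use"] - fields["next_to_clean"]) % RING_SIZE,
        # Descriptors between the hardware head (TDH) and tail (TDT).
        "hw_pending": (fields["TDT"] - fields["TDH"]) % RING_SIZE,
    }


def estimate_hz(t1, jiffies1, t2, jiffies2):
    """Ticks per second between two dmesg timestamps that both report jiffies."""
    return (jiffies2 - jiffies1) / (t2 - t1)
```

For the first dump this gives 709 jiffies (about 2.8 s at the assumed HZ), a driver backlog of 238 descriptors and 232 descriptors between TDH and TDT. The shortest gap among the five dumps is the one at 17146.29 s: 0x100405d53 - 0x100405c55 = 254 jiffies, close to one second at HZ=250.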
INTERRUPT
root@ubuntu ~# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
  0:         87          0          0          0          0          0          0          0  IR-IO-APIC-edge timer
  4:      17735          0      14263          0      30126          0          0          0  IR-IO-APIC-edge serial
  8:          1          0          0          0          0          0          0          0  IR-IO-APIC-edge rtc0
  9:          0          0          0          0          0          0          0          0  IR-IO-APIC-fasteoi acpi
 16:   16604523          0          0   12120410          0          0          0          0  IR-IO-APIC-fasteoi ahci, ahci, ahci, eth1, eth4
 17:    1678725          0          0    8996499          0          0          0          0  IR-IO-APIC-fasteoi ahci, ahci, ahci, eth0, eth6
 18:   45881173          0          0          0          0          0          0          0  IR-IO-APIC-fasteoi ahci, ahci, eth3, eth7
 19:   59172483          0          0      20836          0          0          0          0  IR-IO-APIC-fasteoi eth2, eth5
 40:          0          0          0          0          0          0          0          0  DMAR_MSI-edge dmar0
 41:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 42:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 43:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 44:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 45:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 46:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 47:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
 48:          0          0          0          0          0          0          0          0  IR-PCI-MSI-edge PCIe PME
NMI:       5647        157       1546       1659       1984        356        997        572  Non-maskable interrupts
LOC:     964914     979936    1409782     582208    1756677    1311067    2095033    1656838  Local timer interrupts
SPU:          0          0          0          0          0          0          0          0  Spurious interrupts
PMI:       5647        157       1546       1659       1984        356        997        572  Performance monitoring interrupts
IWI:          0          0          0          0          0          0          0          0  IRQ work interrupts
RES:     147237     712096   22528993     144356   10341760     488304    3966861     554994  Rescheduling interrupts
CAL:        174        221        212        149        223        212        205        218  Function call interrupts
TLB:        153        210        252       1558        299        517        361        682  TLB shootdowns
TRM:          0          0          0          0          0          0          0          0  Thermal event interrupts
THR:          0          0          0          0          0          0          0          0  Threshold APIC interrupts
MCE:          0          0          0          0          0          0          0          0  Machine check exceptions
MCP:         45         45         45         45         45         45         45         45  Machine check polls
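The snapshot shows the IO-APIC lines that carry the NICs (IRQs 16-19, each shared with ahci) being serviced almost entirely by CPU0, the CPU named in the soft lockup, with a smaller share on CPU3; this report is tracked against irqbalance. The Python sketch below quantifies that skew from pasted /proc/interrupts text. It is a minimal illustration assuming the usual layout (a CPU header row, then per-IRQ counters followed by the chip and device names); `parse_interrupts` and `cpu_share` are hypothetical helpers, not part of irqbalance.

```python
from itertools import takewhile


def parse_interrupts(text):
    """Parse /proc/interrupts text into CPU names and per-IRQ count lists."""
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    rows = {}
    for line in lines[1:]:
        label, rest = line.split(":", 1)
        fields = rest.split()
        # Per-CPU counters come first; the chip type and device names follow.
        counts = [int(f) for f in takewhile(str.isdigit, fields)][: len(cpus)]
        rows[label.strip()] = counts
    return cpus, rows


def cpu_share(rows, irqs, ncpus):
    """Sum the given IRQ lines per CPU and return (totals, fraction per CPU)."""
    totals = [0] * ncpus
    for irq in irqs:
        for cpu, count in enumerate(rows[irq]):
            totals[cpu] += count
    grand = sum(totals)
    return totals, [t / grand if grand else 0.0 for t in totals]
```

For IRQs 16-19 above, CPU0 accounts for 123,336,904 of 144,474,649 interrupts (about 85%) and CPU3 for the remaining 21,137,745; the other six CPUs received none on those lines.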
information type: Public → Private
information type: Private → Public
Changed in ubuntu: assignee: nobody → Samantha Jian-Pielak (samantha-jian)
Changed in ubuntu: assignee: Samantha Jian-Pielak (samantha-jian) → nobody
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.
To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1183135/+editstatus and add the package name in the text box next to the word Package.
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]