kernel BUG() triggered under high load

Bug #65542 reported by Per Buer
Affects: linux-source-2.6.15 (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-2.6.15-27-sparc64-smp

Whilst hammering my brand new T2000 server, which runs lighttpd, with ab (ApacheBench), I triggered the following BUG:

Oct 11 16:02:01 dhcp217 kernel: [ 0.211363] kernel BUG at include/linux/skbuff.h:466!
Oct 11 16:02:01 dhcp217 kernel: [ 0.211384] \|/ ____ \|/
Oct 11 16:02:01 dhcp217 kernel: [ 0.211390] "@'/ .. \`@"
Oct 11 16:02:01 dhcp217 kernel: [ 0.211396] /_| \__/ |_\
Oct 11 16:02:01 dhcp217 kernel: [ 0.211402] \__U_/
Oct 11 16:02:01 dhcp217 kernel: [ 0.211416] events/5(71): Kernel bad sw trap 5 [#1]
Oct 11 16:02:01 dhcp217 kernel: [ 0.211433] TSTATE: 0000000080001605 TPC: 0000000000639694 TNPC: 0000000000639698 Y: 00000000 Not tainted
Oct 11 16:02:01 dhcp217 kernel: [ 0.211466] TPC: <tso_fragment+0x214/0x220>
Oct 11 16:02:01 dhcp217 kernel: [ 0.211481] g0: ffffffffffffffbf g1: 00000000006c8000 g2: 0000000000000001 g3: 0000000000000000
Oct 11 16:02:01 dhcp217 kernel: [ 0.211503] g4: fffff801ff8832a0 g5: fffff80003d0a0c0 g6: fffff801ff8b4000 g7: 0000000000000000
Oct 11 16:02:01 dhcp217 kernel: [ 0.211521] o0: 0000000000000039 o1: 0000000000680410 o2: 00000000000001d2 o3: 0000000000000018
Oct 11 16:02:01 dhcp217 kernel: [ 0.211543] o4: 00000000b488ab82 o5: ffffffffcfffffff sp: fffff801ff8b66b1 ret_pc: 000000000063968c
Oct 11 16:02:01 dhcp217 kernel: [ 0.211565] RPC: <tso_fragment+0x20c/0x220>
Oct 11 16:02:01 dhcp217 kernel: [ 0.211581] l0: 00000000000005a8 l1: fffff801eb084720 l2: fffff801eb954d60 l3: 0000000000000000
Oct 11 16:02:01 dhcp217 kernel: [ 0.211600] l4: 0000000000007530 l5: 00000000c0a800ae l6: 0000000000721800 l7: 00000000a68c0050
Oct 11 16:02:01 dhcp217 kernel: [ 0.211620] i0: 0000000000000100 i1: fffff801e99f4e60 i2: 0000000000000b50 i3: 00000000000005a8
Oct 11 16:02:01 dhcp217 kernel: [ 0.211640] i4: 0000000000000006 i5: 0000000000000005 i6: fffff801ff8b6771 i7: 0000000000639e74
Oct 11 16:02:01 dhcp217 kernel: [ 0.211664] I7: <__tcp_push_pending_frames+0x2d4/0x4e0>
Oct 11 16:02:01 dhcp217 kernel: [ 0.211675] Caller[0000000000639e74]: __tcp_push_pending_frames+0x2d4/0x4e0
Oct 11 16:02:01 dhcp217 kernel: [ 0.211700] Caller[0000000000635424]: tcp_rcv_state_process+0x784/0x12a0
Oct 11 16:02:01 dhcp217 kernel: [ 0.211723] Caller[000000000063e90c]: tcp_v4_do_rcv+0xcc/0x400
Oct 11 16:02:01 dhcp217 kernel: [ 0.211742] Caller[0000000000640328]: tcp_v4_rcv+0xbe8/0xc20
Oct 11 16:02:01 dhcp217 kernel: [ 0.211759] Caller[000000000061fa78]: ip_local_deliver+0x178/0x380
Oct 11 16:02:01 dhcp217 kernel: [ 0.211780] Caller[0000000000620050]: ip_rcv+0x3d0/0x660
Oct 11 16:02:01 dhcp217 kernel: [ 0.211797] Caller[00000000005ff4d0]: netif_receive_skb+0x310/0x420
Oct 11 16:02:01 dhcp217 kernel: [ 0.211819] Caller[00000000100ed024]: e1000_clean_rx_irq_ps+0xc4/0xbe0 [e1000]
Oct 11 16:02:01 dhcp217 kernel: [ 0.211895] Caller[00000000100ecde8]: e1000_clean+0xa8/0x220 [e1000]
Oct 11 16:02:01 dhcp217 kernel: [ 0.211943] Caller[00000000005fd9f4]: net_rx_action+0xb4/0x1c0
Oct 11 16:02:01 dhcp217 kernel: [ 0.211962] Caller[000000000045e47c]: __do_softirq+0x7c/0x120
Oct 11 16:02:01 dhcp217 kernel: [ 0.211981] Caller[000000000045e568]: do_softirq+0x48/0x60
Oct 11 16:02:01 dhcp217 kernel: [ 0.211997] Caller[0000000000433a98]: smp_percpu_timer_interrupt+0xb8/0x160
Oct 11 16:02:01 dhcp217 kernel: [ 0.212025] Caller[00000000004109d4]: tl0_irq14+0x14/0x20
Oct 11 16:02:01 dhcp217 kernel: [ 0.212044] Caller[00000000100e6c50]: e1000_update_stats+0x930/0x980 [e1000]
Oct 11 16:02:01 dhcp217 kernel: [ 0.212094] Caller[00000000100ea2c0]: e1000_watchdog_task+0x60/0x720 [e1000]
Oct 11 16:02:01 dhcp217 kernel: [ 0.212142] Caller[000000000046b32c]: worker_thread+0x14c/0x240
Oct 11 16:02:01 dhcp217 kernel: [ 0.212169] Caller[0000000000470380]: kthread+0xe0/0x100
Oct 11 16:02:01 dhcp217 kernel: [ 0.212190] Caller[0000000000417f90]: kernel_thread+0x30/0x60
Oct 11 16:02:01 dhcp217 kernel: [ 0.212208] Caller[00000000004703b0]: keventd_create_kthread+0x10/0x60
Oct 11 16:02:01 dhcp217 kernel: [ 0.212228] Instruction DUMP: 921021d2 7ff77e1d 90122010 <91d02005> 01000000 01000000 9de3bf40 94102000 92100019
Oct 11 16:02:01 dhcp217 kernel: [ 0.212273] Kernel panic - not syncing: Aiee, killing interrupt handler!
Oct 11 16:02:01 dhcp217 kernel: [ 9.198290] <0>Press Stop-A (L1-A) to return to the boot prom

The server is running Ubuntu Dapper 6.06.1. Let me know if you require more information or if I should take the problem upstream.

Tags: kernel-bug
Per Buer (perbu) wrote :

I can now reproduce this bug quite easily. I got the same error again, this time with httperf as the client and apache2 (worker) as the server. Preceding the BUG were two failed assertions, which might indicate the same error or different ones:

[ 15.699226] KERNEL: assertion (!sk->sk_forward_alloc) failed at net/core/stream.c (279)
[ 16.392399] KERNEL: assertion (!sk->sk_forward_alloc) failed at net/ipv4/af_inet.c (148)
[ 11.779294] kernel BUG at include/linux/skbuff.h:466!
[ 11.779316] \|/ ____ \|/
[ 11.779322] "@'/ .. \`@"
[ 11.779328] /_| \__/ |_\
[ 11.779334] \__U_/
[ 11.779348] apache2(14266): Kernel bad sw trap 5 [#1]
[ 11.779365] TSTATE: 0000000011001600 TPC: 0000000000639694 TNPC: 0000000000639698 Y: 00000000 Not tainted
[ 11.779399] TPC: <tso_fragment+0x214/0x220>
[ 11.779413] g0: 00000000007c1118 g1: 00000000006c8000 g2: 0000000000000001 g3: 0000000000000000
[ 11.779435] g4: fffff801ee1ac0e0 g5: fffff80003d0a0c0 g6: fffff801e7634000 g7: 0000000000000000
[ 11.779453] o0: 0000000000000039 o1: 0000000000680410 o2: 00000000000001d2 o3: 0000000000000018
[ 11.779476] o4: 000000002873e07f o5: ffffffffcfffffff sp: fffff801e7636c61 ret_pc: 000000000063968c
[ 11.779498] RPC: <tso_fragment+0x20c/0x220>
[ 11.779514] l0: 00000000000005a8 l1: fffff801ed25edc0 l2: fffff801eefa80a0 l3: 0000000000000000
[ 11.779534] l4: 0000000000007530 l5: 00000000c0a800ae l6: 0000000000721800 l7: 00000000bdae0050
[ 11.779554] i0: 0000000000000100 i1: fffff801fd23aae0 i2: 0000000000006028 i3: 00000000000005a8
[ 11.779575] i4: 0000000000000017 i5: 0000000000000047 i6: fffff801e7636d21 i7: 0000000000639e74
[ 11.779601] I7: <__tcp_push_pending_frames+0x2d4/0x4e0>
[ 11.779613] Caller[0000000000639e74]: __tcp_push_pending_frames+0x2d4/0x4e0
[ 11.779637] Caller[0000000000636440]: tcp_rcv_established+0x500/0x9e0
[ 11.779660] Caller[000000000063e938]: tcp_v4_do_rcv+0xf8/0x400
[ 11.779679] Caller[0000000000640328]: tcp_v4_rcv+0xbe8/0xc20
[ 11.779696] Caller[000000000061fa78]: ip_local_deliver+0x178/0x380
[ 11.779717] Caller[0000000000620050]: ip_rcv+0x3d0/0x660
[ 11.779734] Caller[00000000005ff4d0]: netif_receive_skb+0x310/0x420
[ 11.779756] Caller[00000000100ed024]: e1000_clean_rx_irq_ps+0xc4/0xbe0 [e1000]
[ 11.779833] Caller[00000000100ecde8]: e1000_clean+0xa8/0x220 [e1000]
[ 11.779880] Caller[00000000005fd9f4]: net_rx_action+0xb4/0x1c0
[ 11.779899] Caller[000000000045e47c]: __do_softirq+0x7c/0x120
[ 11.779918] Caller[000000000045e568]: do_softirq+0x48/0x60
[ 11.779935] Caller[00000000004108d4]: tl0_irq5+0x34/0x40
[ 11.779954] Caller[00000000f7b25744]: 0xf7b25744
[ 11.779984] Instruction DUMP: 921021d2 7ff77e1d 90122010 <91d02005> 01000000 01000000 9de3bf40 94102000 92100019
[ 11.780028] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 13.458275] <0>Press Stop-A (L1-A) to return to the boot prom
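
For anyone comparing the two traces, here is a minimal sketch (not part of the original report) that pulls the function names out of the `Caller[...]` lines and lists the frames both oopses share. The names `FRAME_RE`, `extract_frames`, and `shared_frames` are made up for illustration, and the sample strings are just a few frames copied from the logs above.

```python
import re

# Matches frames such as
#   Caller[0000000000639e74]: __tcp_push_pending_frames+0x2d4/0x4e0
# Frames without a symbol (e.g. "Caller[...]: 0xf7b25744") are skipped.
FRAME_RE = re.compile(r"Caller\[[0-9a-f]+\]:\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")


def extract_frames(log_text):
    """Return the function names from the Caller[...] lines, in trace order."""
    return FRAME_RE.findall(log_text)


def shared_frames(first, second):
    """Return the frames of `first` that also appear in `second`, in order."""
    seen = set(second)
    return [name for name in first if name in seen]


# A few frames copied from the first trace (events/5).
trace_a = """
Caller[0000000000639e74]: __tcp_push_pending_frames+0x2d4/0x4e0
Caller[0000000000635424]: tcp_rcv_state_process+0x784/0x12a0
Caller[000000000063e90c]: tcp_v4_do_rcv+0xcc/0x400
"""

# A few frames copied from the second trace (apache2).
trace_b = """
Caller[0000000000639e74]: __tcp_push_pending_frames+0x2d4/0x4e0
Caller[0000000000636440]: tcp_rcv_established+0x500/0x9e0
Caller[000000000063e938]: tcp_v4_do_rcv+0xf8/0x400
Caller[00000000f7b25744]: 0xf7b25744
"""

common = shared_frames(extract_frames(trace_a), extract_frames(trace_b))
print(common)
```

On these excerpts the shared frames are `__tcp_push_pending_frames` and `tcp_v4_do_rcv`; the two traces differ in whether they arrive via `tcp_rcv_state_process` or `tcp_rcv_established`.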

Per Buer (perbu) wrote :

I upgraded to 2.6.18 from kernel.org using the same .config and the problem went away, both the crashes and the failed assertions. There is something seriously broken on sparc in the Dapper kernel.

Fabio Massimo Di Nitto (fabbione) wrote :

Can you please tell me what your exact setup is? Are you running everything on the same machine, or do you generate the load externally? What kind of load are you placing on the machine? The errors you are getting come from the TCP stack and the e1000 driver, so it is entirely possible that a newer kernel fixes this behavior.

Basically, I need to know your setup so I can see if I can reproduce it here.

Thanks
Fabio

Changed in linux-source-2.6.15:
assignee: nobody → fabbione
status: Unconfirmed → Needs Info
Changed in linux-source-2.6.15:
assignee: fabbione → phillip-lougher
Jean-Baptiste Lallement (jibel) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in linux-source-2.6.15:
status: Incomplete → Invalid