At one point I was able to track down a kernel.org post about the root of
the problem, but I can't remember exactly what it was. But I recall it was
due to a mistake on the part of one of the kernel devs.
-W
On Wed, Jul 22, 2009 at 2:36 PM, Ryan Lovett <email address hidden> wrote:
> On Wed, Jul 22, 2009 at 06:23:01PM -0000, Warren V wrote:
> > Upgrade your kernel to 2.6.28. CentOS is now on 2.6.28-128, I noted the
> > problem went away around 2.6.28-92.
> >
> > Ubuntu is stuck with whatever is currently out.
>
> Do you know which patch addressed the issue? If so, the Ubuntu kernel devs
> might be able to backport it to the LTS release.
>
> Ryan
>
> --
> Server 8.04 LTS: soft lockup - CPU#1 stuck for 11s! [bond1:3795] - bond -
> bond0
> https://bugs.launchpad.net/bugs/245779
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in “linux” package in Ubuntu: Incomplete
> Status in “linux” package in Debian: Fix Released
>
> Bug description:
> Hi!
> Ubuntu Server 8.04 LTS with all patches and the latest kernel
> Hardware: HP DL360 G4 Xeon
> Bonding with :
> - bond0 2x1Gb Intel (802.3ad / 4)
> - bond1 8x1Gb Intel (802.3ad / 4)
> Nagios (only nrpe and plugin)
> Heartbeat2 (without CRM)
> Vlan
>
> Today it crashed (after two weeks of uptime since the kernel upgrade) with
> this output:
>
> 6640927 firewall 11:46:54 kernel: [431168.944816] BUG: soft lockup - CPU#1
> stuck for 11s! [bond1:3795]
> 6640928 firewall 11:46:54 kernel: [431168.944849]
> 6640929 firewall 11:46:54 kernel: [431168.944853] Pid: 3795, comm: bond1
> Not tainted (2.6.24-19-server #1)
> 6640930 firewall 11:46:54 kernel: [431168.944856] EIP:
> 0060:[ipv6:_spin_lock+0xa/0x10] EFLAGS: 00000286 CPU: 1
> 6640931 firewall 11:46:54 kernel: [431168.944865] EIP is at
> _spin_lock+0xa/0x10
> 6640932 firewall 11:46:54 kernel: [431168.944867] EAX: f749f334 EBX:
> f749f25c ECX: 00000001 EDX: f749f25c
> 6640933 firewall 11:46:54 kernel: [431168.944870] ESI: 00000000 EDI:
> f7ca1000 EBP: f6c35c80 ESP: f6835cc0
> 6640934 firewall 11:46:54 kernel: [431168.944872] DS: 007b ES: 007b FS:
> 00d8 GS: 0000 SS: 0068
> 6640935 firewall 11:46:54 kernel: [431168.944875] CR0: 8005003b CR2:
> b7bfd0a0 CR3: 35908000 CR4: 000006b0
> 6640936 firewall 11:46:54 kernel: [431168.944878] DR0: 00000000 DR1:
> 00000000 DR2: 00000000 DR3: 00000000
> 6640937 firewall 11:46:54 kernel: [431168.944880] DR6: ffff0ff0 DR7:
> 00000400
> 6640938 firewall 11:46:54 kernel: [431168.944887] [<f8b67606>]
> ad_rx_machine+0x26/0x690 [bonding]
> 6640939 firewall 11:46:54 kernel: [431168.944899]
> [nf_nat:_read_lock_bh+0x8/0x50] _read_lock_bh+0x8/0x20
> 6640940 firewall 11:46:54 kernel: [431168.944920] [arp_process+0x8b/0x5f0]
> arp_process+0x8b/0x5f0
> 6640941 firewall 11:46:54 kernel: [431168.944930] [<f8b67e6a>]
> bond_3ad_lacpdu_recv+0x1fa/0x240 [bonding]
> 6640942 firewall 11:46:54 kernel: [431168.944946]
> [ip_local_deliver_finish+0xf9/0x210] ip_local_deliver_finish+0xf9/0x210
> 6640943 firewall 11:46:54 kernel: [431168.944955]
> [ip_rcv_finish+0xff/0x370] ip_rcv_finish+0xff/0x370
> 6640944 firewall 11:46:54 kernel: [431168.944960]
> [sock_def_write_space+0x12/0xa0] sock_def_write_space+0x12/0xa0
> 6640945 firewall 11:46:54 kernel: [431168.944968] [<f8967a4b>]
> e1000_alloc_rx_buffers+0xab/0x3a0 [e1000]
> 6640946 firewall 11:46:54 kernel: [431168.944982] [arp_rcv+0x0/0x140]
> arp_rcv+0x0/0x140
> 6640947 firewall 11:46:54 kernel: [431168.944994]
> [e1000:__netdev_alloc_skb+0x22/0x2a80] __netdev_alloc_skb+0x22/0x50
> 6640948 firewall 11:46:54 kernel: [431168.945000] [<f8b67c70>]
> bond_3ad_lacpdu_recv+0x0/0x240 [bonding]
> 6640949 firewall 11:46:54 kernel: [431168.945011]
> [tg3:netif_receive_skb+0x379/0x720] netif_receive_skb+0x379/0x440
> 6640950 firewall 11:46:54 kernel: [431168.945024] [<f8968474>]
> e1000_clean_rx_irq+0x174/0x500 [e1000]
> 6640951 firewall 11:46:54 kernel: [431168.945037] [<f8968378>]
> e1000_clean_rx_irq+0x78/0x500 [e1000]
> 6640952 firewall 11:46:54 kernel: [431168.945059] [<f8968300>]
> e1000_clean_rx_irq+0x0/0x500 [e1000]
> 6640953 firewall 11:46:54 kernel: [431168.945071] [<f896569e>]
> e1000_clean+0x5e/0x250 [e1000]
> 6640954 firewall 11:46:54 kernel: [431168.945085]
> [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
> 6640955 firewall 11:46:54 kernel: [431168.945099] [__do_softirq+0x82/0x110]
> __do_softirq+0x82/0x110
> 6640956 firewall 11:46:54 kernel: [431168.945109] [do_softirq+0x55/0x60]
> do_softirq+0x55/0x60
> 6640957 firewall 11:46:54 kernel: [431168.945113] [irq_exit+0x6d/0x80]
> irq_exit+0x6d/0x80
> 6640958 firewall 11:46:54 kernel: [431168.945117] [do_IRQ+0x40/0x70]
> do_IRQ+0x40/0x70
> 6640959 firewall 11:46:54 kernel: [431168.945121]
> [find_busiest_group+0x1bd/0x760] find_busiest_group+0x1bd/0x760
> 6640960 firewall 11:46:54 kernel: [431168.945130]
> [common_interrupt+0x23/0x28] common_interrupt+0x23/0x28
> 6640961 firewall 11:46:54 kernel: [431168.945142] [<f897007b>]
> e1000_init_hw+0x34b/0xb50 [e1000]
> 6640962 firewall 11:46:54 kernel: [431168.945156]
> [ipv6:_spin_lock+0x3/0x10] _spin_lock+0x3/0x10
> 6640963 firewall 11:46:54 kernel: [431168.945163] [<f8b67606>]
> ad_rx_machine+0x26/0x690 [bonding]
> 6640964 firewall 11:46:54 kernel: [431168.945179]
> [lock_timer_base+0x27/0x60] lock_timer_base+0x27/0x60
> 6640965 firewall 11:46:54 kernel: [431168.945183]
> [delayed_work_timer_fn+0x0/0x20] delayed_work_timer_fn+0x0/0x20
> 6640966 firewall 11:46:54 kernel: [431168.945194] [<f8b68290>]
> bond_3ad_state_machine_handler+0xf0/0x9b0 [bonding]
> 6640967 firewall 11:46:54 kernel: [431168.945206]
> [queue_delayed_work_on+0x7c/0xb0] queue_delayed_work_on+0x7c/0xb0
> 6640968 firewall 11:46:54 kernel: [431168.945214]
> [usbcore:queue_delayed_work+0x51/0x70] queue_delayed_work+0x51/0x70
> 6640969 firewall 11:46:54 kernel: [431168.945221] [<f8b681a0>]
> bond_3ad_state_machine_handler+0x0/0x9b0 [bonding]
> 6640970 firewall 11:46:54 kernel: [431168.945229]
> [run_workqueue+0xbf/0x160] run_workqueue+0xbf/0x160
> 6640971 firewall 11:46:54 kernel: [431168.945240] [worker_thread+0x0/0xe0]
> worker_thread+0x0/0xe0
> 6640972 firewall 11:46:54 kernel: [431168.945245] [worker_thread+0x84/0xe0]
> worker_thread+0x84/0xe0
> 6640973 firewall 11:46:54 kernel: [431168.945251] [<c0145fc0>]
> autoremove_wake_function+0x0/0x40
> 6640974 firewall 11:46:54 kernel: [431168.945260] [worker_thread+0x0/0xe0]
> worker_thread+0x0/0xe0
> 6640975 firewall 11:46:54 kernel: [431168.945265] [kthread+0x42/0x70]
> kthread+0x42/0x70
> 6640976 firewall 11:46:54 kernel: [431168.945269] [kthread+0x0/0x70]
> kthread+0x0/0x70
> 6640977 firewall 11:46:54 kernel: [431168.945274]
> [kernel_thread_helper+0x7/0x10] kernel_thread_helper+0x7/0x10
> 6640978 firewall 11:46:54 kernel: [431168.945284] =======================
>
> Can you help me?
>
> Many thanks
>
> ---
> Sim
>