Phenom kernel 2.6.24-8-server BUG: soft lockup - CPU#3 stuck for 11s! [events/3:18]

Bug #197252 reported by Laurent GUERBY
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Binary package hint: linux-image-server

Hi, I just built a Phenom 9500 quad-core system with 4 GB of RAM, based on an ASUS M3A32 MVP Deluxe wifi motherboard (flashed to the latest BIOS, version 0801), and installed hardy. When I try to do something, some processes get stuck and cannot be killed. After checking /var/log/kern.log I see lots of:

Mar 1 12:51:22 gcc04 kernel: [ 3085.025358] BUG: soft lockup - CPU#3 stuck for 11s! [events/3:18]
Mar 1 12:51:22 gcc04 kernel: [ 3085.025402] CPU 3:
Mar 1 12:51:22 gcc04 kernel: [ 3085.025403] Modules linked in: it87 hwmon_vid i2c_dev tun iptable_filter ip_tables x_tables sbp2 parport_pc lp parport loop ipv6 lmpcm_usb pata_atiixp snd_hda_intel snd_pcm snd_timer snd_page_alloc sky2 snd_hwdep snd serio_raw button soundcore psmouse atiixp pcspkr i2c_piix4 i2c_core ide_core shpchp pci_hotplug evdev ext3 jbd mbcache sg sd_mod usbhid hid pata_marvell pata_acpi floppy ata_generic ahci libata ohci1394 ieee1394 ehci_hcd scsi_mod ohci_hcd usbcore ssb thermal processor fan fuse
Mar 1 12:51:22 gcc04 kernel: [ 3085.025440] Pid: 18, comm: events/3 Not tainted 2.6.24-8-server #1
Mar 1 12:51:22 gcc04 kernel: [ 3085.025442] RIP: 0010:[__smp_call_function_mask+0xa2/0xf0] [__smp_call_function_mask+0xa2/0xf0] __smp_call_function_mask+0xa2/0xf0
Mar 1 12:51:22 gcc04 kernel: [ 3085.025446] RSP: 0000:ffff810129123de0 EFLAGS: 00000202
Mar 1 12:51:22 gcc04 kernel: [ 3085.025448] RAX: 00000000000008fc RBX: 0000000000000007 RCX: 0000000000000001
Mar 1 12:51:22 gcc04 kernel: [ 3085.025450] RDX: 00000000000000fc RSI: 00000000000000fc RDI: 0000000000000007
Mar 1 12:51:22 gcc04 kernel: [ 3085.025452] RBP: ffff810001048c00 R08: ffff810129122000 R09: ffff810001042810
Mar 1 12:51:22 gcc04 kernel: [ 3085.025454] R10: ffff810001048c60 R11: 0000000000000000 R12: ffff810129063108
Mar 1 12:51:22 gcc04 kernel: [ 3085.025456] R13: ffff810001042800 R14: ffff810001048c60 R15: ffffffff80231ff5
Mar 1 12:51:22 gcc04 kernel: [ 3085.025458] FS: 00002b394d5d46e0(0000) GS:ffff81012b801f00(0000) knlGS:0000000000000000
Mar 1 12:51:22 gcc04 kernel: [ 3085.025460] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 1 12:51:22 gcc04 kernel: [ 3085.025462] CR2: 00000000016b7a48 CR3: 0000000000201000 CR4: 00000000000006e0
Mar 1 12:51:22 gcc04 kernel: [ 3085.025464] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 1 12:51:22 gcc04 kernel: [ 3085.025466] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 1 12:51:22 gcc04 kernel: [ 3085.025467]
Mar 1 12:51:22 gcc04 kernel: [ 3085.025467] Call Trace:
Mar 1 12:51:22 gcc04 kernel: [ 3085.025470] [mcheck_check_cpu+0x0/0x40] mcheck_check_cpu+0x0/0x40
Mar 1 12:51:22 gcc04 kernel: [ 3085.025474] [mcheck_check_cpu+0x0/0x40] mcheck_check_cpu+0x0/0x40
Mar 1 12:51:22 gcc04 kernel: [ 3085.025478] [smp_call_function_mask+0x46/0xa0] smp_call_function_mask+0x46/0xa0
Mar 1 12:51:22 gcc04 kernel: [ 3085.025481] [mcheck_timer+0x0/0x90] mcheck_timer+0x0/0x90
Mar 1 12:51:22 gcc04 kernel: [ 3085.025484] [mcheck_check_cpu+0x0/0x40] mcheck_check_cpu+0x0/0x40
Mar 1 12:51:22 gcc04 kernel: [ 3085.025486] [on_each_cpu+0x1d/0x40] on_each_cpu+0x1d/0x40
Mar 1 12:51:22 gcc04 kernel: [ 3085.025490] [mcheck_timer+0x19/0x90] mcheck_timer+0x19/0x90
Mar 1 12:51:22 gcc04 kernel: [ 3085.025492] [run_workqueue+0xcc/0x170] run_workqueue+0xcc/0x170
Mar 1 12:51:22 gcc04 kernel: [ 3085.025495] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Mar 1 12:51:22 gcc04 kernel: [ 3085.025498] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Mar 1 12:51:22 gcc04 kernel: [ 3085.025501] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
Mar 1 12:51:22 gcc04 kernel: [ 3085.025504] [<ffffffff80254130>] autoremove_wake_function+0x0/0x30
Mar 1 12:51:22 gcc04 kernel: [ 3085.025508] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Mar 1 12:51:22 gcc04 kernel: [ 3085.025511] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
Mar 1 12:51:22 gcc04 kernel: [ 3085.025513] [kthread+0x4b/0x80] kthread+0x4b/0x80
Mar 1 12:51:22 gcc04 kernel: [ 3085.025516] [child_rip+0xa/0x12] child_rip+0xa/0x12
Mar 1 12:51:22 gcc04 kernel: [ 3085.025522] [kthread+0x0/0x80] kthread+0x0/0x80
Mar 1 12:51:22 gcc04 kernel: [ 3085.025525] [child_rip+0x0/0x12] child_rip+0x0/0x12
Mar 1 12:51:22 gcc04 kernel: [ 3085.025528]
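
When a log contains many reports like the one above, it can help to count them per CPU and note the longest stall. The following is a minimal sketch and not part of the original report: the function name `summarize_soft_lockups` and the regular expression are assumptions, derived only from the single message format quoted here.

```python
import re
from collections import Counter

# Pattern assumed from the message format quoted in this report, e.g.:
#   BUG: soft lockup - CPU#3 stuck for 11s! [events/3:18]
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]:]+):(?P<pid>\d+)\]"
)


def summarize_soft_lockups(lines):
    """Count soft lockup reports per CPU and record the longest stall per CPU.

    `lines` is any iterable of log lines, for example an open file object.
    Returns (per_cpu_counts, longest_stall_seconds).
    """
    per_cpu = Counter()
    longest = {}
    for line in lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft lockup header line
        cpu = int(match.group("cpu"))
        secs = int(match.group("secs"))
        per_cpu[cpu] += 1
        longest[cpu] = max(secs, longest.get(cpu, 0))
    return per_cpu, longest
```

One way to use it would be to pass an open handle on /var/log/kern.log and print the two results. Other kernel versions may word the message differently, in which case the pattern would need adjusting.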

System information:

ii linux-image-2.6.24-8-server 2.6.24-8.14 Linux kernel image for version 2.6.24 on x86

root@gcc04:~# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9500 Quad-Core Processor
stepping : 2
cpu MHz : 2206.716
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4416.09
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9500 Quad-Core Processor
stepping : 2
cpu MHz : 2206.716
cache size : 512 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4413.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 2
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9500 Quad-Core Processor
stepping : 2
cpu MHz : 2206.716
cache size : 512 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4413.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 3
vendor_id : AuthenticAMD
cpu family : 16
model : 2
model name : AMD Phenom(tm) 9500 Quad-Core Processor
stepping : 2
cpu MHz : 2206.716
cache size : 512 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
bogomips : 4413.48
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
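
The four processor blocks above are meant to describe identical cores. A quick way to check that from the raw text, rather than by eye, is sketched below. This is illustrative only: `parse_cpuinfo` and `cores_are_uniform` are hypothetical helper names, and the parser assumes the `key : value` layout with blank lines between blocks that is shown in this report.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def cores_are_uniform(cpus, keys=("cpu family", "model", "stepping")):
    """Return True if every processor block reports the same values for `keys`."""
    signatures = {tuple(cpu.get(k) for k in keys) for cpu in cpus}
    return len(signatures) <= 1
```

Feeding it the contents of /proc/cpuinfo and calling `cores_are_uniform` on the result would confirm that all cores report the same family, model and stepping.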

# lspci
00:00.0 Host bridge: ATI Technologies Inc RD790 Northbridge only dual slot PCI-e_GFX and HT3 K8 part
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 port A)
00:04.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port A)
00:06.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port C)
00:07.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port D)
00:09.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express gpp port E)
00:12.0 SATA controller: ATI Technologies Inc SB600 Non-Raid-5 SATA
00:13.0 USB Controller: ATI Technologies Inc SB600 USB (OHCI0)
00:13.1 USB Controller: ATI Technologies Inc SB600 USB (OHCI1)
00:13.2 USB Controller: ATI Technologies Inc SB600 USB (OHCI2)
00:13.3 USB Controller: ATI Technologies Inc SB600 USB (OHCI3)
00:13.4 USB Controller: ATI Technologies Inc SB600 USB (OHCI4)
00:13.5 USB Controller: ATI Technologies Inc SB600 USB Controller (EHCI)
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 14)
00:14.1 IDE interface: ATI Technologies Inc SB600 IDE
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia
00:14.3 ISA bridge: ATI Technologies Inc SB600 PCI to LPC Bridge
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h [Opteron, Athlon64, Sempron] Link Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV515 [Radeon X1600]
01:00.1 Display controller: ATI Technologies Inc Unknown device 7160
02:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b1)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
04:00.0 IDE interface: Marvell Technology Group Ltd. 88SE6121 SATA II Controller (rev b2)
05:00.0 Ethernet controller: Atheros Communications, Inc. AR242x 802.11abg Wireless PCI Express Adapter (rev 01)
06:08.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 70)

Laurent

Tags: cft-2.6.27
Robert (ubuntu-10-rmn30) wrote :

Were you using the nvidia driver? I recently observed a similar soft lockup. The stack trace appears to come from the nvidia kernel module, and it is possibly also related to compiz. I have an Intel Core Duo in a Dell D820 laptop.

dmesg output with stack trace attached.

Laurent GUERBY (laurent-guerby) wrote :

No X on the machine.

With a newer BIOS I was able to get a stable system with Debian and an upgraded kernel:

http://lkml.org/lkml/2008/4/12/27

But on the same machine I installed Ubuntu 8.04 LTS last Friday and the machine is not stable: I get lockups within one hour of stress testing.

I just reported it to Canonical using my support contract.

Leann Ogasawara (leannogasawara) wrote :

Hi Laurent,

Just a few things to ask you to try here. Can you comment on whether booting with "pci=nomsi" helps at all? Additionally, the kernel team is planning to bump the Ubuntu kernel for the upcoming Intrepid Ibex 8.10 release to a 2.6.27-based kernel. This 2.6.27 kernel should be available for testing with Intrepid's Alpha5 release, which is set to come out Thurs Sept 4. If you could confirm this issue with the 2.6.27 kernel, that would be helpful. The following page will be updated when Alpha5 is available for testing - http://www.ubuntu.com/testing . Please let us know your results. Thanks.
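
When testing a boot option such as "pci=nomsi", it is worth confirming that the option was really passed to the running kernel before drawing conclusions. A minimal sketch follows, assuming the standard /proc/cmdline file; the helper names `has_boot_param` and `read_cmdline` are hypothetical and not from any Ubuntu tool.

```python
def has_boot_param(cmdline, param):
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()
```

For example, `has_boot_param(read_cmdline(), "pci=nomsi")` would report whether the current boot used the option. Matching on whole tokens avoids false positives from longer parameters that merely start with the same text.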

Changed in linux:
status: New → Incomplete
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Laurent GUERBY (laurent-guerby) wrote :

2.6.27 and pci=nomsi did not help stability.

For the record, after I changed the motherboard to an MSI K9A2GM-FIH (but kept the same RAM, CPU, disk, power supply and case), I successfully stress tested the machine for 10 days without any issue on the 2.6.27 Intrepid kernel (without any specific boot option).

Conclusion: both the ASUS M3A32 MVP Deluxe wifi and the ASUS M2A-VM likely do not properly support some Phenom processors, even with the latest BIOS (the same boards are stable with an Athlon X2).

Laurent GUERBY (laurent-guerby) wrote :

ASUS problem with Phenom

Changed in linux:
status: Incomplete → Invalid