SMP kernel fails to boot most of the time

Bug #515270 reported by Trey Blancher
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I've had this problem for a long time now, on 8.10, 9.04, and now 9.10, and it hasn't been solved. As you can see from my output, I have a quad-core AMD processor. However, most of the time it fails to boot. I've turned off the quiet kernel option so I can see what happens, and if it brings up more than one core (but fewer than all four), it fails to boot. I get a "not responding" message when booting, and after the kernel message along the lines of "booting x cores (yyyyy.y BogoMIPS)" the boot hangs (where 1 < x < 4). I will try to capture the exact output, but since I don't have access to a shell when this happens, I can't easily grab it.

The data collected on this machine for this ticket was created on a clean, one-core boot. I passed these options to the kernel: noapic nolapic acpi=noirq pci=noirq, but that disabled SMP (which is why I was able to boot). When I upgraded this system from Jaunty to Karmic, it booted all four cores fine, so I know it can work. I'm pretty sure it's a hardware bug, because the motherboard manufacturer doesn't explicitly advertise that this motherboard supports this processor. My own fault for not paying close attention to that. I'm looking for a workaround, perhaps some kernel boot line magic to get this working every time.

Thanks in advance!

ProblemType: Bug
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: trey 2768 F.... pulseaudio
 /dev/snd/controlC1: trey 2768 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xfe020000 irq 11'
   Mixer name : 'Nvidia MCP78 HDMI'
   Components : 'HDA:10ec0888,12970000,00100001 HDA:10de0002,10de0101,00100000'
   Controls : 40
   Simple ctrls : 20
Card1.Amixer.info:
 Card hw:1 'U0x46d0x9a5'/'USB Device 0x46d:0x9a5 at usb-0000:00:02.1-5, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB046d:09a5'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined
   Capture channels: Mono
   Limits: Capture 0 - 3072
   Mono: Capture 0 [0%] [23.00dB] [on]
Date: Sun Jan 31 15:15:03 2010
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=d51f7271-39df-45bb-a258-264561056d60
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Shuttle Inc SN78S
NonfreeKernelModules: nvidia
Package: linux-image-2.6.31-17-generic 2.6.31-17.54
ProcCmdLine: root=UUID=ade7513a-e1ab-438a-a598-fc9eaa830584 ro noapic nolapic acpi=noirq pci=noirq splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/usr/bin/zsh
ProcVersionSignature: Ubuntu 2.6.31-17.54-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-17-generic N/A
 linux-firmware 1.25
RfKill:

SourcePackage: linux
Uname: Linux 2.6.31-17-generic x86_64
WpaSupplicantLog:

XsessionErrors:
 (gnome-settings-daemon:2797): GLib-CRITICAL **: g_propagate_error: assertion `src != NULL' failed
 (polkit-gnome-authentication-agent-1:2837): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (nautilus:2826): Eel-CRITICAL **: eel_preferences_get_boolean: assertion `preferences_is_initialized ()' failed
dmi.bios.date: 11/06/2008
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: FN78S
dmi.board.vendor: Shuttle Inc
dmi.board.version: V10
dmi.chassis.type: 3
dmi.chassis.vendor: Shuttle Inc
dmi.chassis.version: H7
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd11/06/2008:svnShuttleInc:pnSN78S:pvrV10:rvnShuttleInc:rnFN78S:rvrV10:cvnShuttleInc:ct3:cvrH7:
dmi.product.name: SN78S
dmi.product.version: V10
dmi.sys.vendor: Shuttle Inc
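
For anyone comparing their own boot line against the ProcCmdLine recorded above, here is a minimal Python sketch that splits a kernel command line and reports which of the reporter's four workaround options are present. The function names are hypothetical, and the option set is simply the one named in the description; the sketch makes no claim about which option individually disables SMP.

```python
# Options the reporter added as a workaround (copied from the description).
WORKAROUND_OPTIONS = {"noapic", "nolapic", "acpi=noirq", "pci=noirq"}


def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (for example 'noapic') map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def workaround_options_present(cmdline):
    """Return the workaround options found on the command line, sorted."""
    return sorted(WORKAROUND_OPTIONS & set(cmdline.split()))


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()
```

On a running system, `workaround_options_present(read_running_cmdline())` shows whether the current boot used any of them.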

Revision history for this message
Trey Blancher (ectospasm) wrote :

Here's a (retyped) snippet of the kernel boot log when it fails to activate all four cores. This is incomplete, and it's just what remains on the screen when it hangs on boot:

[ 0.010000] CPU 1/0x1 -> Node 0
[ 0.010000] CPU: Physical Processor ID: 0
[ 0.010000] CPU: Processor Core ID: 1
[ 0.010000] mce: CPU supports 6 MCE banks
[ 0.010000] x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
[ 0.180554] CPU1: AMD Phenom(tm) 9950 Quad-Core Processor stepping 03
[ 0.182409] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
[ 0.190055] Booting processor 2 APIC 0x2 ip 0x6000
[ 0.010000] Initializing CPU#2
[ 0.010000] Calibrating delay using timer specific routine.. 5174.78 BogoMIPS (lpj=25873929)
[ 0.010000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 0.010000] CPU: L2 Cache: 512K (64 bytes/line)
[ 0.010000] CPU: 2/0x2 -> Node 0
[ 0.010000] CPU: Physical Processor ID: 0
[ 0.010000] CPU: Processor Core ID: 3
[ 0.010000] mce: CPU supports 6 MCE banks
[ 0.010000] x86 PAT enabled: cpu 2, old 0x7040600060406, new 0x7010600070106
[ 0.350647] CPU2: AMD Phenom(tm) 9950 Quad-Core Processor stepping 03
[ 0.351196] checking TSC synchronization [CPU#0 -> CPU#2]: passed.
[ 0.360051] Booting processor 3 APIC 0x3 ip 0x6000
[ 5.804396] Not responding
[ 5.804518] Brought up 3 CPUs
[ 5.804576] Total of 3 processors activated (15550.23 BogoMIPS)

At this point, the system just hangs. It never continues beyond this. Sometimes I get the dreaded "Not responding" message on the second core; most of the time when that happens, it continues to boot in single-core mode. If it stops on the third or fourth core, it never boots. If I don't get the "Not responding" message, it boots rather quickly, and all four cores are brought up.
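
When a boot log like the one above can be captured (from a photo transcription or a serial console, say), a small script can pick out which processor failed to come up. This is a minimal Python sketch, assuming the message wording matches the retyped snippet ("Booting processor N", "Not responding", "Brought up N CPUs"); other kernel versions may phrase these lines differently, and `summarize_smp_boot` is a hypothetical helper name.

```python
import re

BOOTING = re.compile(r"Booting processor (\d+)")
BROUGHT_UP = re.compile(r"Brought up (\d+) CPUs")


def summarize_smp_boot(log_text):
    """Return (failed_cpus, brought_up) from kernel boot messages.

    failed_cpus lists each processor number whose "Booting processor N"
    line was followed by "Not responding" before the next boot attempt.
    brought_up is the count from "Brought up N CPUs", or None if absent.
    """
    failed = []
    brought_up = None
    current = None  # processor most recently announced as booting
    for line in log_text.splitlines():
        match = BOOTING.search(line)
        if match:
            current = int(match.group(1))
            continue
        if "Not responding" in line and current is not None:
            failed.append(current)
            current = None
            continue
        match = BROUGHT_UP.search(line)
        if match:
            brought_up = int(match.group(1))
    return failed, brought_up
```

Fed the snippet above, the function attributes the "Not responding" line to processor 3, the last one announced before it.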

description: updated
Trey Blancher (ectospasm) wrote :

I upgraded the BIOS to the latest offered by Shuttle. I thought this might have fixed it, because some Phenom II microcode was added. I've only tried one boot, and it didn't work. I'll do some more testing in a moment.

Trey Blancher (ectospasm) wrote :

Still nothing. It's as if Ubuntu just flat out refuses to boot the fourth core. I booted into Windows XP 32-bit, just to make sure it could boot all four, and both Task Manager and mprime showed four cores. I let mprime run for about five to ten minutes, to make sure each core got warmed up, then booted into Ubuntu. It still said "not responding" and halted after loading three cores. The next thing I will try is a vanilla kernel, to see if that will boot it.

Trey Blancher (ectospasm) wrote :

Took me longer than it should have to determine the correct method of compiling a kernel for Ubuntu. I found this howto

http://blog.avirtualhome.com/2009/11/03/how-to-compile-a-kernel-for-ubuntu-karmic/

...which did a very good job. Now, the only thing I changed was setting the processor type from generic x86_64 to the Opteron/Athlon64/.../K8 option. Once the kernel was built, I installed it, and booted. First boot was a success, it booted all four cores. But the success ended there, because subsequent reboots (to test and see if this positive result was repeatable) showed the old failure. It still looks like it gets stuck not responding on the third or fourth core, and thus fails to boot. And I'm still at a loss as to why it seems to work when I make a change, but the fix isn't repeatable.

I'm wondering if there's a CPU microcode that can be applied to fix this? Everything Ubuntu seems to have relates to IA32/IA64/Intel64 microcode with no mention of non-Intel support (microcode.ctl even reports that it's not an Intel chip, and does nothing). AMD has apparently made microcode available, according to this page:

http://www.amd64.org/support/microcode.html

That article is light on details, so I don't know how to load the microcode it links to.

This may all be a moot point come this weekend, since I'm reprovisioning this machine with Windows 7, to be a media center PC. Now, I'm a Linux guy at heart, and this issue isn't keeping me from installing Mythbuntu on it. You can thank Netflix for not natively supporting Linux clients for that. I intend to fully revisit the issue when I finally decide to put Mythbuntu on here. Can someone please at least acknowledge that this issue has been looked at, so I don't feel like I'm shouting in a vacuum?

Trey Blancher (ectospasm) wrote :

The last message I tried to post over an hour ago, but I was having connection issues. I've done some more tinkering, and setting maxcpus=3 has allowed two successive boots to boot normally, with three cores. This leads me to think that it is indeed a hardware problem. I will be contacting AMD about this tomorrow if I get a chance.
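
To confirm how many cores actually came up after a boot with an option such as maxcpus=3, the kernel's own list of online CPUs can be read back. This is a minimal Python sketch, assuming the running kernel exposes `/sys/devices/system/cpu/online` in the usual range-list form (for example `0-2`); the helper names are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-2' or '0,2-3' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, sep, last = part.partition("-")
        if sep:
            # A range like '0-2' covers both endpoints.
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(first))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    """Return the CPUs the running kernel reports as online (Linux only)."""
    with open(path) as handle:
        return parse_cpu_list(handle.read())
```

Comparing `len(online_cpus())` with the expected core count across several reboots gives a quick record of whether a workaround is repeatable.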

Trey Blancher (ectospasm) wrote :

I think I finally have some success. After I did the BIOS update, I forgot to load optimized defaults. Once I did that, I was able to boot twice in a row with all four cores. BIOS updates are no joke; I shouldn't have neglected them for so long.

Feel free to close this bug as resolved. (-;

Jeremy Foshee (jeremyfoshee) wrote :

Hi Trey,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 515270

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Trey Blancher (ectospasm) wrote :

I have already reimaged this machine for another purpose, and I don't have the time to reprovision it again with Ubuntu. You may close this as incomplete, and I'll revisit it when I get a chance to reinstall Ubuntu on this machine. Of course, this may not be for some time, so I will just have to create a new bug then.

Changed in linux (Ubuntu):
status: Incomplete → Invalid