Comment 8 for bug 791850

Stefan Bader (smb) wrote : Re: oneiric cluster compute instances do not boot

I hope the crash data is not misleading, though it does look like an explanation for the situation. In each of the two dumps I looked at, exactly one CPU shows activity, and in both cases its backtrace includes the following:

 #0 [ffff8805abcfdd10] schedule at ffffffff815f9fd2
 #1 [ffff8805abcfdd38] up at ffffffff810869a2
 #2 [ffff8805abcfdd48] __assign_irq_vector at ffffffff810291f4
 #3 [ffff8805abcfde18] set_mtrr at ffffffff81022a74
 #4 [ffff8805abcfdea8] mtrr_aps_init at ffffffff81023389
 #5 [ffff8805abcfdeb8] native_smp_cpus_done at ffffffff81cf4283
 #6 [ffff8805abcfdee8] smp_init at ffffffff81d02b2f
 #7 [ffff8805abcfdf18] kernel_init at ffffffff81ce6cc9
 #8 [ffff8805abcfdf48] device_not_available at ffffffff816057a4

The thing that is unclear to me is what happens between #4 and #3. The BP is in set_mtrr to initialize the APs, so preemption should be disabled, and the address (0xffffffff81022a74) does match up with the code where the BP waits for all APs to announce that they have entered the rendezvous handler. Right at that point something interrupts the BP (some APIC init code?), and that code then blocks on something else. That wait is unlikely to ever finish, because the APs are in turn waiting for the BP to go on...
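
To make the suspected circular wait easier to picture, here is a rough user-space model of it in Python. It is only a sketch and not the kernel code: the function name simulate_rendezvous, the "unrelated_resource" event and the timeouts are all made up for illustration. Threads stand in for the APs, the calling thread stands in for the BP, and the timeouts exist only so the demo terminates (the real system would just hang).

```python
import threading


def simulate_rendezvous(num_aps, bp_gets_blocked, timeout=0.2):
    """Return True if every AP was released by the BP, False if they stalled.

    Hypothetical model: names and timeouts are assumptions, not kernel APIs.
    """
    if num_aps < 1:
        raise ValueError("need at least one AP")

    count_lock = threading.Lock()
    pending = [num_aps]                     # APs that have not checked in yet
    all_checked_in = threading.Event()      # set by the last AP to check in
    gate = threading.Event()                # opened by the BP to release the APs
    unrelated_resource = threading.Event()  # never set: what the BP sleeps on
    released = []                           # per-AP result of waiting for the gate

    def ap_handler():
        # Announce that this AP has entered the rendezvous handler.
        with count_lock:
            pending[0] -= 1
            if pending[0] == 0:
                all_checked_in.set()
        # Wait for the BP; the timeout only keeps the demo from hanging.
        released.append(gate.wait(timeout))

    aps = [threading.Thread(target=ap_handler) for _ in range(num_aps)]
    for ap in aps:
        ap.start()

    # BP side: wait until every AP has announced itself.
    all_checked_in.wait()
    if bp_gets_blocked:
        # Models the BP being diverted into code that sleeps on something
        # nobody will ever provide, so the gate is never opened.
        if unrelated_resource.wait(timeout):
            gate.set()
    else:
        gate.set()

    for ap in aps:
        ap.join()
    return all(released)


print(simulate_rendezvous(3, bp_gets_blocked=False))  # True
print(simulate_rendezvous(3, bp_gets_blocked=True))   # False
```

In the second call the APs have all checked in but are never released, which is the situation the backtrace seems to suggest: the BP is stuck in a wait that cannot complete while every AP waits for the BP.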