Comment 19 for bug 1032550

Peter Petrakis (peter-petrakis) wrote:

Hi Ronald,

Sorry I haven't been timely; this is the best I can do with community-level support.
If kdump isn't launching even in the most trivial case, then you have to start from zero.

Is crashkernel even configured?
 * grep crashkernel /proc/cmdline
 * grep -i 'crash kernel' /proc/iomem   # as root; shows the reserved region
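The same check can be scripted. This is a minimal sketch, assuming a Linux host that exposes /proc/cmdline and /sys/kernel/kexec_crash_size; the function names are hypothetical. A crashkernel= option on the command line means a reservation was requested, and a non-zero kexec_crash_size means the running kernel actually reserved that many bytes.

```python
import os
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the value of the last crashkernel= option in a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token[len("crashkernel="):]
    return value


def reserved_crash_bytes(path: str = "/sys/kernel/kexec_crash_size") -> Optional[int]:
    """Read how many bytes are reserved for the crash kernel, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("crashkernel= on cmdline:", crashkernel_param(f.read()))
    print("bytes reserved for crash kernel:", reserved_crash_bytes())
```

If the parameter is present but zero bytes are reserved, the reservation failed at boot and `dmesg | grep -i crash` is the next place to look.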

How much memory do you have? Could you assign more memory to the crash kernel?
 * http://lxr.linux.no/linux+v3.7.1/Documentation/kdump/kdump.txt#L270
 * 256MB would be preferable (e.g. crashkernel=256M on the kernel command line)
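To see what a given crashkernel= value would reserve on a machine of a given size, here is a rough sketch of the syntax described in kdump.txt: a plain size such as 256M, or a range list such as 512M-2G:64M,2G-:128M, where each range's start is inclusive and its end exclusive. It is an illustration only, not the kernel's parser: it handles K/M/G suffixes, ignores an @offset, does not handle the ,high/,low variants, and the function names are hypothetical.

```python
from typing import Optional

_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Convert a size such as '256M' or '2G' to bytes (K/M/G suffixes only)."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIX:
        return int(text[:-1]) * _SUFFIX[text[-1]]
    return int(text)


def crash_reservation(spec: str, ram_bytes: int) -> Optional[int]:
    """Return the bytes a crashkernel= value would reserve for ram_bytes of RAM, or None."""
    spec = spec.split("@", 1)[0]          # drop an optional @offset
    if ":" not in spec:                   # plain form, e.g. "256M"
        return parse_size(spec)
    for entry in spec.split(","):         # range form, e.g. "512M-2G:64M,2G-:128M"
        rng, size = entry.split(":", 1)
        start, end = rng.split("-", 1)
        lo = parse_size(start)
        hi = parse_size(end) if end else None   # empty end means "no upper bound"
        if ram_bytes >= lo and (hi is None or ram_bytes < hi):
            return parse_size(size)
    return None                           # no range matched: nothing reserved
```

For example, with 512M-2G:64M,2G-:128M a 1 GiB machine gets 64M and a 4 GiB machine gets 128M, which is still only half of the suggested 256MB.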

Can you even kexec at all?
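 * kexec -p <kernel> --initrd=<initrd> --append=<cmdline>  # loads the panic kernel; see man kexec
 * cat /sys/kernel/kexec_crash_loaded  # prints 1 if a panic kernel is loaded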
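A hedged sketch of a manual panic-kernel load, useful for separating "kexec can't load" from "the kdump init script is misconfigured". The /boot/vmlinuz-<release> and /boot/initrd.img-<release> paths are an assumed Ubuntu-style layout, and root=/dev/sda1 is a hypothetical placeholder; substitute your own root device. The script only prints the command; run it yourself as root.

```python
import os
import shlex
from typing import List, Optional


def panic_load_argv(kernel: str, initrd: str, append: str) -> List[str]:
    """Build the argument list for loading a panic (crash) kernel with kexec -p."""
    return ["kexec", "-p", kernel, f"--initrd={initrd}", f"--append={append}"]


def crash_kernel_loaded(path: str = "/sys/kernel/kexec_crash_loaded") -> Optional[bool]:
    """Return True if a panic kernel is loaded, or None if the sysfs file is missing."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read().strip() == "1"


if __name__ == "__main__":
    release = os.uname().release
    # Assumed Ubuntu-style paths and a hypothetical root device; adjust for your system.
    argv = panic_load_argv(
        f"/boot/vmlinuz-{release}",
        f"/boot/initrd.img-{release}",
        "root=/dev/sda1 kdump_needed maxcpus=1 irqpoll reset_devices",
    )
    print("would run:", shlex.join(argv))
    print("panic kernel loaded:", crash_kernel_loaded())
```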

If you boot your system with maxcpus=1, so that it behaves as a uniprocessor
system, will kexec load?
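To confirm that maxcpus=1 actually took effect before retrying kexec, you can count the online CPUs. A small sketch, assuming the standard /sys/devices/system/cpu/online file, which holds a CPU list such as 0-3,6; parse_cpu_list is a hypothetical helper.

```python
import os
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Expand a kernel CPU list such as '0-3,6' into {0, 1, 2, 3, 6}."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


if __name__ == "__main__":
    path = "/sys/devices/system/cpu/online"
    if os.path.exists(path):
        with open(path) as f:
            online = parse_cpu_list(f.read())
        # With maxcpus=1 in effect this should normally report a single CPU.
        print(f"{len(online)} CPU(s) online: {sorted(online)}")
```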

Can you attach a serial console to your machine and post the output?

In /etc/init.d/kdump:
        # Append kdump_needed for initramfs to know what to do, and add
        # maxcpus=1 to keep things sane.
        APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"

Start adjusting these parameters: for example, remove 'reset_devices', reload the
kexec kernel (service kdump restart), then systematically remove the others
(except kdump_needed), noting the change in the kernel output each time.
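The elimination procedure above can be planned mechanically. This sketch assumes one particular strategy, cumulative removal in a chosen order (reset_devices first, as suggested), and never drops kdump_needed; removal_steps is a hypothetical helper that only prints the APPEND lines to try.

```python
from typing import Iterable, List


def removal_steps(append: str, order: Iterable[str], keep: str = "kdump_needed") -> List[str]:
    """List the parameter strings to try, dropping one more parameter at each step.

    Parameters are removed cumulatively in the given order; `keep` is never removed.
    """
    params = append.split()
    steps = []
    for victim in order:
        if victim == keep:
            continue
        params = [p for p in params if p != victim]
        steps.append(" ".join(params))
    return steps


if __name__ == "__main__":
    base = "kdump_needed maxcpus=1 irqpoll reset_devices"
    for i, candidate in enumerate(removal_steps(base, ["reset_devices", "irqpoll", "maxcpus=1"]), 1):
        # For each step: edit APPEND in /etc/init.d/kdump, run `service kdump restart`,
        # and note how the kernel output changes.
        print(f'step {i}: APPEND="$APPEND {candidate}"')
```

Removing parameters one at a time instead (putting each back before dropping the next) isolates individual effects better, at the cost of more reboots.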

Is this an enterprise server with an NMI button? If you configure the kernel to
panic on NMI, pressing that button will trigger the crash, and that will definitely
change the starting conditions under which the kexec kernel is launched.
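On x86 the "panic on NMI" behaviour is controlled by sysctls. This sketch, assuming an x86 kernel, reads the three that commonly apply; which one a given NMI button trips depends on how the platform delivers that NMI, so treat the list as a starting point. The helper names and the `root` test parameter are hypothetical.

```python
import os
from typing import Dict, Optional

# x86 sysctls that can make the kernel panic on an NMI.
NMI_SYSCTLS = (
    "kernel.unknown_nmi_panic",
    "kernel.panic_on_unrecovered_nmi",
    "kernel.panic_on_io_nmi",
)


def sysctl_path(name: str) -> str:
    """Map a dotted sysctl name to its /proc/sys file."""
    return "/proc/sys/" + name.replace(".", "/")


def read_nmi_settings(root: str = "") -> Dict[str, Optional[str]]:
    """Return each NMI sysctl's current value, or None if the file does not exist."""
    values: Dict[str, Optional[str]] = {}
    for name in NMI_SYSCTLS:
        path = root + sysctl_path(name)
        if os.path.exists(path):
            with open(path) as f:
                values[name] = f.read().strip()
        else:
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_nmi_settings().items():
        print(f"{name} = {value}")
    # To enable one persistently, add e.g. "kernel.unknown_nmi_panic = 1"
    # to /etc/sysctl.conf and run `sysctl -p` as root.
```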

Folks thought Stratus was a bit overkill, having a complete mirror of CPU/memory
operating in lockstep for HA. The nice thing about it is that if the primary ever did crash,
we would literally hold that unit in stasis, reboot on the other unit, and reap the
dump from its preserved memory. It worked 100% of the time, automatically. It would be
nice to have right about now.