Comment 95 for bug 1505564

Rafael David Tinoco (rafaeldtinoco) wrote:

Hello Chris,

Could you clarify the following statement:

"""
So, step 2 was to add "nox2apic intremap=off" to the DL385-G7s. I added it to only one of them initially. That machine lasted 9 days before we had another kernel panic ("NMI watchdog: BUG: soft lockup - CPU#27 stuck for 23s! [migration/27:200]"), but after the panic it seems to have settled back down again (without any reboot).
"""

So, I'm not sure whether you are panicking on hung tasks or soft lockups (the kernel.hung_task_panic and kernel.softlockup_panic sysctl options). The way I read this, the machine showed a soft lockup BUT the kernel did not crash, and it recovered after some time. This might indicate that, once the workload was reduced, the kernel was able to get the migration kthread back on track. Could you clarify this?
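To answer the panic question quickly on each machine, the two sysctls can be read straight from /proc. A minimal sketch, assuming Python 3 is available on the hosts; `panic_settings` is a hypothetical helper, and `hung_task_panic` only exists when the kernel is built with hung task detection, so missing files are simply skipped:

```python
from pathlib import Path

# Panic-related sysctls under /proc/sys/kernel (kernel.<name>).
SYSCTLS = ("hung_task_panic", "softlockup_panic")


def panic_settings(base: str = "/proc/sys/kernel") -> dict:
    """Return {sysctl_name: panics?} for each panic-related sysctl present.

    Hypothetical helper: `base` is a parameter only so it can be pointed
    at a test directory instead of the live /proc tree.
    """
    settings = {}
    for name in SYSCTLS:
        path = Path(base) / name
        if path.exists():
            # Each file holds a single integer; non-zero means "panic".
            settings[name] = int(path.read_text().strip()) != 0
    return settings


if __name__ == "__main__":
    for name, enabled in panic_settings().items():
        print(f"kernel.{name} = {'panic' if enabled else 'warn only'}")
```

If `softlockup_panic` reports "warn only", a soft lockup is logged but the machine keeps running, which would fit the "settled back down without any reboot" description.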

You did right.

- older than G8 (e.g. the G7s): cmdline "nox2apic intremap=off"
- G8 and newer: cmdline "intremap=no_x2apic_optout"
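To confirm each box actually booted with the right options, the generation-to-cmdline rule above can be checked against the running kernel's command line. A sketch under assumptions: `recommended_options` and `missing_options` are hypothetical helpers, the generation is passed as a plain integer (7 for G7, 8 for G8, and so on), and the cmdline is split on whitespace, which is fine for these options but does not handle quoted arguments:

```python
def recommended_options(generation: int) -> set:
    """Interrupt-remapping options suggested in this thread per ProLiant generation."""
    if generation < 8:
        # G7 and older: disable x2APIC and interrupt remapping entirely.
        return {"nox2apic", "intremap=off"}
    # G8 and newer: keep remapping, ignore the firmware x2APIC opt-out.
    return {"intremap=no_x2apic_optout"}


def missing_options(generation: int, cmdline: str) -> set:
    """Return the recommended options that are absent from a kernel cmdline."""
    present = set(cmdline.split())
    return recommended_options(generation) - present
```

On a running machine, pass the contents of /proc/cmdline as `cmdline`; an empty set means the recommended options are in effect.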

So, if the kernel (G7) had a soft lockup warning but no "hard lockups" (a CPU stuck with interrupts disabled, which the NMI watchdog reports as a hard LOCKUP), then we are good. Judging by the G8, it looks like it is still running after the change. Could you clarify whether you changed the C-state firmware options (minimum core and package idle states)?
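Since the criterion is "soft lockups are tolerable, hard lockups are not", it helps to count both kinds in the kernel log. A minimal sketch: `classify_lockups` is a hypothetical helper, and the regular expressions are assumptions modeled on the 3.x watchdog wording (such as the "BUG: soft lockup - CPU#27 stuck for 23s!" line quoted above); message text can differ between kernel versions:

```python
import re

# Assumed message formats from the kernel lockup watchdog.
SOFT_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")
HARD_RE = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")


def classify_lockups(log_text: str) -> dict:
    """Count soft and hard lockup reports in kernel log text."""
    counts = {"soft": 0, "hard": 0}
    for line in log_text.splitlines():
        if SOFT_RE.search(line):
            counts["soft"] += 1
        elif HARD_RE.search(line):
            counts["hard"] += 1
    return counts
```

Feeding it the output of `dmesg` (or a saved kern.log) gives a quick read: any soft count with a hard count of zero matches the "we are good" case described above.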

I would recommend staying on 3.13 if the machines remain stable after the firmware version/options and cmdline changes. That way we have a baseline to compare against. As long as they don't have HARD lockups, I think we will be good.

Let me know if you need any other clarification.

Cheers!

Rafael Tinoco