Hi Dan,
Thanks for your investigation. Sorry for the delay, but I finally managed to reboot the compute nodes with the "notsc" kernel parameter. I also disabled the qemu-nbd workaround.
Once that was done, it didn't take long for a node to crash, which would indicate that notsc didn't fix the problem. However, the host got stuck and didn't dump anything. It happened a second time a few minutes later on a different host, so I thought I'd investigate this further.
It turns out that the kernel booted through kexec fails to boot, probably because of the notsc option: https://pastebin.canonical.com/146714/
I'm a bit worried about the following line:
[ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely
which is also displayed during "regular" boots (i.e. not through kexec).
I guess I can remove "notsc" from the kexec command line, but this will take additional time. I thought I'd let you know the current status in the meantime.
Cheers