Comment 17 for bug 1796443

Jon Grimm (jgrimm) wrote :

Just ran across this bug in LP.

Note: 60c8144afc28 fixes a real bug, but here it only masks the underlying issue. The bug is being hit in this specific instance because an AMD CPU erratum causes spurious MCEs early enough to trigger the condition that commit fixes.

However, even with the crash fixed, the thresholding interrupts will still be coming in fast and furious. It is better to disable them on affected CPUs, as the following commits do:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=45d4b7b9cb88526f6d5bd4c03efab88d75d10e4f

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71a84402b93e5fbd8f817f40059c137e10171788

If the above two commits are in place, 60c8144afc28 becomes less critical, as you should no longer hit that condition.
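
For anyone wanting to check whether a given kernel tree already contains the two commits, here is a minimal sketch in Python. It assumes a git checkout that shares upstream history with torvalds/linux.git; the helper names (is_ancestor, missing_commits) are made up for illustration. Backported or cherry-picked fixes in stable or distro trees usually carry different hashes, so a "missing" result there is not conclusive.

```python
import subprocess

# Commit IDs quoted in this comment.
FIX_COMMITS = [
    "45d4b7b9cb88526f6d5bd4c03efab88d75d10e4f",
    "71a84402b93e5fbd8f817f40059c137e10171788",
]


def is_ancestor(commit, ref="HEAD", repo="."):
    """Return True if `commit` is reachable from `ref` in the git tree at `repo`.

    Relies on `git merge-base --is-ancestor`, which exits 0 when the first
    commit is an ancestor of the second. Any other exit status (including an
    unknown commit) is treated as "not present".
    """
    result = subprocess.run(
        ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_commits(commits, check=is_ancestor):
    """Return the commits from `commits` for which `check` reports False."""
    return [c for c in commits if not check(c)]
```

Calling missing_commits(FIX_COMMITS) from inside the kernel checkout returns the hashes not reachable from HEAD; an empty list suggests both fixes are present by their upstream IDs.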