New HP EliteBook 745 G5, BIOS version 1.03.01. Ryzen PRO 2500u.
Any modern kernel (4.10+) hangs at boot on this system, with no kernel messages displayed. The system boots only if MCE support is disabled (via mce=off).
Knowing Debian's 4.9 kernel boots fine, I bisected Linus's tree, and it appears this commit is the culprit:
18807ddb7f88d4ac3797302bafb18143d573e66f is the first bad commit
commit 18807ddb7f88d4ac3797302bafb18143d573e66f
Author: Yazen Ghannam <email address hidden>
Date: Tue Nov 15 15:13:53 2016 -0600

    x86/mce/AMD: Reset Threshold Limit after logging error

    The error count field in MCA_MISC does not get reset by hardware when the
    threshold has been reached. Software is expected to reset it. Currently,
    the threshold limit only gets reset during init or when a user writes to
    sysfs.

    If the user is not monitoring threshold interrupts and resetting
    the limit then the user will only see 1 interrupt when the limit is first
    hit. So if, for example, the limit is set to 10 then only 1 interrupt will
    be recorded after 10 errors even if 100 errors have occurred. The user may
    then assume that only 10 errors have occurred.

That said, the few commits preceding this one are also all related to MCE support on AMD systems, so the hang may be the combined result of several commits rather than this one alone.
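
For anyone reading along who is not familiar with what the commit changes, the counting behaviour described in its message can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: the function name and parameters are made up for illustration, and it assumes the counter simply stops raising interrupts once it has reached the limit and nobody resets it.

```python
def count_threshold_interrupts(total_errors, limit, reset_after_interrupt):
    """Toy model of a threshold error counter (hypothetical, not kernel code).

    Returns how many threshold interrupts would be seen after
    `total_errors` errors with the given `limit`.
    """
    count = 0        # stands in for the error count field
    interrupts = 0
    for _ in range(total_errors):
        if count < limit:
            count += 1
            if count == limit:
                # The threshold was just reached: one interrupt fires.
                interrupts += 1
                if reset_after_interrupt:
                    # Software resets the count, as the commit intends.
                    count = 0
        # Otherwise the count is stuck at the limit (assumed behaviour),
        # so further errors raise no new interrupt.
    return interrupts


# The example from the commit message: limit of 10, 100 errors.
without_reset = count_threshold_interrupts(100, 10, reset_after_interrupt=False)
with_reset = count_threshold_interrupts(100, 10, reset_after_interrupt=True)
print(without_reset, with_reset)  # prints "1 10"
```

Without the reset, only the first 10 of the 100 errors produce an interrupt, which matches the "only 1 interrupt" case in the commit message. With the reset, each batch of 10 errors is reported.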
Created attachment 278845
ACPI dump