HSW131. Spurious Corrected Errors May be Reported
Problem: Due to this erratum, spurious corrected errors may be logged in the IA32_MC0_STATUS
register with the valid field (bit 63) set, the uncorrected error field (bit 61) not set, a
Model Specific Error Code (bits [31:16]) of 0x000F, and an MCA Error Code (bits
[15:0]) of 0x0005. If CMCI is enabled, these spurious corrected errors also signal interrupts.
Implication: When this erratum occurs, software may see corrected errors that are benign. These
corrected errors may be safely ignored.
Workaround: None identified.
Status: For the steppings affected, see the Summary Table of Changes.
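To make the signature in the erratum text concrete, here is a minimal sketch that tests a raw IA32_MC0_STATUS value against the four conditions listed above (bit 63 set, bit 61 clear, bits [31:16] equal to 0x000F, bits [15:0] equal to 0x0005). The function and constant names are hypothetical and chosen only for illustration. Reading the MSR itself is out of scope, and the check is only meaningful for bank 0, the register the erratum names.

```python
# Bit positions and codes are taken from the HSW131 erratum text quoted above.
# All names below are illustrative, not part of any existing tool or API.
MCI_STATUS_VAL = 1 << 63  # valid field
MCI_STATUS_UC = 1 << 61   # uncorrected error field

HSW131_MODEL_SPECIFIC_CODE = 0x000F  # bits [31:16]
HSW131_MCA_ERROR_CODE = 0x0005       # bits [15:0]


def matches_hsw131(status: int) -> bool:
    """Return True if a raw IA32_MC0_STATUS value has the HSW131 signature."""
    valid = bool(status & MCI_STATUS_VAL)
    uncorrected = bool(status & MCI_STATUS_UC)
    model_specific = (status >> 16) & 0xFFFF
    mca_code = status & 0xFFFF
    return (
        valid
        and not uncorrected
        and model_specific == HSW131_MODEL_SPECIFIC_CODE
        and mca_code == HSW131_MCA_ERROR_CODE
    )


if __name__ == "__main__":
    # Hypothetical status value built only from the fields the erratum names.
    example = MCI_STATUS_VAL | (0x000F << 16) | 0x0005
    print(hex(example), matches_hsw131(example))
```

A logged event that passes this check is consistent with the erratum; an event with bit 61 set, or with different error codes, is not covered by it and should not be dismissed on this basis.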
I propose working around this by booting with mce=ignore_ce, since this is a spurious 'corrected error'.
From Documentation/x86/x86_64/boot-options.txt:
mce=ignore_ce
    Disable features for corrected errors, e.g. polling timer
    and CMCI. All events reported as corrected are not cleared
    by OS and remained in its error banks.
    Usually this disablement is not recommended, however if
    there is an agent checking/clearing corrected errors
    (e.g. BIOS or hardware monitoring applications), conflicting
    with OS's error handling, and you cannot deactivate the agent,
    then this option will be a help.
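Since the option only takes effect when it is on the kernel command line, a quick way to confirm it was picked up after a reboot is to look for it in /proc/cmdline. The sketch below assumes the option is passed as its own whitespace-separated token, exactly as spelled in the documentation quoted above. The helper names are hypothetical.

```python
def has_mce_ignore_ce(cmdline: str) -> bool:
    """Return True if a kernel command line string contains mce=ignore_ce."""
    # Kernel parameters are whitespace-separated; require an exact token match
    # so that a different mce= value is not mistaken for ignore_ce.
    return "mce=ignore_ce" in cmdline.split()


def running_kernel_has_ignore_ce(path: str = "/proc/cmdline") -> bool:
    """Check the command line the running kernel was booted with."""
    with open(path) as f:
        return has_mce_ignore_ce(f.read())
```

Calling running_kernel_has_ignore_ce() on the affected machine should return True once the boot loader configuration has been updated and the system rebooted. It says nothing about whether the spurious events have actually stopped being reported.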
I think this is related to Haswell erratum 131 (HSW131) in the 'Intel® Xeon® Processor E3-1200 v3 Product Family Specification Update' at: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e3-1200v3-spec-update.pdf
I have not tried the mce=ignore_ce workaround yet, though.