Comment 0 for bug 1732990

Manoj Iyer (manjo) wrote :

[Impact]
Error records that contain multiple errors are handled incorrectly: every error after the first one is misreported. This results in garbage, non-standard error trace events being generated, and for AER and MC errors no kernel action is taken in the AER and EDAC drivers to help recover from them.

[Fix]
The following patches in Linus's tree fix this issue:
aaf2c2fb0f51 ACPI / APEI: clear error status before acknowledging the error
c4335fdd3822 ACPI: APEI: fix the wrong iteration of generic error status block

[Testing]
Insert an e1000 PCIe card into the system and run the following command, which should generate PCIe correctable errors. Without the fix, only the first error in each GHES report goes to the AER driver, rather than all errors in the report.

$ sudo setpci -s 0002:00:00.0 0x70c.l=0x00808000;sudo setpci -s 0002:00:00.0 CAP_EXP+0x10.B=0x4b;sleep 1;sudo setpci -s 0002:00:00.0 CAP_EXP+0x10.B=0x48

Here "0002:00:00.0" is the root hub for the card.
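
For repeated test runs, the same sequence can be wrapped in a small script so that the device address is set in one place. This is only a sketch: the variable names ROOT_PORT and DRY_RUN and the helper functions are illustrative and not part of the original report. By default it only prints the commands, since writing to PCI config space on the wrong device can be harmful.

```shell
#!/bin/sh
# Sketch only: wraps the setpci sequence from the test steps.
# ROOT_PORT and DRY_RUN are hypothetical names chosen for this example.
ROOT_PORT="${ROOT_PORT:-0002:00:00.0}"   # address of the root hub for the card
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same register writes as the one-line command, in the same order.
inject_errors() {
    run sudo setpci -s "$ROOT_PORT" 0x70c.l=0x00808000
    run sudo setpci -s "$ROOT_PORT" CAP_EXP+0x10.B=0x4b
    run sleep 1
    run sudo setpci -s "$ROOT_PORT" CAP_EXP+0x10.B=0x48
}

inject_errors
```

Running it with DRY_RUN=0 executes the commands instead of printing them; ROOT_PORT can be overridden for a card behind a different root hub.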

[Regression Potential]
The two patches to the ACPI APEI driver were cleanly cherry-picked from Linus's tree and applied to Artful and Zesty. The patches were tested on the QDF2400 platform, where they were found to fix the issue and not introduce any regressions.