Some PCIe errors not surfaced through rasdaemon
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | dann frazier |
Bionic | Fix Released | Undecided | dann frazier |
Bug Description
[Impact]
The ACPI Platform Error Interface (APEI) is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, only "recoverable" errors are currently passed along; errors of other severities (e.g. correctable) are dropped, which hides signs of faulty hardware from the user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Corrected error: Receiver Error
No Extlog errors.
No MCE errors.
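The check in the last step can also be scripted. The following is a minimal sketch, not part of rasdaemon: `has_aer_event` is a hypothetical helper name, and it assumes the report layout shown above, where numbered event lines directly follow the "PCIe AER events:" header.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rasdaemon).
# Reads a `ras-mc-ctl --errors` style report on stdin and returns 0 if the
# "PCIe AER events:" section contains at least one numbered event line.
# Assumes the report layout shown in the test case above.
has_aer_event() {
    awk '
        # Entering the AER section.
        /^PCIe AER events:/ { in_aer = 1; next }
        # Blank lines inside the section are ignored.
        in_aer && /^[ \t]*$/ { next }
        # A numbered line inside the section is a recorded event.
        in_aer && /^[ \t]*[0-9]+[ \t]/ { found = 1; next }
        # Any other line ends the section.
        in_aer { in_aer = 0 }
        END { exit(found ? 0 : 1) }
    '
}
```

With the function loaded in the current shell, a possible invocation is:
$ sudo ras-mc-ctl --errors | has_aer_event && echo "PCIe AER event recorded"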
[Fix]
There is a 2-patch upstream fix that addresses this issue and cherry-picks cleanly into Ubuntu. The fix stops artificially restricting the PCIe errors handed to the AER driver to only those that are recoverable.
[Regression Risk]
The above test was run on x86 and ARM platforms to mitigate regression risk.
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Released
tags: added: kernel-daily-bug
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.
If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!